Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks, joins Dave Vellante for theCUBE on Cloud 2021.
https://siliconangle.com/2021/01/21/the-new-data-paradigm-embrace-complexity-cubeoncloud/
The new data paradigm: Embrace decentralization
SPECIAL COVERAGE: THECUBE ON CLOUD BY MARK ALBERTSON
A centralized data source is not always a good idea.
In a paper published in May 2019, Zhamak Dehghani (pictured), director of Next Tech Incubation NA at ThoughtWorks Inc., uncovered the trouble with data lakes. Despite hefty investments by major enterprises, such centralized reservoirs have created failure modes in deriving business value. Instead, Dehghani called for a paradigm shift that would draw on basic tenets of modern distributed architecture. For her, the time has come to start treating data as a product.
“Let’s decouple this world of analytical data to mirror the same way we have decoupled our systems and teams and business,” Dehghani said. “Why should data be any different? Let’s bring product thinking and treating data as a product to the data that teams now share.”
Dehghani spoke with Dave Vellante, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during theCUBE on Cloud event. They discussed the key elements of a data mesh, recognizing the value of data to the business, building models around complexity, and the evolving role of governance in the enterprise.
Embracing decentralization
Dehghani is an advocate for rethinking how enterprises create and manage data architectures. Her approach favors decentralized over monolithic structures and elevates domain knowledge as the primary criterion for organizing big data teams and platforms.
Moving beyond monolithic, centralized data lakes and “all-in-one” data warehouses to embrace the distributed nature of information will require a data mesh architecture, according to Dehghani. This philosophy is grounded in the reality that the industry as a whole has moved dramatically away from a centralized model.
“If you look at the parallel movement of our industry in general, since the birth of internet, we are actually moving towards decentralization,” Dehghani noted. “If we said the only way we can get access to various applications on the web is to centralize them, we would laugh at that idea. But for some reason we don’t question that when it comes to data.”
In addition to being a bottleneck, centralized data structures miss the point of data management in the first place: The work is of no use unless people can fully recognize the data’s value. This is often a primary reason why machine learning models ultimately fail.
“We end up training machine learning models on data that is not really representative of the reality of the business, and then we put them into production and they don’t work,” Dehghani said. “It’s managed by a team of highly specialized people who are struggling to understand the actual value of the data. It’s not going to get us to where our aspirations or ambitions need to be.”
Self-serve infrastructure
The solution, according to Dehghani, is to consider data domains as first-class citizens and apply platform thinking to create a self-serve data infrastructure. As data volume has exploded, so has the number of sources where information is being generated.
This will require accepting the resulting complexity and building a platform to accommodate it.
“It’s time to embrace the complexity that comes with the growth of a number of sources,” Dehghani said. “The architecture, technology and organization structure incentives need to move to embrace that complexity. That requires a paradigm shift in full stack.”
The success of a distributed data platform relies on domain data teams applying product thinking to the datasets they provide within an organization. Unique data assets are the product; data scientists and engineers are the customers.
Much as data has been treated for decades in a centralized manner, so has governance. But one governance model does not necessarily meet all needs, and Dehghani envisions an adaptable framework.
“The governance model in the old world has been very command and control, very centralized,” Dehghani said. “In the world of a data mesh, the job of data governance as a function becomes finding equilibrium between what decisions need to be made and enforced globally and what decisions need to be made locally.”
What about the application of key machine learning tools, such as TensorFlow or PyTorch?
“I truly believe we need to reimagine that world,” Dehghani said. “Go make it happen ‘platform,’ go provision everything I need so as a data product developer all I can focus on is the data itself. We have a lot of work to do.”