29 March 2022
Data management within businesses is constantly evolving towards more competitive and reliable solutions. Since the 1980s, companies have been developing and using centralized solutions for management and business intelligence. However, in the last few years, and especially in IT companies, decentralized approaches have been favored, for example with microservices architectures in software. Centralized approaches to data management can also run into issues when dealing with sensitive information such as health data. A new decentralized paradigm has emerged to address these issues: the data mesh.
What is a Data Mesh?
To understand what a data mesh is, we first need to understand the other approaches to data management.
A data lake is a type of centralized solution for data storage that can host structured data as well as semi-structured and unstructured data.
Data lakes often have a large storage capacity and are aimed at storing large quantities of data for purposes such as data mining and machine learning. Making changes in a data lake is easy, and the data is readily accessible. However, because the data is not structured, a data lake is more difficult to understand and navigate, so it is more often used by specialists.
The main difference between a data warehouse and a data lake is that the former is meant to store structured and filtered information.
Compared with data lakes, data warehouses require less storage capacity and are easier to use for a more diverse range of users. However, because of their rigid structure, changing and accessing data is not simple, and the data can be difficult (and expensive) to manipulate.
The data mesh model, which was conceptualized by Zhamak Dehghani, aims to move from a monolithic centralized system to a decentralized, domain-driven model. This approach is modeled on the microservices architecture. Since a similar architecture is already widespread in software, the data mesh is a continuation of this paradigm.
What are the benefits of a Data Mesh?
Storing data in a decentralized way with a domain-driven structure is an appealing idea, but how can companies benefit from implementing such a solution?
The decentralized aspect of a data mesh allows teams to access it independently and also allows for better scalability. Applying the product mindset to this solution allows for new levels of agility for data management within companies.
Better data governance
Because the data is stored by domain, it is easy to define user roles and their permissions. To make governance effective, the data mesh introduces a principle of its own: federated data governance. This means that governance standards are defined centrally, and each domain is free to apply those standards in the way that is most appropriate to its activity.
This approach to data governance helps ensure secure storage and compliance with security policies.
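To make the idea of federated governance more concrete, here is a minimal sketch in Python. It is not taken from any particular tool: the rule names, the `DomainPolicy` class and the example domains are all hypothetical. It only illustrates central standards being checked against policies that each domain defines for itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Central standards shared by every domain (hypothetical rule names).
CENTRAL_STANDARDS = {
    "encrypt_at_rest": True,
    "max_retention_days": 365,
    "allowed_roles": {"owner", "analyst", "viewer"},
}


@dataclass
class DomainPolicy:
    """How one domain chooses to apply the central standards."""
    domain: str
    retention_days: int
    roles: dict[str, str] = field(default_factory=dict)  # user -> role
    encrypt_at_rest: bool = True


def violations(policy: DomainPolicy, standards: dict = CENTRAL_STANDARDS) -> list[str]:
    """Return the central standards that a domain policy breaks."""
    problems = []
    if standards["encrypt_at_rest"] and not policy.encrypt_at_rest:
        problems.append("encryption at rest is required")
    if policy.retention_days > standards["max_retention_days"]:
        problems.append("retention exceeds the central maximum")
    for user, role in policy.roles.items():
        if role not in standards["allowed_roles"]:
            problems.append(f"unknown role {role!r} for user {user!r}")
    return problems


# Each domain picks its own values, within the central limits.
health = DomainPolicy("health", retention_days=90,
                      roles={"alice": "owner", "bob": "viewer"})
marketing = DomainPolicy("marketing", retention_days=730,
                         roles={"carol": "admin"})

for policy in (health, marketing):
    print(policy.domain, violations(policy))
```

In this sketch the health domain chooses a stricter retention period than the central maximum and passes, while the marketing domain breaks two central rules and is flagged.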
The infrastructure of a data mesh is based on a “self-serve” model. Accessing the data involves very little complexity, which makes access fast and reliable. This architecture reduces the number of processing and intervention layers, which allows businesses to run SQL queries with very low latency.
Implementing a data mesh using cloud technologies
Because a data mesh relies on a highly decentralized architecture for storing data, cloud technologies are an effective way to implement it. Some solutions already exist. Let’s have a look at how Google Cloud Platform, for example, has implemented a data mesh.
Google Cloud Platform solution
With Dataplex, a new data management tool, Google Cloud Platform makes it easy to divide data into independent data domains, which in turn makes a data mesh easy to implement.
The implementation of a data mesh using this solution requires several steps.
Having a storage solution
We need a storage solution to hold the assets of our data mesh. The available options are a Cloud Storage bucket or a BigQuery dataset. Once our storage is set up, we can create a new Dataplex lake that will act as our data mesh.
A lake is divided into zones, of which there are two types:
- Raw zones: they are used to store data that comes from different sources and that might need processing.
- Curated zones: they are used for data from a cloud storage source that is already structured and organized.
Each domain of your data mesh should have at least one raw and one curated zone so that it can store both kinds of data. More zones can then be created, for example zones to manage contracts between teams.
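The rule of thumb above (at least one raw and one curated zone per domain) is easy to check in code. The following is a minimal sketch in plain Python that does not call Dataplex at all; the domain and zone names are hypothetical, and the layout is simply written out by hand.

```python
from enum import Enum


class ZoneType(Enum):
    RAW = "raw"
    CURATED = "curated"


# Hypothetical layout: domain name -> list of (zone name, zone type).
domains = {
    "sales": [("sales-landing", ZoneType.RAW),
              ("sales-reporting", ZoneType.CURATED)],
    "health": [("health-landing", ZoneType.RAW)],
}


def missing_zone_types(zones):
    """Return the zone types that do not appear in a domain's zones."""
    present = {zone_type for _, zone_type in zones}
    return [zone_type for zone_type in ZoneType if zone_type not in present]


def incomplete_domains(layout):
    """Map each domain lacking a raw or curated zone to what it is missing."""
    result = {}
    for name, zones in layout.items():
        missing = missing_zone_types(zones)
        if missing:
            result[name] = missing
    return result


print(incomplete_domains(domains))
```

Here the sales domain has both kinds of zone, while the health domain is reported as missing a curated zone.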
Attach assets to zones
Finally, you can attach your data to a zone by adding your Cloud Storage bucket as an asset of that zone.
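To show how the lake, zone and asset fit together, here is a small Python sketch that builds a description of an asset as a plain dictionary. It does not call the Dataplex API. The resource name patterns follow the hierarchical naming used in the Dataplex documentation as I understand it, so treat them as an assumption and check them against the current reference. Every identifier (`my-project`, `europe-west1`, `data-mesh`, `sales-raw`, `sales-files`, `sales-bucket`) is hypothetical.

```python
def asset_spec(project, location, lake, zone, asset_id, bucket):
    """Describe a Cloud Storage bucket attached as an asset to a zone.

    Returns a plain dict; nothing is sent to Google Cloud.
    """
    zone_name = (
        f"projects/{project}/locations/{location}"
        f"/lakes/{lake}/zones/{zone}"
    )
    return {
        # The asset lives under its zone, which lives under its lake.
        "name": f"{zone_name}/assets/{asset_id}",
        "zone": zone_name,
        "resource_spec": {
            "type": "STORAGE_BUCKET",
            # Assumed naming pattern for the attached bucket.
            "name": f"projects/{project}/buckets/{bucket}",
        },
    }


spec = asset_spec(
    project="my-project",
    location="europe-west1",
    lake="data-mesh",
    zone="sales-raw",
    asset_id="sales-files",
    bucket="sales-bucket",
)
print(spec["name"])
```

The nested name makes the hierarchy visible: one lake for the mesh, zones inside it for each domain, and assets inside each zone.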
With this tool, Google Cloud Platform allows businesses to create data meshes by organizing their assets in a way that makes sense to them.
I hope this article has helped you to understand more about what a data mesh is and why it should be used by businesses, especially within a cloud environment. If you decide to implement your very own data mesh using cloud technologies, share your experience with us on Twitter and LinkedIn!