Your start towards data mesh – using Snowflake as an example
When operating a data warehouse or a data lake (or both), companies regularly face challenges caused by growing data volumes and growing requirements. I described these typical challenges, and how the data mesh approach addresses them, in the previous blog post.
Once a company has decided that the data mesh approach is the right one, the next question is how to start implementing it. I answer that question below. Building on the core elements of a data mesh described in part 1, this second part shows how each of them can be put into practice. Along the way, I use an example to show how an existing Snowflake-based DWH can serve as the starting point for a data mesh.
Core element 1: Redistribution of responsibility
Redistributing responsibility in the spirit of decentralization is easier if the organization has already made a similar move in application development.
If domain-driven design and DevOps are already established, taking the same step with data is much simpler. If they are not, an appropriate organizational and cultural change should be carried out first, before the introduction of a data mesh begins.
When introducing a data mesh, it makes sense to create new roles in the domains so that responsibility for each domain's data products is clearly assigned. In her book [1], Zhamak Dehghani suggests the following two roles:
- The data product owner is responsible for how the data products are scoped and for the applicable governance rules and quality requirements. This role also records, structures and prioritizes the requirements for new data products or changes to existing ones. Knowledge of data security, data protection, data governance and data quality is just as important here as experience as a product owner.
- The data product developer is responsible for the development, operation and maintenance of the data products. Someone in this role should have skills in software development and DevOps, but also in data engineering, data modeling and analysis using machine learning methods.
The data-related knowledge, skills and experience listed for these roles are usually not available in the domains. They are, however, available in the central team that was previously responsible for integrating data into the data warehouse. As an enabler team, this team can temporarily support the domains and enable them to take responsibility for their data products in the long term. Accompanying learning opportunities should be offered to give both roles the knowledge they need.
Core element 2: Product thinking
Building up the data skills listed above is also necessary for the second core element, product thinking. An enabler team, as described above, and accompanying learning opportunities are therefore also important building blocks for creating the product mindset.
To keep consumers and their needs in mind, data product owners and consumers should also be given the opportunity to exchange ideas with each other. This can happen bilaterally between two domains when new data products are planned, but experience also shows that additional meeting formats in which all domains come together are worthwhile.
In addition, a platform is needed that helps the domains bundle the various components of a data product in a uniform form, with as much automation as possible. It should also give data product owners insight into how consumers use and rate their data products, so that they can continually improve their consumer orientation. This brings us directly to core element 3, the platform architecture.
Core element 3: Platform architecture
There are various tools that can be used to set up a data mesh platform. Since data lake or data warehouse architectures already exist in most companies, I would like to present an example of how an existing architecture can be used to start with a data mesh.
Some companies, for example, already use Snowflake as a cloud data warehouse. Because Snowflake is already a distributed environment, a company's many domains can develop their data products on this platform and make them available there.
With Snowgrid, Snowflake works across the various cloud platforms (e.g. Amazon Web Services, Google Cloud, Azure). The organization therefore does not have to commit to a single cloud platform; instead, each domain can independently choose the cloud provider that suits it. Snowgrid ensures that company-wide governance rules are applied on all cloud platforms in use. This supports the redistribution of responsibility by giving domains full control of, and responsibility for, their data products and the tools used to develop them. At the same time, Snowgrid also ensures federated, automated data governance.
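The idea behind federated, automated governance can be sketched as a small set of company-wide rules that every domain's data product is checked against automatically, regardless of which cloud the domain chose. The following Python sketch is purely illustrative: the `DataProduct` fields, the two rules and all example values are assumptions and not part of Snowflake or Snowgrid.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor that a domain registers for one data product."""
    name: str
    domain: str
    owner: str                 # the responsible data product owner
    cloud: str                 # chosen freely by the domain
    contains_pii: bool = False
    masking_policy: str = ""   # name of a masking policy, empty if none


# A rule returns an error message, or None if the product complies.
Rule = Callable[[DataProduct], Optional[str]]


def has_owner(product: DataProduct) -> Optional[str]:
    if not product.owner.strip():
        return "no data product owner assigned"
    return None


def pii_is_masked(product: DataProduct) -> Optional[str]:
    if product.contains_pii and not product.masking_policy:
        return "contains PII but has no masking policy"
    return None


# Assumed company-wide rules, defined once and applied to every domain.
COMPANY_RULES: List[Rule] = [has_owner, pii_is_masked]


def check(product: DataProduct, rules: List[Rule] = COMPANY_RULES) -> List[str]:
    """Run all rules and collect the violations for one data product."""
    violations = []
    for rule in rules:
        message = rule(product)
        if message is not None:
            violations.append(message)
    return violations
```

The domains stay free in how they build their products, while the rules themselves are maintained centrally and evaluated the same way for everyone.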
With Snowflake, data products can largely be created on a self-service basis. Typically, the data of a data product is provided in one schema per data product. The code that belongs to the data product (for pipelines, transformation logic and policies) can also be referenced in this schema. Snowflake also asks for SLOs when the data product is created. For the input and output ports of a data product, Snowflake offers numerous connectors that can be used during development (Kafka, Spark, DataFrame API, cloud storage buckets, SQL API, JDBC, REST API, .NET, various file formats).
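As a minimal sketch of the one-schema-per-data-product convention, the following Python function generates the Snowflake SQL statements a domain might run to create the schema and open it read-only to a consumer role. The database, product and role names are hypothetical, and the naming convention and the choice of grants are assumptions rather than a prescribed Snowflake setup.

```python
import re
from typing import List

# Only plain, unquoted identifiers are accepted in this sketch.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name: str) -> str:
    """Validate a simple SQL identifier and normalize it to upper case."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a simple SQL identifier: {name!r}")
    return name.upper()


def data_product_schema_sql(
    database: str, product: str, consumer_role: str, description: str
) -> List[str]:
    """Build the statements for one data product schema with read access."""
    db = _ident(database)
    schema = f"{db}.{_ident(product)}"
    role = _ident(consumer_role)
    comment = description.replace("'", "''")  # escape single quotes
    return [
        f"CREATE SCHEMA IF NOT EXISTS {schema} COMMENT = '{comment}';",
        f"GRANT USAGE ON DATABASE {db} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {schema} TO ROLE {role};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO ROLE {role};",
        f"GRANT SELECT ON FUTURE TABLES IN SCHEMA {schema} TO ROLE {role};",
    ]


# Hypothetical example: the sales domain publishes an "orders" data product.
for statement in data_product_schema_sql(
    "SALES", "orders", "analyst", "Orders, one row per order"
):
    print(statement)
```

Executing the statements against Snowflake is deliberately left out, so the sketch runs without an account; in practice they could be passed to whatever client or deployment pipeline the domain already uses.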
As a further feature, Snowflake includes marketplace functions so that data products can be searched for and found in the data mesh. It is also possible to connect a third-party data catalog.
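What searching for data products means in practice can be illustrated with a small in-memory catalog. This is a toy sketch and not Snowflake's marketplace API: the `CatalogEntry` fields, the `search` function and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record describing one published data product."""
    name: str
    domain: str
    description: str
    tags: Tuple[str, ...] = ()


def search(catalog: List[CatalogEntry], term: str) -> List[CatalogEntry]:
    """Return entries whose name or description contains the term,
    or that carry the term as an exact tag (case-insensitive)."""
    needle = term.casefold()
    return [
        entry
        for entry in catalog
        if needle in entry.name.casefold()
        or needle in entry.description.casefold()
        or any(needle == tag.casefold() for tag in entry.tags)
    ]


# Sample entries, invented for illustration.
CATALOG = [
    CatalogEntry("orders", "sales", "One row per customer order", ("revenue",)),
    CatalogEntry("shipments", "logistics", "Delivery status per order", ("tracking",)),
    CatalogEntry("campaigns", "marketing", "Campaign spend per channel", ("revenue",)),
]
```

A consumer looking for revenue-related data would call `search(CATALOG, "revenue")` and then contact the owning domain of each hit.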
Snowflake also integrates with various other tools that can be used to build a data mesh platform. An existing Snowflake data warehouse is therefore a very good foundation for taking the first steps towards a data mesh.
Finally, it should be mentioned that all three core elements should be considered in combination in order to follow the four principles of a data mesh (see blog post 1). If a company focuses too much on one or two elements and ignores the others, it will be difficult to correct this afterwards.
Sources for further information:
Dr. Saskia-Janina Untiet-Kepp