From Data Lakes to Data Mesh: A Guide to the Latest Enterprise Data Architecture
1. A Brief History of Data Lakes
2. The Data Lake Monster
3. Introducing…Data Mesh!
4. How to Build a Data Mesh
5. Final Words
My Popular AI & Data Science articles
Unlimited Medium Access

Problem 3 — Fence-throwing

Dehghani calls the third and final failure mode siloed and hyper-specialised ownership, which I like to think of as leading to unproductive fence-throwing.

Our hyper-specialised big data engineers working in the data lake are organisationally siloed away from where the data originates and where it will be consumed.

Siloed hyper-specialised data platform team. Source: Z. Dehghani at MartinFowler.com (with permission)

This creates a poor incentive structure that doesn’t promote good delivery outcomes. Dehghani articulates this as…

“I personally don’t envy the life of a data platform engineer. They need to consume data from teams who have no incentive in providing meaningful, truthful and correct data. They have very little understanding of the source domains that generate the data and lack the domain expertise in their teams. They need to provide data for a diverse set of needs, operational or analytical, without a clear understanding of the application of the data and access to the consuming domain’s experts.

What we find are disconnected source teams, frustrated consumers fighting for a spot on top of the data platform team backlog and an over stretched data platform team.”

Data producers will ‘pack together’ some of their data and throw it over the fence to the data engineers.

Your problem now! Good luck guys!

Overworked data engineers, who may or may not have done justice to the ingested data given that they’re not data domain experts, will themselves throw some processed data out of the lake to serve downstream consumers.

Good luck, analysts and data scientists! Time for a quick nap and then I’m off to fix the fifty broken ETL pipelines on my backlog.

As you can see from Problems 2 and 3, the challenges that have arisen from the data lake experiment are as much organisational as technological.

Takeaways:

By federating data management to individual business domains, perhaps we could foster a culture of data ownership and collaboration and empower data producers, engineers and consumers to work together?

And hey, can we give these domains a real stake in the game?

Empower them to take pride in building strategic data assets by incentivising them to treat data like a hot-selling product?

In 2019, Dehghani proposed data mesh as the next-generation data architecture that embraces a decentralised approach to data management.

Her initial articles (here and here) generated significant interest in the enterprise data community, which has since prompted many organisations worldwide, including mine, to start their own data mesh journey.

Rather than pumping data into a centralised lake, data mesh federates data ownership and processing to domain-specific teams that control and deliver data as a product. This promotes easy access to, and interconnectivity of, data across the entire organisation, enabling faster decision-making and innovation.

Overview of data mesh. Source: Data Mesh Architecture (with permission)

The data mesh dream is to create a foundation for extracting value from analytical data at scale, with scale being applied to:

  • An ever-changing business, data and technology landscape.
  • Growth of data producers and consumers.
  • Varied data processing requirements. A diversity of use cases demands a diversity of tools for transformation and processing. For instance, real-time anomaly detection might leverage Apache Kafka; an NLP system for customer support often leads to data science prototyping with Python packages like NLTK; image recognition leverages deep learning frameworks like TensorFlow & PyTorch; and the fraud detection team at my bank would love to process our big data with Apache Spark.

All these requirements have created technical debt for warehouses (in the form of a mountain of unmaintainable ETL jobs) and a bottleneck for data lakes (due to the mountain of diverse work that’s squeezed through a small centralised data team).

Organisations eventually hit a threshold of complexity where the technical debt outweighs the value provided.

It’s a terrible situation.

To address these problems, Dehghani proposed four principles that any data mesh implementation must embody in order to realise the promise of scale, quality and usability.

The 4 Principles of Data Mesh. Source: Data Mesh Architecture (with permission)
  1. Domain Ownership of Data: By placing data ownership in the hands of domain-specific teams, you empower those closest to the data to take charge. This approach enhances agility in responding to changing business requirements and effectiveness in leveraging data-driven insights, which ultimately leads to better and more innovative products and services, faster.
  2. Data as a Product: Each business unit or domain is empowered to apply product thinking to craft, own and improve quality, reusable data products: self-contained and accessible data sets treated as products by the data’s producers. The goal is to publish and share data products across the data mesh with consumers sitting in other domains (thought of as nodes on the mesh) so that these strategic data assets can be leveraged by all. Read my Explainer 101 on data products.
  3. Self-Serve Data Platform: Empowering users with self-serve capabilities paves the way for accelerated data access and exploration. By providing a user-friendly platform equipped with the necessary tools, resources and services, you empower teams to become self-sufficient in their data needs. This democratisation of data promotes faster decision-making and a culture of data-driven excellence.
  4. Federated Governance: Centralised control stifles innovation and hampers agility. A federated approach ensures that decision-making authority is distributed across teams, enabling them to make autonomous decisions when it counts. By striking the right balance between control and autonomy, you foster accountability, collaboration and innovation.
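
To make "data as a product" a little more concrete, here is a minimal Python sketch of what a data product descriptor might look like. It is an illustration only: the `DataProduct` class, its fields and the `fraud_scores` example are all hypothetical names I made up for this article, not part of any data mesh standard or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a data product published by a domain."""

    name: str
    owner_domain: str   # the domain team accountable for this product
    description: str    # documentation so consumers can understand it
    schema: dict        # column name -> type, so consumers can self-serve
    tags: tuple = ()    # free-form labels to aid discovery

    def qualified_name(self) -> str:
        # A mesh-wide identifier of the form "<domain>.<product>"
        return f"{self.owner_domain}.{self.name}"


# Example: the fraud domain publishes a product that other domains can use.
fraud_scores = DataProduct(
    name="fraud_scores",
    owner_domain="fraud",
    description="Daily fraud risk score per account.",
    schema={"account_id": "string", "score": "float", "as_of": "date"},
    tags=("risk", "daily"),
)

print(fraud_scores.qualified_name())  # fraud.fraud_scores
```

The point of the sketch is simply that ownership, documentation and schema travel with the data set, rather than living in the heads of a central team.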

Wondering how to build and deploy a data mesh? What does that look like?

For most organisations, the mesh won’t be a side-project you deploy once ready. In all likelihood, you’ll need to cleverly federate your existing data lake piece-by-piece until you reach a platform that’s ‘sufficiently mesh’.

Think swapping out two aircraft engines for four smaller ones mid-flight, rather than buying a brand new plane in a nice shady hangar somewhere.

Or trying to upgrade a road while keeping some lanes open to traffic at all times, instead of building a brand new road siloed away somewhere and opening it once everything is nicely paved.

Full mesh maturity may take a long time, because data mesh is primarily an organisational construct. It’s as much about operating models (in other words, people) as the technology itself, meaning cultural uplift and bringing people along for the journey are important.

Rest assured, however: slowly but surely, your centralised domain-agnostic monolithic data lake will become a decentralised domain-oriented modular data mesh.

Some considerations for the design phase. Check out datamesh-architecture.com for a deeper dive.

  • Domains. A data mesh architecture comprises a set of business domains, each with a domain data team who can perform cross-domain data analysis on their own. An enabling team, often part of the transformation office of the organisation, spreads the idea of mesh across the organisation and serves as its advocates. They help individual domains on a consultancy basis on their journey to become a ‘full member’ of the data mesh. The enabling team will comprise experts on data architecture, data analytics, data engineering and data governance.
  • Data products. Domains will ingest their own operational data, which they sit very close to and understand, and build analytical data models as data products that can be published on the mesh. Data products are owned by the domain, which is responsible for their operations, quality and uplift during their entire lifecycle. Effective accountability to ensure effective data.
The sharing of data products across the mesh. Source: Data Mesh Architecture (with permission)
  • Self-serve. Remember those ‘multicultural food days’ at school, where everyone brought their delicious dishes and shared them at a self-serve table? The teacher’s minimalist role was to oversee operations and ensure everything went smoothly. In a similar vein, mesh’s newly streamlined central data team endeavours to provide and maintain a domain-agnostic ‘buffet table’ of diverse data products from which to self-serve. Business teams can perform their own analysis with little overhead and offer up their own data products to their peers. A delicious data feast where everyone can also be the chef.
  • Federated governance. Each domain will self-govern its own data and be empowered to march to the beat of its own drum, like European Union member states. On certain matters where it makes sense to unite and standardise, they will strike agreements with other domains on global policies, such as documentation standards, interoperability and security, in a federated governance group (like the European Parliament) so that individual domains can easily discover, understand, use and integrate data products available on the mesh.
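
The interplay between self-serve publishing and federated governance can be sketched in a few lines of Python. This is a toy illustration under my own assumptions: the `MeshCatalogue` class, the three policy functions and the dictionary layout of a product are all hypothetical, standing in for whatever catalogue and policy tooling your organisation actually adopts. The idea is that the governance group agrees a small set of global policies, and every domain's product is checked against them at publish time.

```python
from typing import Callable, Dict, List, Optional

# A global policy inspects a product's metadata and returns an error
# message, or None if the product complies. (Hypothetical convention.)
Policy = Callable[[dict], Optional[str]]


def requires_identity(product: dict) -> Optional[str]:
    if not product.get("domain") or not product.get("name"):
        return "missing domain or name (ownership standard)"
    return None


def requires_description(product: dict) -> Optional[str]:
    if not product.get("description", "").strip():
        return "missing description (documentation standard)"
    return None


def requires_schema(product: dict) -> Optional[str]:
    if not product.get("schema"):
        return "missing schema (interoperability standard)"
    return None


GLOBAL_POLICIES: List[Policy] = [
    requires_identity,
    requires_description,
    requires_schema,
]


class MeshCatalogue:
    """Toy 'buffet table': domains publish products, consumers discover them."""

    def __init__(self, policies: List[Policy]):
        self._policies = list(policies)
        self._products: Dict[str, dict] = {}

    def publish(self, product: dict) -> List[str]:
        # Run every global policy; only compliant products are listed.
        violations = []
        for policy in self._policies:
            message = policy(product)
            if message is not None:
                violations.append(message)
        if not violations:
            key = f"{product['domain']}.{product['name']}"
            self._products[key] = product
        return violations

    def discover(self, keyword: str) -> List[str]:
        # Simple keyword search over descriptions.
        keyword = keyword.lower()
        return sorted(
            key
            for key, product in self._products.items()
            if keyword in product["description"].lower()
        )


catalogue = MeshCatalogue(GLOBAL_POLICIES)

ok = catalogue.publish({
    "domain": "fraud",
    "name": "scores",
    "description": "Daily fraud risk scores per account",
    "schema": {"account_id": "string", "score": "float"},
})
rejected = catalogue.publish({
    "domain": "cards",
    "name": "transactions",
    "description": "",
    "schema": {},
})

print(ok)                           # []
print(rejected)                     # two violations: description and schema
print(catalogue.discover("fraud"))  # ['fraud.scores']
```

Everything outside the global policies (how the fraud team models, refreshes or tests its scores) stays a local decision, which is the balance between control and autonomy that federated governance is after.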

Here’s the exciting bit — when will our mesh hit maturity?

The mesh emerges when teams start using other domains’ data products.

This serves as a useful benchmark to aim for to attest that your data mesh journey has reached a threshold level of maturity.
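
If you want to track this benchmark over time, one simple option is to measure the share of data product reads that cross a domain boundary. The sketch below is my own illustration, not a metric defined by Dehghani: the `cross_domain_share` function and the `(consumer_domain, producer_domain)` log format are assumptions.

```python
def cross_domain_share(access_log):
    """Fraction of reads where the consumer's domain differs from the producer's.

    access_log is a list of (consumer_domain, producer_domain) pairs.
    """
    if not access_log:
        return 0.0
    cross = sum(1 for consumer, producer in access_log if consumer != producer)
    return cross / len(access_log)


# Hypothetical access log: two of the four reads cross a domain boundary.
log = [
    ("fraud", "fraud"),
    ("marketing", "fraud"),
    ("cards", "cards"),
    ("fraud", "cards"),
]

print(cross_domain_share(log))  # 0.5
```

A share that stays near zero suggests domains are still working in silos; a rising share suggests the mesh is starting to emerge.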

Time to pop the champagne.

Data mesh is a relatively new idea, having only been invented around 2018 by architect Zhamak Dehghani.

It has gained significant momentum in the data architecture and analytics communities as an increasing number of organisations grapple with the scalability problems of a centralised data lake.

By moving away from an organisational structure where data is controlled by a single team and towards a decentralised model where data is owned and managed by the teams that use it the most, different parts of the organisation can work independently, with greater autonomy and agility, while still ensuring that the data is consistent, reliable and well-governed.

Data mesh promotes a culture of accountability, ownership and collaboration, where data is productised and treated as a first-class citizen that’s proudly shared across the company in a seamless and controlled manner.

The aim is to attain a truly scalable and flexible data architecture that aligns with the needs of modern organisations where data is central to driving business value and innovation.

Summarising the 4 Principles of Data Mesh. Credit: Z. Dehghani at MartinFowler.com (with permission)

My company’s own journey towards data mesh is expected to take a few years for the main migration, and longer for full maturity.

We’re working on three major parts concurrently:

  • Cloud. An uplift from our Cloudera stack on Microsoft Azure IaaS to native cloud services on Azure PaaS. More info here.
  • Data products. An initial array of foundational data products is being rolled out, which can be used and re-assembled in different combinations like Lego bricks to form larger, more useful data products.
  • Mesh. We’re decentralising our data lake to a target state of at least five nodes.

What a ride it has been. When I started half a decade ago, we were just getting started building out our data lake using Apache Hadoop on top of on-prem infrastructure.

Countless challenges and invaluable lessons have shaped our journey.

Like any determined team, we fail fast and fail forward. Five short years later, we have completely transformed our enterprise data landscape.

Who knows what things will look like in another five years? I look forward to it.

Find me on LinkedIn, Twitter & YouTube.

  • AI Revolution: Fast-paced Intro to Machine Learning (here)
  • ChatGPT & GPT-4: How OpenAI Won the NLU War (here)
  • Generative AI Art: Midjourney & Stable Diffusion Explained (here)
  • Power of Data Storytelling: Sell Stories, Not Data (here)
  • Data Warehouses & Data Modelling: a Quick Crash Course (here)
  • From Data Warehouses & Data Lakes to Data Mesh (here)
  • From Data Lakes to Data Mesh: A Guide to Latest Architecture (here)
  • Data Products: Building a Strong Foundation for Analytics (here)
  • Cloud Computing 101: Harness Cloud for Your Business (here)
  • Power BI: From Data Modelling to Stunning Reports (here)
  • Machine Learning versus Mechanistic Modelling (here)
  • Popular Machine Learning Performance Metrics Explained (here)
  • Future of Work: Is Your Career Safe in Age of AI (here)
  • Beyond ChatGPT: Search for a Truly Intelligent Machine (here)
  • Regression: Predict House Prices using Python (here)
  • Classification: Predict Employee Churn using Python (here)
  • Python Jupyter Notebooks versus Dataiku DSS (here)

Join Medium here and enjoy unlimited access to the best articles on the web.

You will be directly supporting me and other top writers. Cheers!
