
The symbiotic relationship between data governance and AI

Generative AI has already begun shaking up the world of Data Governance, and it is set to keep doing so.
It’s only been six months since ChatGPT’s release, yet it already feels like we need a retrospective. In this piece, I’ll explore how generative AI is impacting data governance, and where it’s likely to take us in the near future. Let me emphasize near, because things evolve quickly and they could go a lot of different ways. This article isn’t about forecasting the next 100 years of data governance, but rather a practical look at the changes happening now and those just on the horizon.
Before diving in, let’s remind ourselves of what data governance deals with.
Keeping things simple, data governance is the set of rules and processes that a company follows to make sure its data is trustworthy. It involves five key areas:
- Metadata and Documentation
- Search and Discovery
- Policies and Standards
- Data Privacy and Security
- Data Quality
In this piece, we’ll look at how each of these areas is about to evolve once we bring generative AI into the mix.
Let’s do this!
Metadata and documentation is arguably the most important part of data governance, and the other parts build heavily on it being done properly. AI has already started, and will continue, to change the way we create data context. But I don’t want to get your hopes too high: we still need humans in the loop when it comes to documentation.
Producing context around data, or documenting the data, has two parts. The first part, which makes up about 70% of the job, involves documenting general information that is common to most companies. A very basic example is the definition of “email”, which is the same everywhere. The second part is about writing down the specific know-how that is unique to your organization.
Here’s the exciting part: AI can do a lot of the heavy lifting for that first 70%. This is because it involves general knowledge, and generative AI is great at handling that.
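To make that concrete, here is a minimal sketch of how a catalog tool could draft descriptions for generic columns with an LLM. It assumes the OpenAI Python SDK; the model name, prompt, and `draft_column_description` helper are illustrative, not a description of any particular product.

```python
# Sketch: drafting first-pass documentation for generic columns with an LLM.
# Assumes the OpenAI Python SDK (pip install openai) and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()

def draft_column_description(table: str, column: str, sample_values: list[str]) -> str:
    """Ask the model for a one-sentence, business-friendly column definition."""
    prompt = (
        f"Write a one-sentence business definition for the column '{column}' "
        f"in the table '{table}'. Example values: {sample_values}."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# The draft still needs review by someone who knows the business.
print(draft_column_description("users", "email", ["ana@example.com", "joe@example.org"]))
```

The point is not the specific API, but that generic definitions like “email” can be drafted automatically and then reviewed.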
Now, what about knowledge that’s specific to your organization? Every organization is unique, and this uniqueness gives rise to your own company language. This language is your metrics, KPIs, and business definitions. It isn’t something that can be imported from outside. It’s born from the people who know the business best: its employees.
In my conversations with data leaders, I often discuss how to create a shared understanding of these business concepts. Many leaders share that, to achieve this alignment, they bring domain teams into the same room to talk, debate, and agree upon the definitions that best fit their business model.
Let’s take, for instance, the definition of a ‘customer’. For a subscription-based business, a customer could be someone who is currently subscribed to the service. For a retail business, a customer might be anyone who has made a purchase in the last 12 months. Each company defines ‘customer’ in the way that makes the most sense for it, and this understanding usually emerges from within the organization.
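As a toy illustration of how the same word maps to two different rules, here is a small Python sketch with made-up field names:

```python
from datetime import date, timedelta

# Subscription business: a customer is anyone whose subscription is active today.
def is_customer_subscription(subscription_end: date | None) -> bool:
    return subscription_end is not None and subscription_end >= date.today()

# Retail business: a customer is anyone who purchased in the last 12 months.
def is_customer_retail(last_purchase: date | None) -> bool:
    return last_purchase is not None and last_purchase >= date.today() - timedelta(days=365)
```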
When it comes to such company-specific knowledge, AI, as smart as it is, can’t do this part just yet. It can’t sit in on your meetings, join the discussion, or help new concepts bloom. According to Andreessen Horowitz, this might become possible when the second wave of AI hits. For now, we’re still at wave one.
I’d also like to touch on a question posed by Benn Stancil. Benn asks: if a bot can write data documentation on demand for us, what’s the point of writing it down at all?
There is some truth to this: if generative AI can generate content on demand, why not just generate it when you need it, instead of bothering to document everything? Unfortunately, it doesn’t work like this, for two reasons.
First, as I’ve explained above, part of the documentation covers the unique elements of an organization that AI cannot capture yet. This calls for human expertise and can’t be generated on the fly by AI.
Second, while AI is advanced, it’s not infallible. The content it generates isn’t always accurate. You need to make sure a human checks and confirms all AI-produced content.
Generative AI isn’t just changing the way we create documentation, but also how we consume it. In fact, we’re witnessing a paradigm shift in search and discovery. The traditional approach, where analysts comb through the data catalog to find relevant information, is quickly becoming outdated.
A real game changer lies in AI’s ability to become a personal data assistant for everyone in the company. In some data catalogs, you can already approach the AI with your specific data questions. You can ask things such as, “Can I perform action X with this data?”, “Why am I unable to use this data to achieve Y?”, or “Do we have data that illustrates Z?”. If your data is enriched with the right context, AI can help disseminate that context across the whole company.
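Under the hood, such an assistant typically pairs the catalog’s documentation with a language model: retrieve the entries relevant to a question, then ask the model to answer using only that context. Here is a minimal sketch, with a made-up catalog and deliberately naive retrieval, using the same OpenAI-style API as above:

```python
# Sketch: answering a data question from catalog context (retrieve, then ask an LLM).
# The catalog entries and helper names are invented for illustration.
from openai import OpenAI

client = OpenAI()

CATALOG = {
    "orders.amount_eur": "Order value in euros, VAT included.",
    "users.email": "Customer email address. PII: yes, access restricted to the CRM team.",
}

def answer_data_question(question: str) -> str:
    # Naive retrieval: keep entries whose name appears in the question.
    # A real catalog would use search or embeddings instead.
    context = "\n".join(
        f"{name}: {doc}" for name, doc in CATALOG.items() if name in question.lower()
    )
    prompt = (
        "Answer the question using only the catalog context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer_data_question("Can I share users.email with the marketing team?"))
```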
Another development we’re expecting is that AI will transform the data catalog from a passive entity into an active helper. Think about it this way: if you’re using a metric formula incorrectly, the AI assistant could give you a heads-up. Likewise, if you’re about to write a query that already exists, the AI could let you know and point you to the existing piece of work.
Until now, data catalogs just sat there, waiting for you to sift through them for answers. With AI, catalogs could start actively helping you, offering insights and suggestions before you even realize you need them. This would be a complete shift in how we engage with data, and it could happen very soon.
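As a sketch of the “this query already exists” nudge: a catalog could compare a draft query against the queries it already knows and warn when one looks very similar. The stored queries and the similarity threshold below are made up for illustration.

```python
# Sketch: warn when a draft SQL query closely matches one already in the catalog.
from difflib import SequenceMatcher

KNOWN_QUERIES = {
    "monthly_active_users": "select count(distinct user_id) from events "
                            "where event_date >= date_trunc('month', current_date)",
    "revenue_by_country": "select country, sum(amount_eur) from orders group by country",
}

def find_existing_query(draft: str, threshold: float = 0.8) -> str | None:
    """Return the name of a known query that is very similar to the draft, if any."""
    draft_norm = " ".join(draft.lower().split())
    for name, sql in KNOWN_QUERIES.items():
        if SequenceMatcher(None, draft_norm, sql).ratio() >= threshold:
            return name
    return None

draft = "SELECT country, SUM(amount_eur) FROM orders GROUP BY country"
match = find_existing_query(draft)
if match:
    print(f"Heads-up: this looks like the existing query '{match}'.")
```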
Yet there is a condition for the AI assistant to work effectively: your data catalog must be maintained. For the AI assistant to provide reliable guidance to stakeholders, the underlying documentation must be fully trustworthy. If the catalog isn’t properly maintained, or if the policies aren’t clearly defined, the AI assistant will spread misinformation throughout the company. That would be more harmful than having no information at all, because it could lead to poor decisions based on the wrong context.
You’ve probably got it by now: AI and data governance are interdependent. AI can enhance data governance, but in turn, robust data governance is required to fuel the capabilities of AI. This leads to a virtuous cycle where each component boosts the other. But keep in mind that neither can replace the other.
Another key component of data governance is the formulation and implementation of governance rules.
This usually involves defining data ownership and domains within the organization. Right now, AI isn’t up to the task when it comes to defining these policies and standards. AI shines at executing rules or flagging violations, but it falls short when tasked with creating the rules themselves.
This is for a simple reason: defining ownership and domains is a matter of human politics. For instance, ownership means deciding who in the organization has authority over specific datasets. This can include the power to decide how and when the data is used, who has access to it, and how it is maintained and secured. Making these decisions often involves negotiating between individuals, teams, or departments, each with their own interests and perspectives. And human politics, for obvious reasons, cannot be replaced by AI.
We thus expect humans to continue playing a major role in this aspect of governance for the near future. Generative AI can play a part in drafting an ownership framework or suggesting data domains. However, keeping humans in the loop remains a must.
That said, generative AI is about to shake things up in the privacy department of governance. Managing privacy rights is a historically dreaded aspect of governance. No one enjoys it. It involves manually creating a complex architecture of permissions to make sure sensitive data is protected.
The good news is that AI can automate much of this process. Given parameters such as the number of users and their respective roles, AI can create rules for access rights. The architectural side of access rights, being fundamentally code-based, aligns well with AI’s capabilities. The AI system can take these parameters, generate the relevant code, and apply it to manage data access efficiently.
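As a minimal sketch of that idea: given a small role-to-permission mapping (invented here), a script, or an LLM producing the equivalent code, can emit the database grants instead of someone clicking through them by hand.

```python
# Sketch: turning a role definition into SQL GRANT statements.
# Roles, users, and table names are invented for illustration.
ROLES = {
    "analyst": {"tables": ["orders", "events"], "privilege": "SELECT"},
    "marketing": {"tables": ["campaigns"], "privilege": "SELECT"},
    "data_engineer": {"tables": ["orders", "events", "campaigns"], "privilege": "ALL PRIVILEGES"},
}

USERS = {"ana": "analyst", "joe": "marketing", "sam": "data_engineer"}

def generate_grants(users: dict[str, str], roles: dict[str, dict]) -> list[str]:
    """Build one GRANT statement per (user, table) pair from the role definitions."""
    statements = []
    for user, role in users.items():
        spec = roles[role]
        for table in spec["tables"]:
            statements.append(f'GRANT {spec["privilege"]} ON {table} TO "{user}";')
    return statements

for stmt in generate_grants(USERS, ROLES):
    print(stmt)
```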
Another area where AI can make a big impact is the management of Personally Identifiable Information (PII). Today, PII tagging is usually done manually, making it a burden for the person in charge of it. This is something AI can automate completely. By leveraging AI’s pattern recognition capabilities, PII tagging can be done more accurately than when a human does it. In this sense, using AI could actually improve the way we manage privacy protection.
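Here is a deliberately simple sketch of pattern-based PII tagging over sampled column values. Real tools combine rules like these with ML models and column-name heuristics, and a human should still review the tags.

```python
# Sketch: tag columns as PII when sampled values match simple patterns.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d \-()]{7,}\d"),
}

def tag_pii(columns: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, for each column, the list of PII types detected in its sample values."""
    tags = {}
    for column, samples in columns.items():
        found = [
            pii_type
            for pii_type, pattern in PII_PATTERNS.items()
            if any(pattern.search(value) for value in samples)
        ]
        if found:
            tags[column] = found
    return tags

sampled = {
    "users.email": ["ana@example.com", "joe@example.org"],
    "users.phone": ["+33 6 12 34 56 78"],
    "orders.amount_eur": ["19.90", "42.00"],
}
print(tag_pii(sampled))  # {'users.email': ['email'], 'users.phone': ['phone']}
```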
This doesn’t mean that AI will completely replace human involvement. Despite AI’s capabilities, we still need human oversight to handle unexpected situations and make judgment calls when needed.
Let’s not forget data quality, an essential pillar of governance. Data quality ensures that the data used by an organization is accurate, consistent, and reliable. Maintaining data quality has always been a complex endeavor, but things are already changing with generative AI.
As I mentioned above, AI is great at applying rules and flagging violations. This makes it easy for algorithms to identify anomalies in the data. You can find a detailed account of how AI affects the different elements of data quality in this article.
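To ground this, here is a small sketch of the kind of rule an algorithm can apply automatically: flag a column when its null rate or value range drifts outside an expected band. The thresholds and data below are made up.

```python
# Sketch: flag simple data quality anomalies with rule-based checks.
import pandas as pd

def quality_issues(df: pd.DataFrame, column: str, max_null_rate: float, min_value: float) -> list[str]:
    """Return human-readable issues found for one numeric column."""
    issues = []
    null_rate = df[column].isna().mean()
    if null_rate > max_null_rate:
        issues.append(f"{column}: null rate {null_rate:.0%} exceeds {max_null_rate:.0%}")
    if (df[column].dropna() < min_value).any():
        issues.append(f"{column}: values below the expected minimum of {min_value}")
    return issues

orders = pd.DataFrame({"amount_eur": [19.9, 42.0, None, -5.0]})
for issue in quality_issues(orders, "amount_eur", max_null_rate=0.1, min_value=0.0):
    print(issue)
```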
AI can also lower the technical barrier to data quality. This is something SODA is already putting in place. Their latest tool, SodaGPT, offers a no-code approach to specifying data quality checks, letting users express quality checks in natural language alone. This makes data quality maintenance far more intuitive and accessible.
We’ve seen that AI can supercharge Data Governance in a way that’s triggering the start of a paradigm shift. A lot of changes are already happening, and they are here to stay.
However, AI can only build on a foundation that is already solid. For AI to change the search and discovery experience in your organization, you need to already be maintaining your documentation. AI is powerful, but it can’t miraculously mend a broken system.
The second point to keep in mind is that even if AI can be used to generate much of the context around data, it cannot replace the human element entirely. We still need humans in the loop for validation and for documenting the knowledge unique to each company. So here is our one-sentence prediction for the future of governance: turbocharged by AI, anchored in human discernment and cognition.
At CastorDoc, we’re building a data documentation tool for the Notion, Figma, Slack generation.
Want to check it out? Reach out to us and we will show you a demo.