Automate resource provisioning with modern tools

Modern data stacks consist of varied tools and frameworks to process data. Typically it is a large collection of cloud resources that transform the data and bring it to a state where we can generate insights from it. Managing multitudes of these data processing resources is not a trivial task and might seem overwhelming. The good news is that data engineers invented a solution called infrastructure as code (IaC): code that helps us deploy, provision, and manage all the resources we would ever need in our data pipelines. In this story, I would like to discuss popular techniques and existing frameworks that aim to simplify resource provisioning and data pipeline deployments. I remember how, at the very beginning of my data career, I deployed data resources (storage buckets, security roles, etc.) using the web user interface. Those days are long gone, but I still remember the joy I felt when I learned that it could all be done programmatically using templates and code.
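To make this concrete, here is a minimal sketch of what infrastructure as code can look like in practice, assuming the AWS CDK v2 library for Python (aws-cdk-lib); the stack, bucket, and role names are hypothetical examples, not resources from this article:

```python
# A minimal infrastructure-as-code sketch, assuming AWS CDK v2 for Python
# (pip install aws-cdk-lib). All resource names here are hypothetical.
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_iam as iam
from aws_cdk import aws_s3 as s3
from constructs import Construct


class DataPipelineStack(Stack):
    """Declares a storage bucket and a security role as code,
    instead of clicking through the web console."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Storage bucket for raw pipeline data (versioned for auditability).
        raw_bucket = s3.Bucket(
            self,
            "RawDataBucket",
            versioned=True,
            removal_policy=RemovalPolicy.RETAIN,
        )

        # Security role that a processing service (Glue here, as an example)
        # can assume to read from the bucket.
        etl_role = iam.Role(
            self,
            "EtlRole",
            assumed_by=iam.ServicePrincipal("glue.amazonaws.com"),
        )
        raw_bucket.grant_read(etl_role)


app = App()
DataPipelineStack(app, "DataPipelineStack")
app.synth()  # emits a CloudFormation template; deploy with `cdk deploy`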
Modern Data Stacks
What is a Modern Data Stack (MDS)? The technologies that are specifically used to organise, store, and manipulate data are what make up a modern data stack [1]. This is what helps shape a modern and successful data platform. I raised this discussion in one of my previous stories.
A simplified data platform blueprint often looks like this:
It often comprises dozens of data sources and the cloud platform resources needed to process them.
There can be different data platform architecture types depending on business and functional requirements, the skill set of our users, and so on, but typically infrastructure design falls into several data processing…