Data Warehousing with AWS Redshift
With AWS Glue, the information that was initially in a flat model can now be represented with a more fitting star schema in an information warehouse.
The cloud data warehouse for this data can be created with AWS Redshift Serverless. This entails making a namespace named flights-namespace
in addition to a database named dev
. As well as, it requires a workgroup named flights-workgroup
, which can be used to jot down SQL queries.
Note: The workgroup has been configured to permit devices outside of the VPC to access the database. This can be useful when creating the visualization with Power BI
Now, we are able to open the query editor in Redshift and begin creating the actual fact and dimension tables within the dev
database.
First, the 4 tables within the schema must be created within the warehouse using the next commands:
The 4 tables at the moment are in the information warehouse, but they’re all empty because the data continues to be within the flights-data-processed
bucket.
The information might be copied into this data warehouse using the COPY
command.
For example, the information in flights.csv
might be copied into the flights
table using the next command syntax:
Note: the
iam_role
variable must be assigned whatever iam role is was chosen when creating the workgroup.
By executing the COPY
command for every of the csv files within the flights-data-processed
bucket, the 4 tables must be crammed with the vital data.
For example, here’s a preview of the airport table: