A Complete Guide to Effectively Scale Your Data Pipelines and Data Products with Contract Testing and dbt
But… what’s a contract test?
Implementing our first contract test
Implementing contract tests for marts and output ports
Running the contract tests within the pipeline
Additional tricks to start
Conclusion

First, we need to add two new dbt packages, dbt-expectations and dbt-utils, which will allow us to make assertions on the schema of our sources and on their accepted values.

# packages.yml

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

  - package: calogica/dbt_expectations
    version: 0.8.5

Testing the data sources

Let’s start by defining a contract test for our first source. We pull data from raw_height, a table that contains height information from the users of the gym app.

We agree with our data producers that we will receive the height measurement, the units for the measurement, and the user ID. We also agree on the data types and that only ‘cm’ and ‘inches’ are supported as units. With all this, we can define our first contract in the dbt source YAML file.
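As a rough illustration, a source specification along these lines could express that agreement. This is only a sketch under assumptions: the source name gym_app is inferred from the staging model names, while the file path, the column names (user_id, height, height_unit) and the column types are hypothetical. The type names that dbt_expectations expects also depend on your warehouse adapter.

```yaml
# models/staging/gym_app/_gym_app__sources.yml  (hypothetical path)
version: 2

sources:
  - name: gym_app                # assumed source name
    tables:
      - name: raw_height
        columns:
          - name: user_id        # assumed column name
            tests:
              - not_null:
                  config:
                    tags: ["contract-test-source"]
              - dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: integer          # assumed, adapter-specific type name
                  config:
                    tags: ["contract-test-source"]

          - name: height         # assumed column name
            tests:
              - not_null:
                  config:
                    tags: ["contract-test-source"]
              - dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: double precision # assumed, adapter-specific type name
                  config:
                    tags: ["contract-test-source"]
              - dbt_utils.accepted_range:
                  min_value: 0                  # height must not be lower than 0
                  config:
                    tags: ["contract-test-source"]

          - name: height_unit    # assumed column name
            tests:
              - accepted_values:
                  values: ["cm", "inches"]      # the only units agreed with the producers
                  config:
                    tags: ["contract-test-source"]
```

Tagging each test individually is verbose but keeps every assertion selectable on its own. Your project may prefer to set the tag at a higher level.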

The building blocks

Looking at the previous test, we can see several dbt test macros in use:

  • dbt_expectations.expect_column_values_to_be_of_type: This assertion allows us to define the expected column data type.
  • accepted_values: This assertion allows us to define a list of the accepted values for a specific column.
  • dbt_utils.accepted_range: This assertion allows us to define a numerical range for a given column. In the example, we expect the column’s value to be no lower than 0.
  • not_null: Finally, built-in assertions like not_null allow us to define column constraints.

Using these building blocks, we added several tests to define the contract expectations described above. Notice also how we have tagged the tests as “contract-test-source”. This tag allows us to run all contract tests in isolation, both locally and, as we will see later, within the CI/CD pipeline:

dbt test --select tag:contract-test-source

Implementing contract tests for marts and output ports

We have now seen how quickly we can create contract tests for the sources of our dbt app, but what about the public interfaces of our data pipeline or data product?

As data producers, we want to be sure that we are producing data according to the expectations of our data consumers, so we can satisfy the contract we have with them and make our data pipeline or data product trustworthy and reliable.

A simple way to ensure that we are meeting our obligations to our data consumers is to add contract testing for our public interfaces.

dbt recently released a new feature for SQL models, model contracts, which lets us define the contract for a dbt model. While building your model, dbt will verify that your model’s transformation produces a dataset matching its contract, or it will fail to build.

Let’s see it in action. Our mart, body_mass_indexes, produces a BMI metric from the weight and height measurement data we get from our sources. The contract with our consumers establishes the following:

  • Data types for every column.
  • User IDs can’t be null.
  • User IDs are always greater than 0.

Let’s define the contract of the body_mass_indexes model using dbt model contracts:
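A model specification of roughly this shape could capture the three points above. Treat it as a sketch under assumptions: only body_mass_indexes and user_id come from the text, while the file path, the remaining column names and all data types are hypothetical placeholders. Whether a check constraint is actually enforced also depends on your data platform.

```yaml
# models/marts/_marts__models.yml  (hypothetical path)
version: 2

models:
  - name: body_mass_indexes
    config:
      contract:
        enforced: true           # dbt fails the build if the output does not match
    columns:
      - name: user_id
        data_type: int           # assumed type
        constraints:
          - type: not_null       # user IDs can't be null
          - type: check          # user IDs are always greater than 0
            expression: "user_id > 0"

      # With an enforced contract, every column the model returns must be
      # listed with its data_type. The columns below are assumed names.
      - name: created_date
        data_type: date
      - name: weight
        data_type: float
      - name: height
        data_type: float
      - name: bmi
        data_type: float
```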

The building blocks

Looking at the previous model specification file, we can see several pieces of metadata that allow us to define the contract.

  • contract.enforced: This configuration tells dbt that we want to enforce the contract every time the model is run.
  • data_type: This assertion allows us to define the column type we expect to produce once the model runs.
  • constraints: Finally, the constraints block lets us define useful constraints, such as a column that can’t be null, primary keys, and custom expressions. In the example above, we defined a constraint to tell dbt that the user_id must always be greater than 0. You can see all of the available constraints in the dbt documentation.

One difference between the contract tests we defined for our sources and those defined for our marts or output ports is when the contracts are verified and enforced.

Model contracts are enforced when the model is being built by dbt run, whereas contracts based on dbt tests are enforced when the dbt tests run.

If one of the model contracts is not satisfied, you will see an error when you execute ‘dbt run’, with specific details on the failure. You can see an example in the following dbt run console output.

1 of 4 START sql table model dbt_testing_example.stg_gym_app__height ........... [RUN]
2 of 4 START sql table model dbt_testing_example.stg_gym_app__weight ........... [RUN]
2 of 4 OK created sql table model dbt_testing_example.stg_gym_app__weight ...... [SELECT 4 in 0.88s]
1 of 4 OK created sql table model dbt_testing_example.stg_gym_app__height ...... [SELECT 4 in 0.92s]
3 of 4 START sql table model dbt_testing_example.int_weight_measurements_with_latest_height [RUN]
3 of 4 OK created sql table model dbt_testing_example.int_weight_measurements_with_latest_height [SELECT 4 in 0.96s]
4 of 4 START sql table model dbt_testing_example.body_mass_indexes ............. [RUN]
4 of 4 ERROR creating sql table model dbt_testing_example.body_mass_indexes .... [ERROR in 0.77s]

Finished running 4 table models in 0 hours 0 minutes and 6.28 seconds (6.28s).

Completed with 1 error and 0 warnings:

Database Error in model body_mass_indexes (models/marts/body_mass_indexes.sql)
new row for relation "body_mass_indexes__dbt_tmp" violates check constraint
"body_mass_indexes__dbt_tmp_user_id_check1"
DETAIL: Failing row contains (1, 2009-07-01, 82.5, null, null).
compiled Code at target/run/dbt_testing_example/models/marts/body_mass_indexes.sql

Running the contract tests within the pipeline

So far, we have built a suite of powerful contract tests, but how and when do we run them?

We can run contract tests in two types of pipelines:

  • CI/CD pipelines
  • Data pipelines

For example, you can execute the source contract tests on a schedule in a CI/CD pipeline that targets the data sources available in lower environments such as test or staging. You can set the pipeline to fail whenever the contract is not met.

These failures provide valuable information about contract-breaking changes introduced by other teams before those changes reach production.
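The scheduled run described above could be sketched as a CI workflow like the one below. Everything about the setup is an assumption made for illustration: GitHub Actions as the CI system, the dbt-postgres adapter, the daily cron expression, and a dbt profile with a target named staging whose credentials are already available to the job. Because dbt test exits with a non-zero status when a test fails, the job fails whenever the contract is not met.

```yaml
# .github/workflows/source-contract-tests.yml  (hypothetical file)
name: source-contract-tests

on:
  schedule:
    - cron: "0 6 * * *"          # assumed cadence: once a day at 06:00 UTC
  workflow_dispatch: {}          # also allow manual runs

jobs:
  contract-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dbt
        run: pip install dbt-postgres    # assumed adapter

      - name: Install dbt packages
        run: dbt deps

      - name: Run source contract tests against staging
        # Assumes profiles.yml defines a "staging" target for this project
        run: dbt test --select tag:contract-test-source --target staging
```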
