We ran a $12K experiment to test the cost and performance of Serverless warehouses and dbt concurrent threads, and obtained unexpected results.
By: Jeff Chou, Stewart Bryson
Databricks’ SQL warehouse products are a compelling offering for companies looking to streamline their production SQL queries and warehouses. However, as usage scales up, the cost and performance of these systems become critical to analyze.
In this blog we take a technical deep dive into the cost and performance of their serverless SQL warehouse product using the industry-standard TPC-DI benchmark. We hope data engineers and data platform managers can use the results presented here to make better decisions about their data infrastructure.
Before we dive into a specific product, let’s take a step back and look at the different options available today. Databricks currently offers 3 different warehouse options:
- SQL Classic — Most basic warehouse, runs inside the customer’s cloud environment
- SQL Pro — Improved performance and good for exploratory data science, runs inside the customer’s cloud environment
- SQL Serverless — “Best” performance, and the compute is fully managed by Databricks.
From a cost perspective, both classic and pro run inside the user’s cloud environment. This means you will get two bills for your Databricks usage: one is your pure Databricks cost (DBUs), and the other is from your cloud provider (e.g. your AWS EC2 bill).
To really understand the cost comparison, let’s look at an example cost breakdown of running on a Small warehouse, based on their reported instance types:
In the table above, we look at the cost comparison of on-demand vs. spot pricing as well. You can see from the table that the serverless option has no cloud component, since it’s all managed by Databricks.
Serverless can be cost effective compared to Pro if you were using all on-demand instances. But if cheap spot nodes are available, then Pro may be cheaper. Overall, the pricing for serverless is pretty reasonable in my opinion, since it also includes the cloud costs, although it’s still a “premium” price.
We also included the equivalent jobs compute cluster, which is the cheapest option across the board. If cost is a concern for you, you can run SQL queries on jobs compute as well!
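To make the two-bill comparison concrete, here is a back-of-the-envelope sketch in Python. The DBU consumption and hourly rates below are placeholder assumptions, not the actual figures from our table; plug in the current prices for your region and tier before drawing any conclusions.

```python
# Hypothetical hourly figures for a Small warehouse (assumptions, not real prices).
DBU_PER_HOUR = 12  # DBUs consumed per hour by a Small warehouse (assumed)
RATES = {
    "classic":    {"dbu_rate": 0.22, "cloud_rate": 2.00},  # $/DBU, $/hr for EC2
    "pro":        {"dbu_rate": 0.55, "cloud_rate": 2.00},
    "serverless": {"dbu_rate": 0.70, "cloud_rate": 0.00},  # cloud cost bundled in
}

def hourly_cost(tier: str, spot_discount: float = 0.0) -> float:
    """Total $/hour = Databricks DBU bill + cloud provider bill.

    spot_discount applies only to the cloud component, so it has no
    effect on serverless, where there is no separate cloud bill.
    """
    r = RATES[tier]
    return DBU_PER_HOUR * r["dbu_rate"] + r["cloud_rate"] * (1 - spot_discount)

for tier in RATES:
    print(f"{tier:>10}: on-demand ${hourly_cost(tier):.2f}/hr, "
          f"70% spot ${hourly_cost(tier, 0.7):.2f}/hr")
```

With these made-up numbers, serverless beats Pro on-demand, but Pro with a deep spot discount comes out cheaper, which is exactly the trade-off described above.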
The Databricks serverless option is a fully managed compute platform. This is pretty much equivalent to how Snowflake runs, where all of the compute details are hidden from users. At a high level there are pros and cons to this:
Pros:
- You don’t have to think about instances or configurations
- Spin-up time is much lower than launching a cluster from scratch (5–10 seconds from our observations)
Cons:
- Enterprises may have security concerns with all of the compute running inside Databricks
- Enterprises may not be able to leverage their cloud contracts, which may include special discounts on specific instances
- No ability to optimize the cluster, so you don’t know if the instances and configurations picked by Databricks are actually good for your job
- The compute is a black box: users don’t know what is going on or what changes Databricks is implementing under the hood, which may make stability an issue.
Due to the inherent black box nature of serverless, we were curious to explore the various tunable parameters people do still have and their impact on performance. So let’s dive into what we explored:
We tried to take a “practical” approach to this study, and simulate what a real company might do when they want to run a SQL warehouse. Since dbt is such a popular tool in the modern data stack, we decided to look at 2 parameters to sweep and evaluate:
- Warehouse size — [‘2X-Small’, ‘X-Small’, ‘Small’, ‘Medium’, ‘Large’, ‘X-Large’, ‘2X-Large’, ‘3X-Large’, ‘4X-Large’]
- DBT Threads — [‘4’, ‘8’, ‘16’, ‘24’, ‘32’, ‘40’, ‘48’]
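The two lists above form a full cross product, so the sweep can be sketched in a few lines of Python (the variable names here are just for illustration):

```python
from itertools import product

warehouse_sizes = ["2X-Small", "X-Small", "Small", "Medium", "Large",
                   "X-Large", "2X-Large", "3X-Large", "4X-Large"]
dbt_threads = [4, 8, 16, 24, 32, 40, 48]

# Every (warehouse size, thread count) pair is one benchmark run.
sweep = list(product(warehouse_sizes, dbt_threads))
print(len(sweep))  # 63 runs in total
```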
The reason we picked these two is that they are both “universal” tuning parameters for any workload, and they both impact the compute side of the job. dbt threads specifically tune the effective parallelism of your job as it runs through your DAG.
The workload we selected is the popular TPC-DI benchmark, with a scale factor of 1000. This workload is particularly interesting because it’s actually an entire pipeline, which mimics more real-world data workloads. For example, a screenshot of our dbt DAG is below; as you can see, it’s quite complicated, and changing the number of dbt threads could have an impact here.
As a side note, Databricks has a fantastic open source repo that helps set up the TPC-DI benchmark entirely within Databricks. (We did not use this since we are running with dbt.)
To get into the weeds of how we ran the experiment, we used Databricks Workflows with a Task Type of dbt as the “runner” for the dbt CLI, and all of the jobs were executed concurrently; there should be no variance due to unknown environmental conditions on the Databricks side.
Each job spun up a new SQL warehouse and tore it down afterwards, and ran in unique schemas in the same Unity Catalog. We used the Elementary dbt package to collect the execution results, and ran a Python notebook at the end of each run to collect those metrics into a centralized schema.
Costs were extracted via Databricks System Tables, specifically those for Billable Usage.
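A query along these lines pulls per-warehouse DBU usage from the billable-usage system table. The table and column names below reflect our understanding of the `system.billing.usage` schema; verify them against your workspace before relying on this sketch.

```python
def billing_query(start_date: str, end_date: str) -> str:
    """Build a DBU-usage query against Databricks' billable-usage system table.

    Schema details (system.billing.usage, usage_metadata.warehouse_id,
    usage_quantity) are assumptions -- check them against your workspace.
    """
    return f"""
        SELECT usage_metadata.warehouse_id AS warehouse_id,
               SUM(usage_quantity)         AS total_dbus
        FROM system.billing.usage
        WHERE usage_date BETWEEN '{start_date}' AND '{end_date}'
          AND usage_metadata.warehouse_id IS NOT NULL
        GROUP BY usage_metadata.warehouse_id
    """

# In a Databricks notebook, something like:
# spark.sql(billing_query("2023-11-01", "2023-11-02")).show()
```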
Try this experiment yourself and clone the GitHub repo here.
Below are the cost and runtime vs. warehouse size graphs. We can see that runtime stops scaling once you reach the medium warehouses. Anything larger than a medium had practically no impact on runtime (or was arguably worse). This is a typical scaling trend, which shows that cluster scaling is not infinite: there is always a point at which adding more compute provides diminishing returns.
For the CS enthusiasts out there, this is just the fundamental CS principle of Amdahl’s Law.
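As a toy illustration of why runtime flattens out, here is Amdahl’s Law in a few lines of Python. The 90% parallel fraction is an assumption for illustration, not a measurement of the TPC-DI workload:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's Law: speedup of a job with parallel fraction p on n workers."""
    return 1.0 / ((1.0 - p) + p / n)

# With a 90% parallel fraction, adding compute quickly stops paying off:
for n in [1, 2, 4, 8, 16, 64, 1024]:
    print(f"{n:5d} workers -> {amdahl_speedup(0.9, n):5.2f}x speedup")
# Speedup can never exceed 1 / (1 - 0.9) = 10x, no matter how large n gets.
```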
One unusual observation is that the medium warehouse outperformed the next 3 sizes up (large to 2x-large). We repeated this particular data point several times and obtained consistent results, so it is not a strange fluke. Due to the black box nature of serverless, we unfortunately don’t know what’s happening under the hood and are unable to give an explanation.
Because scaling stops at medium, we can see in the cost graph below that costs start to skyrocket after the medium warehouse size, because, well, you’re basically throwing more expensive machines at the problem while the runtime stays constant. So you’re paying for extra horsepower with zero benefit.
The graph below shows the relative change in runtime as we change the number of threads and warehouse size. For values above the zero horizontal line, the runtime increased (a bad thing).
The data here is a bit noisy, but there are some interesting insights based on the size of the warehouse:
- 2x-small — Increasing the number of threads usually made the job run longer.
- X-small to large — Increasing the number of threads usually helped make the job run about 10% faster, although the gains were pretty flat, so continuing to increase the thread count had no value.
- 2x-large — There was an actual optimal number of threads, which was 24, as seen in the clear parabolic line
- 3x-large — There was a very unusual spike in runtime at a thread count of 8. Why? No clue.
To put everything together into one comprehensive view, the plot below shows the cost vs. duration of the full job. The different colors represent the different warehouse sizes, and the sizes of the bubbles are the number of dbt threads.
In the plot above, we see the typical trend that larger warehouses generally lead to shorter durations but higher costs. However, we do spot a few unusual points:
- Medium is the best — From a pure cost and runtime perspective, medium is the best warehouse to choose
- Impact of dbt threads — For the smaller warehouses, changing the number of threads changed the duration by about +/- 10%, but did not change the cost much. For the larger warehouses, the number of threads impacted both cost and runtime quite significantly.
In summary, our top 5 lessons learned about the Databricks SQL serverless + dbt products are:
- Rules of thumb are bad — We cannot simply rely on “rules of thumb” about warehouse size or the number of dbt threads. Some expected trends do exist, but they are not consistent or predictable, and everything depends on your workload and data.
- Huge variance — For the exact same workloads, costs ranged from $5 to $45 and runtimes from 2 minutes to 90 minutes, all due to different combinations of the number of threads and warehouse size.
- Serverless scaling has limits — Serverless warehouses do not scale infinitely, and eventually larger warehouses will cease to provide any speedup and only end up causing increased costs with no benefit.
- Medium is great? — We found the Medium serverless SQL warehouse outperformed many of the larger warehouse sizes on both cost and job duration for the TPC-DI benchmark. We have no clue why.
- Jobs clusters may be cheapest — If costs are a concern, switching to standard jobs compute with notebooks may be substantially cheaper
The results reported here show that the performance of black box “serverless” systems can result in some unusual anomalies. Since it’s all behind Databricks’ walls, we have no idea what is happening. Maybe it’s all running on giant Spark-on-Kubernetes clusters; maybe they have special deals with Amazon on certain instances? Either way, the unpredictable nature makes controlling cost and performance tricky.
Because each workload is unique across so many dimensions, we can’t rely on “rules of thumb”, or on costly experiments that are only true for a workload in its current state. The more chaotic nature of serverless systems does beg the question of whether these systems need a closed-loop control system to keep them in check.
As an introspective note: the business model of serverless is genuinely compelling. Assuming Databricks is a rational business that does not want to decrease its revenue, and does want to lower its costs, one must ask the question: “Is Databricks incentivized to improve the compute under the hood?”
The problem is this: if they make serverless 2x faster, then suddenly their revenue from serverless drops by 50%, which is a very bad day for Databricks. If they could make it 2x faster and then increase the DBU costs by 2x to counteract the speedup, they would remain revenue neutral (this is, in fact, what they did for Photon).
So Databricks is really incentivized to decrease their internal costs while keeping customer runtimes about the same. While this is great for Databricks, it is difficult to pass on to the user any serverless acceleration technology that results in a cost reduction.
Interested in learning more about how to improve your Databricks pipelines? Reach out to Jeff Chou and the rest of the Sync Team.