Optimize your data science workflow by automating matplotlib output — with 1 line of code. Here’s how.
Naming things is tough. After a long enough day, we’ve all ended up with the highly descriptive likes of “graph7(1)_FINAL(2).png” and “output.pdf”. Look familiar?
We can do better, and quite easily, actually.
When we use data-oriented, “seaborn-esque” plotting interfaces, the ingredients for a descriptive filename are all there. A typical call looks like this:
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")
Right there we know we’ve got “total_bill” on the x axis, “time” color coded, etc. So what if we used the plotting function name and those semantic column keys to organize the output for us?
Here’s what that workflow looks like, using the teeplot tool.
import seaborn as sns; from teeplot import teeplot as tp
tp.save = {".eps": True, ".pdf": True} # set custom output behavior
tp.tee(sns.scatterplot,
    data=sns.load_dataset("tips"), x="total_bill", y="tip", hue="time")
teeplots/hue=time+viz=scatterplot+x=total-bill+y=tip+ext=.eps
teeplots/hue=time+viz=scatterplot+x=total-bill+y=tip+ext=.pdf
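To make the naming idea concrete, here is a minimal sketch of how a filename like the ones above could be assembled from a plotting function's name and its string-valued keyword arguments. This is an illustration only, not teeplot's actual implementation: the helper name `sketch_filename`, the rule of keeping only string values, and the stand-in `scatterplot` function are all assumptions made for the example.

```python
def sketch_filename(plotter, ext, **kwargs):
    """Build a descriptive filename from a plotter's name and its string kwargs.

    Hypothetical helper for illustration; not part of teeplot.
    """
    # Keep only string-valued keyword arguments: these are the "semantic"
    # keys (column names), as opposed to the data object itself.
    attrs = {k: v for k, v in kwargs.items() if isinstance(v, str)}
    attrs["viz"] = plotter.__name__
    # Sort keys for a stable name and swap underscores for hyphens in values.
    parts = [f"{k}={v.replace('_', '-')}" for k, v in sorted(attrs.items())]
    parts.append(f"ext={ext}")
    return "+".join(parts)


def scatterplot(**kwargs):
    """Stand-in for sns.scatterplot so the sketch runs without seaborn."""


name = sketch_filename(
    scatterplot, ".pdf", data=object(), x="total_bill", y="tip", hue="time"
)
print(name)  # hue=time+viz=scatterplot+x=total-bill+y=tip+ext=.pdf
```

Sorting the keys means the same call always yields the same filename, regardless of the order in which the arguments were written.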
We’ve actually done three things in this example: 1) we’ve rendered the plot within the notebook, 2) we’ve saved our visualization to file with a meaningful filename, and 3) we’ve hooked our visualization into a framework where notebook outputs can be managed at a global level (in this case, enabling eps/pdf output).
This article explains how to use the teeplot Python package to stay better organized and free up mental bandwidth for more interesting things.
I’m the primary author and maintainer of the project, which I have used in my own workflow for several years and found useful enough to package and share more widely with the community. teeplot is open source under the MIT license.
teeplot is designed to simplify work with data visualizations created with libraries like matplotlib, seaborn, and pandas. It acts as a wrapper around your plotting calls to handle output management for you.
Here’s how to use teeplot in 3 steps:
- Select Your Plotting Function: Start by choosing your preferred plotting function, whether it’s from matplotlib, seaborn, pandas, etc. or one you wrote yourself.
- Add Your Plotting Arguments: Pass your plotting function as the first argument to tee, followed by the arguments you want to use for your visualization.
- Automatic Plotting and Saving: teeplot captures your plotting function and its arguments, executes the plot, and then takes care of wrangling the plot outputs for you.
That’s it!
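The three steps above follow a simple wrapper pattern, which can be sketched in plain Python. Everything here is hypothetical and only illustrates the flow: `tee_sketch`, the `enabled_formats` registry, and the `save` callback are assumed names, and the stand-in plotter and saver write text files so the sketch runs without any plotting library.

```python
import os
import tempfile

# Hypothetical global registry of output formats (assumed for illustration).
enabled_formats = {".pdf": True, ".png": True, ".eps": False}


def tee_sketch(plotter, save, outdir, **kwargs):
    """Call `plotter`, then pass the result to `save` once per enabled format."""
    result = plotter(**kwargs)  # execute the plot with the forwarded arguments
    # Describe the call using its string-valued keyword arguments.
    stem = "+".join(
        f"{k}={v}" for k, v in sorted(kwargs.items()) if isinstance(v, str)
    )
    written = []
    for ext, enabled in enabled_formats.items():
        if enabled:
            filename = f"{stem}+viz={plotter.__name__}+ext={ext}"
            path = os.path.join(outdir, filename)
            save(result, path)
            written.append(path)
    return result, written


def box(**kwargs):
    """Stand-in plotter that returns a text description instead of a figure."""
    return f"box plot with {kwargs}"


def save_text(result, path):
    """Stand-in saver that writes the description to disk."""
    with open(path, "w") as f:
        f.write(result)


with tempfile.TemporaryDirectory() as outdir:
    _, paths = tee_sketch(box, save_text, outdir, column="age", by="gender")
    print([os.path.basename(p) for p in paths])
```

With a real matplotlib plotter that returns an Axes, the `save` callback could plausibly be something like `lambda ax, path: ax.get_figure().savefig(path)`. Keeping the format registry global is what lets a single setting change the outputs of every plot in a notebook.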
Next, let’s look at 3 brief examples that show: a) basic use, b) custom post-processing, and c) custom plotting functions.
In this example, we pass a DataFrame df’s member function df.plot.box as our plotter and two semantic keys: “age” and “gender.” teeplot takes care of the rest.
# adapted pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.box.html
import pandas as pd; from teeplot import teeplot as tp
age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})
tp.tee(df.plot.box, # plotter...
column="age", by="gender", figsize=(4, 3)) # ...forwa
teeplots/by=gender+column=age+viz=box+ext=.pdf
teeplots/by=gender+column=age+viz=box+ext=.png