Creating an Infographic With Matplotlib
The Goal of the Infographic
Importing Libraries and Loading Data
Preparing and Creating the Infographic with Matplotlib
Creating the Infographic with Matplotlib
Full Code for Infographic
Summary
Dataset Utilized in this Tutorial

Geological Lithology Variations Within the Zechstein Group of the Norwegian Continental Shelf

Towards Data Science
Radial bar plots of lithology variation across the Norwegian Continental Shelf. Image by the author.

Creating exciting and compelling data visualisations is an essential part of working with data and being a data scientist. It allows us to present information to readers in a concise form that helps them understand the data without having to view the raw values. We can also use charts and graphs to tell a compelling and interesting story that answers one or more questions about the data.

In the Python world, there are many libraries that allow data scientists to create visualisations, and one of the first that many come across when starting their data science journey is matplotlib. However, after working with matplotlib for a little while, many people turn to other, more modern libraries because they view the basic matplotlib plots as boring.

With a little time, effort, code, and an understanding of matplotlib's capabilities, we can transform the basic and boring plots into something far more compelling and visually appealing.

In my past several articles, I have focused on how we can transform individual plots with various styling methods. If you would like to explore improving matplotlib data visualisations further, you can try some of my previous articles.

These articles have mainly focused on single plots and styling them. In this article, we are going to look at building infographics with matplotlib.

Infographics are used to transform complex datasets into compelling visual narratives that are informative and engaging for the reader. They visually represent data and consist of charts, tables and minimal text. Combining these allows us to provide an easy-to-understand overview of a topic or question.

After sharing my previous article on polar bar charts, I was tagged in a tweet from Russell Forbes showing that it is possible to make infographics within matplotlib.

So, based on that, I thought to myself, why not try building an infographic with matplotlib?

And I did.

The following infographic was the result of that, and it is what we will be recreating in this article.

Example infographic that will be created using matplotlib. Image by the author.

Bear in mind that the infographic we will be building in this article may be suitable for web use or for inclusion within a presentation. However, if we were looking to include it within reports or display it in more formal settings, we may want to consider alternative colour palettes and a more professional feel.

Before we touch any data visualisation, we need to know the purpose behind creating our infographic. Without this, it will be difficult to narrow down the plots we want to use and the story we want to tell.

For this example, we are going to use a set of well-log-derived lithology measurements that have been obtained from the Norwegian Continental Shelf. From this dataset, we are going to look specifically at the question:

What is the lithological variation of the Zechstein Group within this dataset?

This provides us with our starting point.

We know that we are looking for lithology data and for data within the Zechstein Group.

To begin, we first need to import a number of key libraries.

These are pandas for loading and storing our data, numpy for performing the mathematical calculations that allow us to plot labels and data in a polar projection, matplotlib for creating our plot, and adjustText to ensure labels do not overlap on our scatter plot.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from adjustText import adjust_text

After the libraries have been imported, we next need to load our datasets. Details of the source of this dataset are included at the bottom of this article.

The first dataset we will load is the lithology composition of the Zechstein Group created in my previous article.

We can load this data using the pandas read_csv() function.

df = pd.read_csv('Data/LithologySummary.csv', index_col='WELL')

When we view our dataframe, we have the following information about the lithologies present within the Zechstein Group, as interpreted within each well.

Pandas dataframe containing lithology composition for eight wells which have penetrated the Zechstein Group. Image by the author.
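If you do not have the CSV file to hand, a stand-in dataframe with the same general shape can be used to follow along. The sketch below is only an illustration: the well names, lithology names and percentages are all made up, and it assumes each row holds percentages that sum to roughly 100.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Hypothetical category and well names (not the real dataset values)
lithologies = [f"LITH_{n}" for n in range(1, 9)]
wells = [f"WELL_{n}" for n in range(1, 9)]

# Each row is a random composition scaled so that it sums to 100 %
percentages = rng.dirichlet(np.ones(len(lithologies)), size=len(wells)) * 100

# Same layout as the summary table: wells as the index, lithologies as columns
demo_df = pd.DataFrame(percentages,
                       index=pd.Index(wells, name="WELL"),
                       columns=lithologies).round(1)

print(demo_df.head())
```

Swapping demo_df in for df lets the later plotting steps run, although the resulting charts will of course carry no geological meaning.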

To help our readers understand the data better, it would be good to have information about where the drilled wells intersected the Zechstein Group.

We can load this data in the same way by using pd.read_csv(). However, this time, we do not need to set an index.

zechstein_well_intersections = pd.read_csv('Data/Zechstein_WellIntersection.csv')

When we view this dataframe, we are presented with the following table containing the well name and the X & Y grid locations of where the well penetrated the Zechstein Group.

Pandas dataframe of the X & Y grid locations of where wells have penetrated the Zechstein Group.

Before we begin creating any figures, we need to create a few variables containing key information about our data. This will make things easier when it comes to making the plots.

First, we will get a list of all of the possible lithologies. This is done by converting the column names within our summary dataframe to a list.

lith_names = list(df.columns)

When we view this list, we get back the lithology names.

Next, we need to decide how we want the individual plots within the infographic to be set up.

For this dataset, we have 8 wells, which will be used to generate 8 radial bar charts.

We also want to show the well locations on the same figure. So this gives us 9 subplots.

One way we can subdivide our figure is to have 3 columns and 3 rows. This allows us to create our first variable, num_cols, representing the number of columns.

We can then generalise the number of rows (num_rows) variable so that we can reuse it with other datasets. In this example, it will take the number of wells we have (the number of rows in the dataframe) and divide it by the number of columns we want. Using np.ceil will allow us to round this number up so that all of the plots fit on the figure.

# Set the number of columns for your subplot grid
num_cols = 3

# Get the number of wells (rows in the DataFrame)
num_wells = len(df)

# Calculate the number of rows needed for the subplot grid
num_rows = np.ceil(num_wells / num_cols).astype(int)
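To see how the rounding behaves for other dataset sizes, the calculation can be wrapped in a small helper. This is a sketch only; rows_needed is a hypothetical name, and it uses math.ceil from the standard library rather than np.ceil, which gives the same result for these inputs.

```python
import math


def rows_needed(num_items, num_cols):
    """Return the number of grid rows required to hold num_items plots."""
    return math.ceil(num_items / num_cols)


for n in (8, 9, 10):
    rows = rows_needed(n, 3)
    spare = rows * 3 - n
    print(f"{n} wells -> {rows} rows, {spare} spare cell(s)")
```

With 8 wells and 3 columns there is exactly one spare cell, which is what leaves room for the well location plot. With 9 wells there would be no spare cell, so the layout would need an extra row.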

The next set of variables we need to declare are as follows:

  • indexes : creates a list of numbers ranging from 0 up to, but not including, the total number of items in our list. In our case, this will generate a list from 0 to 7, which covers the 8 lithologies in our dataset.
  • width : calculates the width of each bar in the chart by dividing the full circle (2 * pi radians) by the number of rock types we have in lith_names
  • angles : creates a list containing the angles for each of the rock types
  • colors : a list of hexadecimal colours we want to use to represent each well
  • label_loc : creates an array of evenly spaced values between 0 and 2 * pi for displaying the rock-type labels

indexes = list(range(0, len(lith_names)))
width = 2*np.pi / len(lith_names)
angles = [element * width for element in indexes]

colors = ["#ae1241", "#5ba8f7", "#c6a000", "#0050ae",
          "#9b54f3", "#ff7d67", "#dbc227", "#008c5c"]

label_loc = np.linspace(start=0, stop=2 * np.pi, num=len(lith_names))
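It is worth noting that label_loc and angles are not the same set of values: np.linspace includes the end point (2 * pi) by default, so its spacing is slightly wider than the bar width. The plotting code in this article positions the labels with angles, so this has no effect on the figure, but the short check below (assuming 8 categories) shows the difference and how endpoint=False would make the two match.

```python
import numpy as np

n_categories = 8  # assumed number of lithology columns

width = 2 * np.pi / n_categories
angles = [i * width for i in range(n_categories)]

# Default behaviour: the last value is exactly 2 * pi
closed = np.linspace(0, 2 * np.pi, num=n_categories)

# endpoint=False stops one step short of 2 * pi
half_open = np.linspace(0, 2 * np.pi, num=n_categories, endpoint=False)

print(np.allclose(angles, half_open))  # True
print(np.allclose(angles, closed))     # False
```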

Adding Radial Bar Charts as Subplots

To begin creating our infographic, we first need to create a figure object. This is done by calling plt.figure().

To set up our figure, we need to pass in a few parameters:

  • figsize : controls the size of the infographic. As we may have varying numbers of rows, we can set the figure height to be a multiple of the number of rows. This will prevent the plots and figures from becoming distorted.
  • linewidth : controls the border thickness for the figure
  • edgecolor : sets the border colour
  • facecolor : sets the figure background colour

# Create a figure
fig = plt.figure(figsize=(20, num_rows * 7), linewidth=10,
                 edgecolor='#393d5c',
                 facecolor='#25253c')

Next, we need to define our grid layout. There are a few ways we can do this, but for this example, we are going to use GridSpec. This will allow us to specify the location of the subplots, and also the spacing between them.

# Create a grid layout
grid = plt.GridSpec(num_rows, num_cols, wspace=0.5, hspace=0.5)
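Each cell in the grid is addressed by a zero-based (row, column) pair. The following sketch shows how a running plot index can be converted into such a pair, assuming 8 wells and 3 columns; it uses divmod, which is equivalent to writing i // num_cols and i % num_cols separately.

```python
num_cols = 3
num_wells = 8

# divmod(i, num_cols) returns (i // num_cols, i % num_cols)
positions = [divmod(i, num_cols) for i in range(num_wells)]

for i, (row, col) in enumerate(positions):
    print(f"plot {i} -> grid[{row}, {col}]")
```

The eight plots fill the grid row by row and stop at grid[2, 1], which leaves grid[2, 2] empty.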

We are now ready to begin adding our radial bar plots.

To do this, we need to loop over each row within the lithology composition summary dataframe and add an axis to the grid using add_subplot(). As we are plotting radial bar charts, we want to set the projection parameter to polar.

Next, we can begin adding our data to the plot by calling ax.bar. Within this call, we pass in:

  • angles : provides the location of the bar in the polar projection and is also used to position the lithology labels
  • height : uses the percentage values for the current row to set the height of each bar
  • width : used to set the width of the bar
  • edgecolor : sets the edge colour of the radial bars
  • zorder : used to set the plotting order of the bars on the figure. In this case it is set to 2, so that it sits in the top layer of the figure
  • alpha : used to set the transparency of the bars
  • color : sets the colour of the bar based on the colors list defined earlier

We then repeat the process of adding bars in order to add a background fill to the radial bar plot. Instead of setting the height to a value from the table, we can set it to 100 so that it fills the entire area.
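To try the layering idea in isolation, here is a minimal single-chart sketch. The percentages are made-up values, and the Agg backend is selected only so that the example runs without a display; in a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for this sketch
import matplotlib.pyplot as plt
import numpy as np

values = [40, 25, 15, 10, 5, 3, 1, 1]  # hypothetical percentages for one well
width = 2 * np.pi / len(values)
angles = [i * width for i in range(len(values))]

fig = plt.figure(figsize=(5, 5), facecolor="#25253c")
ax = fig.add_subplot(1, 1, 1, projection="polar")

# Background bars fill the full radius and sit on the lower layer
bars_bg = ax.bar(x=angles, height=100, width=width,
                 color="#393d5c", edgecolor="#25253c", zorder=1)

# Data bars are drawn on top of the background
bars = ax.bar(x=angles, height=values, width=width,
              color="#5ba8f7", edgecolor="white", alpha=0.8, zorder=2)

ax.set_ylim(0, 100)
ax.set_yticklabels([])
ax.set_xticks([])
```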

The next part of the setup involves setting up the labels, subplot titles, and grid colours.

For the lithology labels, we need to create a for loop that will allow us to position the labels at the correct angle around the edge of the polar plot.

Within this loop, we need to check what the current angle is. If the angle of the bar is less than or equal to pi, then 90 degrees is subtracted from the rotation angle. Otherwise, if the bar is in the bottom half of the circle, 90 degrees is added to the rotation angle. This allows the labels on the left- and right-hand sides of the plot to be easily read.
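The rotation rule can be pulled out into a small function to check it against a few angles. This is a sketch under the same rule described above; label_rotation is a hypothetical helper name, and it merges the "less than pi" and "equal to pi" cases into a single comparison.

```python
import math


def label_rotation(angle_rad):
    """Return the text rotation in degrees for a label at angle_rad (radians)."""
    rotation = math.degrees(angle_rad)
    if angle_rad <= math.pi:
        # Top half of the circle (and exactly pi): rotate back by 90 degrees
        return rotation - 90
    # Bottom half of the circle: rotate forward by 90 degrees
    return rotation + 90


for angle in (0, math.pi / 2, math.pi, 3 * math.pi / 2):
    print(f"{math.degrees(angle):6.1f} deg -> rotation {label_rotation(angle):6.1f}")
```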

# Loop over each row in the DataFrame
for i, (index, row) in enumerate(df.iterrows()):
    ax = fig.add_subplot(grid[i // num_cols, i % num_cols], projection='polar')

    # Data bars for the current well
    bars = ax.bar(x=angles, height=row.values, width=width,
                  edgecolor='white', zorder=2, alpha=0.8, color=colors[i])

    # Background bars that fill the whole plot area
    bars_bg = ax.bar(x=angles, height=100, width=width, color='#393d5c',
                     edgecolor='#25253c', zorder=1)

    ax.set_title(index, pad=35, fontsize=22, fontweight='bold', color='white')
    ax.set_ylim(0, 100)
    ax.set_yticklabels([])
    ax.set_xticks([])
    ax.grid(color='#25253c')

    # Place each lithology label at the correct angle around the plot edge
    for angle, height, lith_name in zip(angles, row.values, lith_names):
        rotation_angle = np.degrees(angle)
        if angle < np.pi:
            rotation_angle -= 90
        elif angle == np.pi:
            rotation_angle -= 90
        else:
            rotation_angle += 90
        ax.text(angle, 110, lith_name.upper(),
                ha='center', va='center',
                rotation=rotation_angle, rotation_mode='anchor', fontsize=12,
                fontweight='bold', color='white')

When we run the code at this point, we get back the following image containing all 8 wells.

Matplotlib figure with radial bar charts displaying lithology percentages for 8 wells from the Norwegian Continental Shelf. Image by the author.

Adding a Scatter Plot as a Subplot

As you can see above, we have a gap within the figure in the bottom right. This is where we will place our scatter plot showing the locations of the wells.

To do this, we can add a new subplot outside of the for loop. As we want this to be the last plot on our figure, we need to subtract 1 from num_rows and num_cols, because the grid positions are zero-indexed.
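Using the bottom-right cell works here because 8 wells in a 3-column grid leave one cell free. For other datasets that may not hold, so one option is to size the grid for one extra plot. The sketch below illustrates that idea; grid_with_spare_cell is a hypothetical helper and is not part of the code in this article.

```python
import math


def grid_with_spare_cell(num_plots, num_cols):
    """Return (num_rows, last_cell) so that the bottom-right cell is always free."""
    # Reserve room for one additional plot on top of the per-well charts
    num_rows = math.ceil((num_plots + 1) / num_cols)
    last_cell = (num_rows - 1, num_cols - 1)
    return num_rows, last_cell


print(grid_with_spare_cell(8, 3))  # 3 rows, free cell at (2, 2)
print(grid_with_spare_cell(9, 3))  # 4 rows, free cell at (3, 2)
```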

We then add the scatter plot to the axis by calling ax.scatter() and passing in the X and Y locations from the zechstein_well_intersections dataframe.

The rest of the code involves adding labels to the x and y axes, setting the tick formatting, and setting the edges (spines) of the scatter plot to white.

As we have 1 well that does not have location information, we can add a small footnote to the scatter plot informing the reader of this fact.
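Rather than hard-coding the well name in the footnote, the missing well could be worked out by comparing the two dataframes. The sketch below uses small made-up tables and assumes the well is simply absent from the locations file; if it were present with empty coordinates, a check with isna() on the coordinate columns would be needed instead.

```python
import pandas as pd

# Hypothetical stand-ins for the composition and location dataframes
composition = pd.DataFrame({"Halite": [90.0, 10.0, 55.0]},
                           index=pd.Index(["W-1", "W-2", "W-3"], name="WELL"))
locations = pd.DataFrame({"WELL": ["W-1", "W-3"],
                          "X_LOC": [1000.0, 3000.0],
                          "Y_LOC": [500.0, 700.0]})

# Wells that have a composition row but no entry in the locations table
has_location = composition.index.isin(locations["WELL"])
missing = composition.index[~has_location]

footnote_text = ", ".join(f"Well {name} does not contain location information"
                          for name in missing)
print(footnote_text)
```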

Finally, we need to add the well names as labels so that our readers can understand what each marker is. We can do this as part of a for loop and add the labels to a list.

# Add the scatter plot in the last subplot (subplot 9)
ax = fig.add_subplot(grid[num_rows - 1, num_cols - 1], facecolor='#393d5c')
ax.scatter(zechstein_well_intersections['X_LOC'],
           zechstein_well_intersections['Y_LOC'], c=colors, s=60)

ax.grid(alpha=0.5, color='#25253c')
ax.set_axisbelow(True)
ax.set_ylabel('NORTHING', fontsize=12,
              fontweight='bold', color='white')
ax.set_xlabel('EASTING', fontsize=12,
              fontweight='bold', color='white')

ax.tick_params(axis='both', colors='white')
ax.ticklabel_format(style='plain')
ax.set_title('WELL LOCATIONS', pad=35, fontsize=22, fontweight='bold', color='white')

ax.spines['bottom'].set_color('white')
ax.spines['top'].set_color('white')
ax.spines['right'].set_color('white')
ax.spines['left'].set_color('white')

ax.text(0.0, -0.2, 'Well 16/11-1 ST3 does not contain location information', ha='left', va='bottom', fontsize=10,
        color='white', transform=ax.transAxes)

labels = []
for i, row in zechstein_well_intersections.iterrows():
    labels.append(ax.text(row['X_LOC'], row['Y_LOC'], row['WELL'], color='white', fontsize=14))

When we run our plotting code, we will have the following figure. We can now see all eight wells represented as radial bar charts and their locations represented by a scatter plot.

Matplotlib radial bar charts and a scatter plot all inside a single figure. Image by the author.

We do have one issue we need to resolve, and that is the positions of the labels. Currently, they overlap the data points, the spines and other labels.

We can resolve this by using the adjustText library we imported earlier. This library will work out the best label position to avoid any of these issues.

To use this, all we need to do is call adjust_text and pass in the labels list we created in the previous for loop. To reduce the amount of overlap, we can use the expand_points and expand_objects parameters. For this example, a value of 1.2 works well.

adjust_text(labels, expand_points=(1.2, 1.2), expand_objects=(1.2, 1.2))

Scatter plot showing well locations and associated labels after using the adjustText library. Image by the author.

Adding Footnotes and Figure Titles

To complete our infographic, we need to give the reader some extra information.

We will add a footnote to the figure to indicate where the data was sourced from and who created it.

To help the reader understand what the infographic is about, we can add a title using plt.suptitle and a subtitle using fig.text. This will immediately tell the reader what they can expect when looking at the charts.

footnote = """
Data Source:
Bormann, Peter, Aursand, Peder, Dilib, Fahad, Manral, Surrender, & Dischington, Peter. (2020). FORCE 2020 Well well log and lithofacies dataset for
machine learning competition [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4351156

Figure Created By: Andy McDonald
"""

plt.suptitle('LITHOLOGY VARIATION WITHIN THE ZECHSTEIN GP.', size=36, fontweight='bold', color='white')
plot_sub_title = """CHARTS OF LITHOLOGY PERCENTAGES ACROSS 8 WELLS FROM THE NORWEGIAN CONTINENTAL SHELF"""

fig.text(0.5, 0.95, plot_sub_title, ha='center', va='top', fontsize=18, color='white', fontweight='bold')
fig.text(0.1, 0.01, footnote, ha='left', va='bottom', fontsize=14, color='white')

plt.show()
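If the infographic is destined for the web or a presentation, it will usually need to be written out as an image. The sketch below shows one way this might be done on a small stand-in figure; the DPI value and the in-memory buffer are assumptions for illustration, and a file name could be passed to savefig instead. Passing the face and edge colours explicitly keeps the dark background and border in the saved image.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for this sketch
import matplotlib.pyplot as plt

# Small stand-in figure using the same background and border colours
fig = plt.figure(figsize=(4, 3), linewidth=10,
                 edgecolor='#393d5c', facecolor='#25253c')
fig.suptitle('DEMO TITLE', color='white')

# Write the figure to an in-memory PNG
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=150,
            facecolor=fig.get_facecolor(),
            edgecolor=fig.get_edgecolor(),
            bbox_inches='tight')
png_bytes = buffer.getvalue()
plt.close(fig)

print(f"Saved {len(png_bytes)} bytes")
```

In the full script, a call like this would typically be placed before plt.show().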

After finishing the plotting code, we will end up with a matplotlib figure like the one below.

Matplotlib infographic showing lithology variation for the Zechstein Group on the Norwegian Continental Shelf. Image by the author.

We now have all of the radial bar charts on display, along with where each of the wells is located. This allows the reader to understand any spatial variation between the wells, which in turn may help explain variances within the data.

For example, well 15/9-13 is located on the area's western side and consists of a mixture of dolomite, anhydrite and shale, whereas well 17/11-1 is located on the eastern side of the area and is predominantly composed of halite. This could be attributable to different depositional environments across the region.

The full code for the infographic is displayed below, with each of the main sections commented.

# Set the number of columns for your subplot grid
num_cols = 3

# Get the number of wells (rows in the DataFrame)
num_wells = len(df)

# Calculate the number of rows needed for the subplot grid
num_rows = np.ceil(num_wells / num_cols).astype(int)

indexes = list(range(0, len(lith_names)))
width = 2*np.pi / len(lith_names)
angles = [element * width for element in indexes]

colors = ["#ae1241", "#5ba8f7", "#c6a000", "#0050ae", "#9b54f3", "#ff7d67", "#dbc227", "#008c5c"]

label_loc = np.linspace(start=0, stop=2 * np.pi, num=len(lith_names))

# Create a figure
fig = plt.figure(figsize=(20, num_rows * 7), linewidth=10,
edgecolor='#393d5c',
facecolor='#25253c')

# Create a grid layout
grid = plt.GridSpec(num_rows, num_cols, wspace=0.5, hspace=0.5)

# Loop over each row in the DataFrame to create the radial bar charts per well
for i, (index, row) in enumerate(df.iterrows()):
    ax = fig.add_subplot(grid[i // num_cols, i % num_cols], projection='polar')
    bars = ax.bar(x=angles, height=row.values, width=width,
                  edgecolor='white', zorder=2, alpha=0.8, color=colors[i])

    bars_bg = ax.bar(x=angles, height=100, width=width, color='#393d5c',
                     edgecolor='#25253c', zorder=1)

    # Set up labels, ticks and grid
    ax.set_title(index, pad=35, fontsize=22, fontweight='bold', color='white')
    ax.set_ylim(0, 100)
    ax.set_yticklabels([])
    ax.set_xticks([])
    ax.grid(color='#25253c')

    # Set up the lithology / category labels to appear at the correct angle
    for angle, height, lith_name in zip(angles, row.values, lith_names):
        rotation_angle = np.degrees(angle)
        if angle < np.pi:
            rotation_angle -= 90
        elif angle == np.pi:
            rotation_angle -= 90
        else:
            rotation_angle += 90
        ax.text(angle, 110, lith_name.upper(),
                ha='center', va='center',
                rotation=rotation_angle, rotation_mode='anchor', fontsize=12,
                fontweight='bold', color='white')

# Add the scatter plot in the last subplot (subplot 9)
ax = fig.add_subplot(grid[num_rows - 1, num_cols - 1], facecolor='#393d5c')
ax.scatter(zechstein_well_intersections['X_LOC'], zechstein_well_intersections['Y_LOC'], c=colors, s=60)
ax.grid(alpha=0.5, color='#25253c')
ax.set_axisbelow(True)

# Set up the labels and ticks for the scatter plot
ax.set_ylabel('NORTHING', fontsize=12,
              fontweight='bold', color='white')
ax.set_xlabel('EASTING', fontsize=12,
              fontweight='bold', color='white')

ax.tick_params(axis='both', colors='white')
ax.ticklabel_format(style='plain')
ax.set_title('WELL LOCATIONS', pad=35, fontsize=22, fontweight='bold', color='white')

# Set the outside borders of the scatter plot to white
ax.spines['bottom'].set_color('white')
ax.spines['top'].set_color('white')
ax.spines['right'].set_color('white')
ax.spines['left'].set_color('white')

# Add a footnote to the scatter plot explaining the missing well
ax.text(0.0, -0.2, 'Well 16/11-1 ST3 does not contain location information', ha='left', va='bottom', fontsize=10,
        color='white', transform=ax.transAxes)

# Set up and display well name labels
labels = []
for i, row in zechstein_well_intersections.iterrows():
    labels.append(ax.text(row['X_LOC'], row['Y_LOC'], row['WELL'], color='white', fontsize=14))

# Use adjust_text to ensure text labels do not overlap with each other or the data points
adjust_text(labels, expand_points=(1.2, 1.2), expand_objects=(1.2, 1.2))

# Create a footnote explaining the data source
footnote = """
Data Source:
Bormann, Peter, Aursand, Peder, Dilib, Fahad, Manral, Surrender, & Dischington, Peter. (2020). FORCE 2020 Well well log and lithofacies dataset for
machine learning competition [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4351156

Figure Created By: Andy McDonald
"""

# Display overall infographic title, subtitle and footnote
plt.suptitle('LITHOLOGY VARIATION WITHIN THE ZECHSTEIN GP.', size=36, fontweight='bold', color='white')
plot_sub_title = """CHARTS OF LITHOLOGY PERCENTAGES ACROSS 8 WELLS FROM THE NORWEGIAN CONTINENTAL SHELF"""

fig.text(0.5, 0.95, plot_sub_title, ha='center', va='top', fontsize=18, color='white', fontweight='bold')
fig.text(0.1, 0.01, footnote, ha='left', va='bottom', fontsize=14, color='white')

plt.show()

Infographics are a great way to summarise data and present it to readers in a compelling and interesting way without them having to worry about the raw numbers. They are also a great way to tell stories about your data.

At first, you may not think matplotlib is equipped for creating infographics, but with some practice, time and effort, it is definitely possible.

Training dataset used as part of a Machine Learning competition run by Xeek and FORCE 2020 (Bormann et al., 2020). This dataset is licensed under Creative Commons Attribution 4.0 International.

The full dataset can be accessed at the following link: https://doi.org/10.5281/zenodo.4351155.
