Customer Lifetime Value Prediction with PyMC-Marketing
1. What’s CLV?
2. Business Context
3. Required Data
4. Traditional CLV Formula
5. Buy-Till-You-Die (BTYD) Model
6. Sample Code
7. How are you going to improve the model’s accuracy?
8. Conclusion
9. Reference

Explore the Depths of Buy-till-You-Die (BTYD) Modeling and Practical Coding Techniques

Towards Data Science
Photo by Boxed Water Is Better on Unsplash

TL;DR: The Customer Lifetime Value (CLV) model is a key technique in customer analytics that helps companies identify their most valuable customers. Neglecting CLV can lead to overinvestment in short-term customers who may only make a single purchase. ‘Buy Till You Die’ modeling, which combines the BG/NBD and Gamma-Gamma models, can estimate CLV. Although best practices vary depending on data size and modeling priorities, PyMC-Marketing is a useful Python library for anyone looking to implement CLV modeling quickly.

CLV is defined as the total net revenue a company can expect from a single customer over the course of their relationship. Some of you may be more familiar with the term ‘LTV’ (Lifetime Value). Yes, CLV and LTV are interchangeable.

Image by Creator
  • The primary goal is to calculate and predict future CLV, which tells you how much revenue you can expect from each customer.
  • The second goal is to identify profitable customers. The model tells you who those valuable customers are by analyzing the characteristics of high-CLV customers.
  • The third goal is to take marketing actions based on the analysis, so you can optimize your marketing budget allocation accordingly.
Image by Creator

Let’s take the e-commerce site of a fashion brand like Nike, for instance, which might use advertisements and coupons to attract new customers. Now, let’s assume that college students and working professionals are the two major customer segments. For first-time purchases, the company spends $10 on advertising for college students and $20 for working professionals. Both segments make purchases worth around $100.

If you were in charge of marketing, which segment would you invest more in? You might naturally think it’s more logical to invest more in the college student segment, given their lower acquisition cost and higher ROI.

Image by creator, with photos used from Pixabay

So, what if you knew this information?

The college student segment tends to have a high churn rate, meaning they don’t purchase again after that one purchase, resulting in $100 of revenue on average. On the other hand, the working professionals segment has a higher rate of repeat purchases, resulting in an average of $400 per customer.

In that case, you would likely prefer to invest more in the working professionals segment, since it promises a higher ROI. This may seem like a simple thing that anyone can understand. However, surprisingly, most marketing people focus on hitting a target Cost Per Acquisition (CPA) without considering who the profitable customers are in the long run.
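The arithmetic behind this comparison can be sketched in a few lines. The dollar figures are the illustrative ones from the example above, not real data:

```python
def roi(lifetime_revenue, acquisition_cost):
    """Return on investment: profit divided by acquisition cost."""
    return (lifetime_revenue - acquisition_cost) / acquisition_cost

# Single-purchase view: college students look better
roi_students = roi(100, 10)          # 9.0
roi_professionals = roi(100, 20)     # 4.0

# Lifetime view: professionals win once repeat purchases are counted
roi_students_ltv = roi(100, 10)      # 9.0 (students churn after one order)
roi_professionals_ltv = roi(400, 20) # 19.0
```

The ranking flips as soon as lifetime revenue replaces first-purchase revenue, which is the whole point of looking at CLV.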

Image by creator, with photos used from Pixabay

By adjusting the cost per acquisition (CPA), we can attract more high-value customers and improve our ROI. The graph on the left represents the approach without considering CLV. The red line represents the CPA, which is the maximum cost we can spend to acquire a new customer. Using the same marketing budget for every customer results in overinvestment in low-value customers and underinvestment in high-value customers.

Now, the graph on the right side shows the ideal spending allocation when using CLV. We set a higher CPA for high-value customers and a lower CPA for low-value customers.

Image by creator, with photos used from Pixabay

It’s similar to the hiring process. If you aim to hire ex-Googlers, offering a competitive salary is crucial, right? By doing this, we can acquire more high-value customers without changing the total marketing budget.

The CLV model I’m introducing only uses sales transaction data. As you can see, we have three data columns: customer_id, transaction date, and transaction value. In terms of data volume, CLV models typically require two to three years of transaction data.

Image by Creator

4.1 Approaches for CLV Modeling

Let’s start by understanding the two broad approaches to calculating CLV: the Historical Approach and the Predictive Approach. Under the Predictive Approach, there are two model families: probabilistic models and machine learning models.

Image by Creator

4.2 Traditional CLV Formula

First, let’s start by considering the traditional CLV formula. Here, CLV can be broken down into three components: average order value, purchase frequency, and customer lifespan.

Image by Creator

Let’s consider a fashion company as an example, where on average:

  • Customers spend $100 per order
  • They shop 4 times per year
  • They stay loyal for 3 years

In this case, the CLV is calculated as 100 × 4 × 3, which equals $1,200 per customer. This formula is very simple and looks straightforward, right? However, there are some limitations.
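In code, the traditional formula is simply the product of the three components, using the example figures above:

```python
def traditional_clv(avg_order_value, purchases_per_year, lifespan_years):
    """Traditional CLV = average order value x purchase frequency x lifespan."""
    return avg_order_value * purchases_per_year * lifespan_years

clv = traditional_clv(avg_order_value=100, purchases_per_year=4, lifespan_years=3)
print(clv)  # 1200
```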

4.3 Limitations of Traditional CLV Formula

Image by Creator

Limitation #1: Not All Customers Are The Same

This traditional formula assumes all customers are homogeneous by assigning a single average number. When some customers make exceptionally large purchases, the average doesn’t represent the characteristics of all customers.

Limitation #2: Differences in First Purchase Timing

Let’s say we use the last 12 months as our data collection period.

Image by creator, with photos used from Pixabay

This customer made his first purchase about a year ago. In this case, we can accurately calculate his purchase frequency per year: it’s 8.

How about these two customers? One started purchasing 6 months ago, and the other started 3 months ago. Each has been buying at the same pace. However, when we look at the total number of purchases over the past 12 months, they differ. The key point here is that we need to consider the tenure of the customer, meaning the duration since their first purchase.
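A minimal sketch of the tenure adjustment (the purchase counts are made up to match the example — both customers buy at the same pace, so their annualized rates agree even though their raw counts differ):

```python
def annualized_frequency(num_purchases, tenure_months):
    """Scale an observed purchase count to a per-year rate using tenure."""
    return num_purchases / tenure_months * 12

# Customer who started 6 months ago with 4 purchases,
# and one who started 3 months ago with 2 purchases
rate_a = annualized_frequency(num_purchases=4, tenure_months=6)  # 8.0
rate_b = annualized_frequency(num_purchases=2, tenure_months=3)  # 8.0
```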

Limitation #3: Dead or Alive?

Determining when a customer is considered “churned” is difficult. For subscription services like Netflix, we can consider a customer to have churned when they unsubscribe. However, in the case of retail or e-commerce, whether a customer is ‘alive’ or ‘dead’ is ambiguous.

A customer’s ‘probability of being alive’ depends on their past purchasing patterns. For example, if someone who usually buys every month doesn’t make a purchase in the next three months, they may have switched to a different brand. However, there’s no need to worry if a person who typically shops just once every six months doesn’t buy anything in the next three months.

Image by creator, with photos used from Pixabay

To address these challenges, we often turn to ‘Buy Till You Die’ (BTYD) modeling. This approach comprises two sub-models:

  1. BG/NBD model: predicts the likelihood of a customer being active and their transaction frequency.

  2. Gamma-Gamma model: estimates the average order value.

By combining the results from these sub-models, we can effectively forecast Customer Lifetime Value (CLV).

Image by Creator

5.1 BG/NBD model

We assume there are two processes in a customer’s lifecycle: the ‘purchase process,’ where customers are actively buying, and the ‘dropout process,’ where customers have stopped purchasing.

During the active purchasing phase, the model forecasts the customer’s purchase frequency with a Poisson process.

There’s always a chance that a customer drops out after each purchase. The BG/NBD model assigns a probability ‘p’ to this event.

Consider the image below for illustration. The data indicates this customer made five purchases. However, the model assumes that if the customer had remained active, they would have made eight purchases in total. But because the probability of being alive dropped at some point, we only observe five actual purchases.

Image by Creator

Purchase frequency follows a Poisson process while the customer is considered ‘active’. The Poisson distribution typically represents the count of randomly occurring events. Here, ‘λ’ represents the purchase rate of each customer. However, a customer’s purchase frequency can fluctuate, and the Poisson distribution accounts for that variability.

Image by Creator; Graph sourced from Wikipedia

The graph below illustrates how ‘p’ changes over time. As the time since the last purchase increases (T=31), the probability of a customer being ‘alive’ decreases. When a repurchase occurs (around T=36), you’ll notice that ‘p’ increases again.

Image by Creator

This is the graphical model. As mentioned earlier, it includes lambda (λ) and p. Here, λ and p vary from individual to individual. To account for this diversity, we assume that heterogeneity in λ follows a gamma distribution and heterogeneity in p follows a beta distribution. In other words, this model uses a layered approach informed by Bayes’ theorem, also called Bayesian hierarchical modeling.

Image by Creator
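To build intuition, the generative story behind the BG/NBD model can be simulated directly: draw each customer’s λ from a gamma distribution and their dropout probability p from a beta distribution, then generate purchases until the customer drops out or the observation window ends. This is an illustrative sketch, not the library’s implementation, and the hyperparameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_bg_nbd(n_customers, T, a, b, r, alpha):
    """Simulate purchase counts under the BG/NBD generative story.

    Heterogeneity: lambda ~ Gamma(r, scale=1/alpha), p ~ Beta(a, b).
    While 'alive', waiting times between purchases are exponential with
    mean 1/lambda; after each purchase the customer drops out with
    probability p.
    """
    counts = []
    for _ in range(n_customers):
        lam = rng.gamma(shape=r, scale=1.0 / alpha)  # this customer's purchase rate
        p = rng.beta(a, b)                           # this customer's dropout probability
        t, n = 0.0, 0
        while True:
            t += rng.exponential(1.0 / lam)  # wait for the next purchase
            if t > T:
                break                        # observation window ended
            n += 1
            if rng.random() < p:
                break                        # customer 'died' after this purchase
        counts.append(n)
    return np.array(counts)

# 1,000 customers observed for 52 weeks (all hyperparameters are made up)
counts = simulate_bg_nbd(n_customers=1000, T=52.0, a=1.0, b=4.0, r=2.0, alpha=8.0)
```

Plotting a histogram of `counts` shows the familiar long-tailed purchase distribution: many customers with zero or one purchase and a small loyal tail.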

5.2 Gamma-Gamma model

We assume that a Gamma distribution models the average order value. The Gamma distribution is defined by two parameters: the shape parameter and the scale parameter. As this graph shows, the shape of the Gamma distribution can change quite a bit as these two parameters vary.

Image by Creator; Graph sourced from Wikipedia
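As a quick sanity check on these two parameters: a Gamma distribution with shape k and scale θ has mean k·θ and variance k·θ², which you can verify by sampling (the parameter values here are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
k, theta = 3.0, 50.0  # shape and scale (arbitrary illustrative values)

# Draw many samples and compare against the theoretical moments
samples = rng.gamma(shape=k, scale=theta, size=200_000)

print(k * theta)        # theoretical mean: 150.0
print(k * theta ** 2)   # theoretical variance: 7500.0
# samples.mean() and samples.var() should land close to these values
```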

This diagram illustrates the graphical model in use. The model employs two Gamma distributions within a Bayesian hierarchical approach. The first Gamma distribution represents the “average order value” for each customer. Since this value differs among customers, the second Gamma distribution captures the variation in average order value across the entire customer base. The parameters p, q, and γ (gamma) of the prior distributions are given half-flat priors.

Image by Creator

Useful CLV libraries

Here, let me introduce two great OSS libraries for CLV modeling. The first is PyMC-Marketing and the second is CLVTools. Both libraries implement Buy Till You Die modeling. The most significant difference is that PyMC-Marketing is a Python-based library, while CLVTools is R-based. PyMC-Marketing is built on PyMC, a popular Bayesian library. Previously, there was a well-known library called ‘Lifetimes’. However, ‘Lifetimes’ is now in maintenance mode, and its functionality has transitioned into PyMC-Marketing.

Full code

The full code can be found on my GitHub below. My sample code is based on PyMC-Marketing’s official quick start.

Code Walkthrough

First, you need to import pymc_marketing and the other libraries.

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
from arviz.labels import MapLabeller

from IPython.display import Image
from pymc_marketing import clv

You will need to download the “Online Retail” dataset from the UCI Machine Learning Repository. This dataset contains transactional data from a UK-based online retailer and is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

import requests
import zipfile
import os

# Download the zip file
url = "https://archive.ics.uci.edu/static/public/352/online+retail.zip"
response = requests.get(url)
filename = "online_retail.zip"

with open(filename, 'wb') as file:
    file.write(response.content)

# Unzip the file
with zipfile.ZipFile(filename, 'r') as zip_ref:
    zip_ref.extractall("online_retail_data")

# Find the Excel file name
for file in os.listdir("online_retail_data"):
    if file.endswith(".xlsx"):
        excel_file = os.path.join("online_retail_data", file)

# Read the Excel file into a DataFrame
data_raw = pd.read_excel(excel_file)


Data Cleansing

Quick data cleansing is required. For instance, we need to handle return orders, filter out records without a customer ID, and create a ‘total sales’ column by multiplying quantity and unit price.

# Handling Return Orders
# Extract rows where InvoiceNo starts with "C" (cancellations)
cancelled_orders = data_raw[data_raw['InvoiceNo'].astype(str).str.startswith("C")].copy()

# Negate the 'Quantity' column so cancellations match their original orders
cancelled_orders['Quantity'] = -cancelled_orders['Quantity']

# Merge the original DataFrame with the cancellations on the columns we want to match
merged_data = pd.merge(data_raw, cancelled_orders[['CustomerID', 'StockCode', 'Quantity', 'UnitPrice']],
                       on=['CustomerID', 'StockCode', 'Quantity', 'UnitPrice'],
                       how='left', indicator=True)

# Keep rows where the merge found no match, and also drop the return orders themselves
data_raw = merged_data[(merged_data['_merge'] == 'left_only') & (~merged_data['InvoiceNo'].astype(str).str.startswith("C"))]

# Drop the indicator column
data_raw = data_raw.drop(columns=['_merge'])

# Select relevant features and calculate total sales
features = ['CustomerID', 'InvoiceNo', 'InvoiceDate', 'Quantity', 'UnitPrice', 'Country']
data = data_raw[features].copy()
data['TotalSales'] = data['Quantity'].multiply(data['UnitPrice'])

# Remove transactions with missing customer IDs, as they do not contribute to individual customer behavior
data = data[data['CustomerID'].notna()]
data['CustomerID'] = data['CustomerID'].astype(int).astype(str)

Image by Creator

Then, we need to create a summary table using the clv_summary function. The function returns a dataframe in RFM-T format: Recency, Frequency, Monetary value, and Tenure of each customer. These metrics are popular in customer analysis.

data_summary_rfm = clv.utils.clv_summary(data, 'CustomerID', 'InvoiceDate', 'TotalSales')
data_summary_rfm = data_summary_rfm.rename(columns={'CustomerID': 'customer_id'})
data_summary_rfm.index = data_summary_rfm['customer_id']
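To see what the summary function computes under the hood, here is a hand-rolled version for a single toy customer. In the lifetimes/PyMC-Marketing convention, frequency counts repeat purchases, recency is the time from first to last purchase, T (tenure) is the time from first purchase to the end of the observation period, and monetary value averages the repeat purchases. The dates and amounts below are made up:

```python
import pandas as pd

# Toy transactions for one hypothetical customer
tx = pd.DataFrame({
    "customer_id": ["c1"] * 3,
    "date": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-04-01"]),
    "value": [100.0, 120.0, 80.0],
})
observation_end = pd.Timestamp("2023-07-01")

first, last = tx["date"].min(), tx["date"].max()
frequency = tx["date"].nunique() - 1                    # repeat purchases: 2
recency = (last - first).days                           # 90 days
T = (observation_end - first).days                      # 181 days
monetary = tx.loc[tx["date"] > first, "value"].mean()   # mean repeat order: 100.0
```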

BG/NBD model

The BG/NBD model is available as the BetaGeoModel class in this library. When you execute bgm.fit(), the model begins training.

When you execute bgm.fit_summary(), you get a statistical summary of the training process. For example, the table shows the mean, standard deviation, and High-Density Interval (HDI for short) for each parameter. We can also check the r_hat value, which helps assess whether the Markov Chain Monte Carlo (MCMC) simulation has converged. R-hat is considered acceptable if it’s 1.1 or less.

bgm = clv.BetaGeoModel(
    data=data_summary_rfm,
)
bgm.fit()
bgm.fit_summary()

The matrix below is known as the Probability Alive Matrix. With it, we can infer which users are likely to return and which are not. The x-axis represents the customer’s historical purchase frequency, and the y-axis represents the customer’s recency. The color shows the probability of being alive. Our new customers are in the bottom-left corner: low frequency and high recency; they have a high probability of being alive. Our loyal customers are at the bottom-right: high frequency and high recency. If they don’t purchase for a long time, loyal customers become at-risk customers with a low probability of being alive.

Image by Creator

The next thing we can do is predict future transactions for each customer, using the expected_num_purchases method. Having fit the model, we can ask what the expected number of purchases is over the next period.

# Parameter names follow the PyMC-Marketing quick start
num_purchases = bgm.expected_num_purchases(
    customer_id=data_summary_rfm["customer_id"],
    t=365,  # expected purchases over the next 365 days
    frequency=data_summary_rfm["frequency"],
    recency=data_summary_rfm["recency"],
    T=data_summary_rfm["T"],
)

sdata = data_summary_rfm.copy()
sdata["expected_purchases"] = num_purchases.mean(("chain", "draw")).values

Gamma-Gamma model

Next, we move on to the Gamma-Gamma model to predict the average order value. We can predict the expected average order value with the expected_customer_spend method.

nonzero_data = data_summary_rfm.query("frequency>0")
dataset = pd.DataFrame({
    'customer_id': nonzero_data.customer_id,
    'mean_transaction_value': nonzero_data["monetary_value"],
    'frequency': nonzero_data["frequency"],
})

gg = clv.GammaGammaModel(
    data=dataset
)
gg.fit()

expected_spend = gg.expected_customer_spend(
    customer_id=data_summary_rfm["customer_id"],
    mean_transaction_value=data_summary_rfm["monetary_value"],
    frequency=data_summary_rfm["frequency"],
)

The graph below shows the expected average order value of five customers. The average order value of two of them is greater than $500, while that of the other three is around $350.

labeller = MapLabeller(var_name_map={"x": "customer"})
az.plot_forest(expected_spend.isel(customer_id=(range(5))), combined=True, labeller=labeller)
plt.xlabel("Expected average order value");
Image by Creator


Finally, we can combine the two sub-models to estimate the CLV of each customer. One thing I want to mention here is the discount_rate parameter. This function uses the DCF method, short for “discounted cash flow”: with a monthly discount rate of 1%, $100 received one month from now is worth about $99 today.
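The discounting itself is just a geometric series. A minimal sketch of the DCF idea, with illustrative cash flows and a 1% monthly rate:

```python
def present_value(cash_flows, monthly_rate):
    """Discounted cash flow: sum of cf_t / (1 + r)^t for t = 1..n."""
    return sum(cf / (1 + monthly_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

# $100 one month from now at a 1% monthly discount rate
print(round(present_value([100.0], 0.01), 2))  # 99.01
```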

# Parameter names follow the PyMC-Marketing quick start
clv_estimate = gg.expected_customer_lifetime_value(
    transaction_model=bgm,
    customer_id=data_summary_rfm["customer_id"],
    mean_transaction_value=data_summary_rfm["monetary_value"],
    frequency=data_summary_rfm["frequency"],
    recency=data_summary_rfm["recency"],
    T=data_summary_rfm["T"],
    time=120,  # 120 months = 10 years
    discount_rate=0.01,
)

clv_df = az.summary(clv_estimate, kind="stats").reset_index()

clv_df['customer_id'] = clv_df['index'].str.extract(r'(\d+)')[0]

clv_df = clv_df[['customer_id', 'mean', 'hdi_3%', 'hdi_97%']]
clv_df.rename(columns={'mean' : 'clv_estimate', 'hdi_3%': 'clv_estimate_hdi_3%', 'hdi_97%': 'clv_estimate_hdi_97%'}, inplace=True)

# monetary_values = data_summary_rfm.loc[clv_df['customer_id'], 'monetary_value']
monetary_values = data_summary_rfm.set_index('customer_id').loc[clv_df['customer_id'], 'monetary_value']
clv_df['monetary_value'] = monetary_values.values
clv_df.to_csv('clv_estimates_output.csv', index=False)

Now, I’m going to show you how we can improve our marketing actions. The graph below shows the estimated CLV by country.

# Calculating total sales per transaction
data['TotalSales'] = data['Quantity'] * data['UnitPrice']
customer_sales = data.groupby('CustomerID').agg({
    'TotalSales': 'sum',
    'Country': 'first'  # Assuming a customer is associated with only one country
})

customer_countries = customer_sales.reset_index()[['CustomerID', 'Country']]

clv_with_country = pd.merge(clv_df, customer_countries, left_on='customer_id', right_on='CustomerID', how='left')

average_clv_by_country = clv_with_country.groupby('Country')['clv_estimate'].mean()

customer_count_by_country = data.groupby('Country')['CustomerID'].nunique()

country_clv_summary = pd.DataFrame({
    'AverageCLV': average_clv_by_country,
    'CustomerCount': customer_count_by_country,
})

# Calculate the average lower and upper bounds of the CLV estimates by country
average_clv_lower_by_country = clv_with_country.groupby('Country')['clv_estimate_hdi_3%'].mean()
average_clv_upper_by_country = clv_with_country.groupby('Country')['clv_estimate_hdi_97%'].mean()

# Add these averages to the country_clv_summary dataframe
country_clv_summary['AverageCLVLower'] = average_clv_lower_by_country
country_clv_summary['AverageCLVUpper'] = average_clv_upper_by_country

# Filtering countries with at least 20 customers
filtered_countries = country_clv_summary[country_clv_summary['CustomerCount'] >= 20]

# Sorting in descending order by AverageCLV
sorted_countries = filtered_countries.sort_values(by='AverageCLV', ascending=False)

# Prepare the info for error bars
lower_error = sorted_countries['AverageCLV'] - sorted_countries['AverageCLVLower']
upper_error = sorted_countries['AverageCLVUpper'] - sorted_countries['AverageCLV']
asymmetric_error = [lower_error, upper_error]

# Create a new figure with a specified size
plt.figure(figsize=(12, 8))

# Plot the average CLV with error bars indicating the confidence intervals
# Convert the index to a regular list to avoid issues with matplotlib's handling of pandas Index objects
plt.errorbar(x=sorted_countries['AverageCLV'], y=sorted_countries.index.tolist(),
             xerr=asymmetric_error, fmt='o', color='black', ecolor='lightgray', capsize=5, markeredgewidth=2)

# Set labels and title
plt.xlabel('Average CLV') # x-axis label
plt.ylabel('Country') # y-axis label
plt.title('Average Customer Lifetime Value (CLV) by Country with Confidence Intervals') # chart title

# Adjust the y-axis to display countries from top down
plt.gca().invert_yaxis()

# Show the grid lines
plt.grid(True, linestyle='--', alpha=0.7)

# Display the plot
plt.show()

Image by Creator

Customers in France tend to have a high CLV. On the other hand, customers in Belgium tend to have a lower CLV. Based on this output, I recommend increasing the marketing budget for acquiring customers in France and reducing it for Belgium. If we modeled U.S.-based data, we could use states instead of countries.

You might be wondering:

  • Can we utilize additional types of data, such as access logs?
  • Is it possible to incorporate more features, like demographic information or marketing activity, into the model?

Basically, the BTYD model only requires transaction data. If you want to use other data sources or features, an ML approach might be an option. You can then assess the performance of both the Bayesian and ML models and select the one that offers better accuracy and interpretability.

The flowchart below offers a guideline for better CLV modeling.

Image by Creator

First, consider your data size. If your data isn’t large enough, or you only have transaction data, BTYD modeling with PyMC-Marketing might be the best choice. Even if your data is large enough, I think a good approach is to start with a BTYD model and, if it underperforms, try a different approach. Specifically, if your priority is accuracy over interpretability, neural networks, XGBoost, LightGBM, or ensemble techniques could be useful. If interpretability is still important to you, consider methods like random forests or explainable AI techniques.

In summary, starting with PyMC-Marketing is a great first step in any case!

Here are some key takeaways.

  • Customer Lifetime Value (CLV) is the total net profit a company can expect from a single customer throughout their relationship.
  • We can build a probabilistic (BTYD) model using the BG/NBD model and the Gamma-Gamma model.
  • If you are familiar with Python, PyMC-Marketing is a great place to start.

Thanks for reading! If you have any questions or suggestions, feel free to contact me on LinkedIn! Also, I’d be happy if you followed me on Towards Data Science.

