## Explore the Depths of Buy-till-You-Die (BTYD) Modeling and Practical Coding Techniques

**TL;DR:** The Customer Lifetime Value (CLV) model is a key technique in customer analytics that helps companies discover who their valuable customers are. Neglecting CLV can result in overinvestment in short-term customers who may only make a single purchase. ‘Buy Till You Die’ modeling, which combines the BG/NBD and Gamma-Gamma models, can estimate CLV. Although best practices vary depending on data size and modeling priorities, PyMC-Marketing is a useful Python library for those seeking to quickly implement CLV modeling.

**CLV** is defined as the total net revenue a company can expect from a single customer throughout their relationship. Some of you may be more familiar with the term ‘LTV’ (Lifetime Value). Yes, CLV and LTV are interchangeable.

- The first goal is to calculate and predict future CLV, which helps you understand how much revenue can be expected from each customer.
- The second goal is to identify profitable customers. The model tells you who those valuable customers are by analyzing the characteristics of high-CLV customers.
- The third goal is to take marketing actions based on the analysis; from there, you can optimize your marketing budget allocation accordingly.

Let’s take the e-commerce site of a fashion brand like Nike, for instance, which could use advertisements and coupons to attract new customers. Now, let’s assume that college students and working professionals are two major customer segments. For first-time purchases, the company spends $10 on advertising for college students and $20 for working professionals. And both segments make purchases worth around $100.

If you were in charge of marketing, which segment would you invest more in? You might naturally think it’s more logical to invest more in the college student segment, given its lower cost and higher ROI.

So, what if you knew this information?

The college student segment tends to have a high churn rate, meaning they don’t purchase again after that one purchase, resulting in an average of $100 per customer. On the other hand, the working professionals segment has a higher rate of repeat purchases, resulting in an average of $400 per customer.
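As a quick sanity check, the segment economics above can be compared in a few lines of Python (the numbers are the hypothetical ones from this example):

```python
# Hypothetical segment economics from the example above:
# acquisition cost vs. expected lifetime revenue per customer
segments = {
    "college students": {"acquisition_cost": 10, "lifetime_revenue": 100},
    "working professionals": {"acquisition_cost": 20, "lifetime_revenue": 400},
}

for name, s in segments.items():
    roi = (s["lifetime_revenue"] - s["acquisition_cost"]) / s["acquisition_cost"]
    print(f"{name}: ROI = {roi:.0f}x")
# college students: ROI = 9x
# working professionals: ROI = 19x
```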

In that case, you’d likely prefer to invest more in the working professionals segment, since it promises a higher ROI. This may seem like an obvious point that anyone can understand. Surprisingly, however, many marketing people focus on hitting a target Cost Per Acquisition (CPA) without considering who the profitable customers are in the long run.

By adjusting the “cost per acquisition” (CPA), we can attract more high-value customers and improve our ROI. The graph on the left represents the approach without considering CLV. The red line represents the CPA, which is the maximum cost we can spend to acquire a new customer. Using the same marketing budget for every customer results in overinvestment in low-value customers and underinvestment in high-value customers.

Now, the graph on the right side shows the ideal spending allocation when using CLV. We set a higher CPA for high-value customers and a lower CPA for low-value customers.

It’s similar to the hiring process. If you aim to hire ex-Googlers, offering a competitive salary is crucial, right? By doing this, we can acquire more high-value customers without changing the total marketing budget.

The CLV model I’m introducing only uses sales transaction data. As you can see, we have three data columns: customer_id, transaction date, and transaction value. In terms of data volume, CLV models typically require two to three years of transaction data.

**4.1 Approaches for CLV Modeling**

Let’s start by understanding the two broad approaches to calculating CLV: the Historical Approach and the Predictive Approach. Under the Predictive Approach, there are two types of models: Probabilistic Models and Machine Learning Models.

**4.2 Traditional CLV Formula**

First, let’s consider a traditional CLV formula. Here, CLV can be broken down into three components: average order value, purchase frequency, and customer lifespan.

Let’s take a fashion company as an example. On average:

- Customers spend $100 per order
- They shop 4 times per year
- They stay loyal for 3 years

In this case, the CLV is calculated as 100 × 4 × 3, which equals $1,200 per customer. This formula is very simple and looks straightforward, right? However, there are some limitations.
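The arithmetic above is easy to express in code; a minimal sketch using the numbers from this example:

```python
# Traditional CLV = average order value x purchase frequency x customer lifespan
average_order_value = 100  # dollars per order
purchase_frequency = 4     # orders per year
customer_lifespan = 3      # years

clv = average_order_value * purchase_frequency * customer_lifespan
print(clv)  # 1200
```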

**4.3 Limitations of Traditional CLV Formula**

**Limitation #1: Not All Customers Are The Same**

This traditional formula assumes that all customers are homogeneous by assigning one average number. When some customers make exceptionally large purchases, the average doesn’t represent the characteristics of all customers.

**Limitation #2: Differences in First Purchase Timing**

Let’s say we use the last 12 months as our data collection period.

This customer made his first purchase about a year ago. In this case, we can accurately calculate his purchase frequency per year: it’s 8.

How about these two customers? One started purchasing 6 months ago, and the other started 3 months ago. Both have been buying at the same pace. However, when we look at the total number of purchases over the past 12 months, they differ. The key point here is that we need to consider the tenure of the customer, meaning the duration since their first purchase.
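One way to sketch the fix: divide each customer’s purchase count by their tenure instead of comparing raw totals (the customers here are made up for illustration):

```python
# Hypothetical customers: same purchase pace, different tenures
customers = [
    {"name": "A", "purchases": 6, "tenure_months": 6},  # first purchase 6 months ago
    {"name": "B", "purchases": 3, "tenure_months": 3},  # first purchase 3 months ago
]

for c in customers:
    monthly_rate = c["purchases"] / c["tenure_months"]
    print(c["name"], monthly_rate)  # both buy once per month
```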

**Limitation #3: Dead or Alive?**

Determining when a customer is considered “churned” is difficult. For subscription services like Netflix, we can consider a customer to have churned once they unsubscribe. However, in the case of retail or e-commerce, whether a customer is ‘Alive’ or ‘Dead’ is ambiguous.

A customer’s ‘Probability of Being Alive’ depends on their past purchasing patterns. For instance, if someone who usually buys every month doesn’t make a purchase in the next three months, they may have switched to a different brand. However, there’s no need to worry if a person who typically shops just once every six months doesn’t buy anything in the next three months.

To address these challenges, we often turn to ‘Buy Till You Die’ (BTYD) modeling. This approach consists of two sub-models:

1. BG/NBD model: This predicts the probability of a customer being active and their transaction frequency.

2. Gamma-Gamma model: This estimates the average order value.

By combining the results from these sub-models, we can effectively forecast the Customer Lifetime Value (CLV).

**5.1 BG/NBD model**

We assume there are two processes in the customer’s lifecycle: the ‘Purchase Process,’ where customers are actively buying, and the ‘Dropout Process,’ where customers have stopped purchasing.

During the active purchasing phase, the model forecasts the customer’s purchase frequency with a Poisson process.

There’s always a chance that a customer might drop out after each purchase. The BG/NBD model assigns a probability ‘p’ to this possibility.

Consider the image below for illustration. The data indicates this customer made five purchases. However, under the model’s assumptions, if the customer had remained active, they would have made eight purchases in total. But because the probability of being alive dropped at some point, we only observe five actual purchases.

The purchase frequency follows a Poisson process while the customer is considered ‘active’. The Poisson distribution typically represents the count of randomly occurring events. Here, ‘λ’ represents the purchase frequency for each customer. However, a customer’s purchase frequency can fluctuate; the Poisson distribution accounts for such variability.
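To build intuition for the Poisson assumption, here is a small NumPy simulation. The rate λ = 4 purchases per year is an arbitrary illustration, not a value fitted from data:

```python
import numpy as np

rng = np.random.default_rng(42)
lam = 4.0  # hypothetical purchase rate: 4 purchases per year

# Simulate one year of purchase counts for 10,000 active customers
counts = rng.poisson(lam, size=10_000)
print(counts.mean(), counts.var())  # both should be close to lam
```

A characteristic property of the Poisson distribution is that its mean and variance are both equal to λ, which the simulated counts recover.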

The graph below illustrates how ‘p’ changes over time. As the time since the last purchase increases (T=31), the probability of a customer being ‘alive’ decreases. When a repurchase occurs (around T=36), you’ll notice that ‘p’ increases again.

This is the graphical model. As mentioned earlier, it includes lambda (λ) and p. Here, λ and p vary from individual to individual. To account for this diversity, we assume that heterogeneity in λ follows a gamma distribution and that heterogeneity in p follows a beta distribution. In other words, this model uses a layered approach informed by Bayes’ theorem, also called Bayesian hierarchical modeling.
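A rough simulation of this hierarchy helps make it concrete. The hyperparameter values below are arbitrary illustrations, not the priors PyMC-Marketing actually fits:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Heterogeneity across customers (arbitrary hyperparameters)
lam = rng.gamma(shape=2.0, scale=1.5, size=n)  # per-customer purchase rate
p = rng.beta(a=1.0, b=4.0, size=n)             # per-customer dropout probability

# Simulate one period: each customer buys at rate lam[i],
# and may drop out with probability p[i] after each purchase
purchases = np.zeros(n, dtype=int)
for i in range(n):
    for _ in range(rng.poisson(lam[i])):
        purchases[i] += 1
        if rng.random() < p[i]:
            break  # the customer "dies"

# Dropout shrinks observed purchases below the raw Poisson mean
print(purchases.mean(), lam.mean())
```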

**5.2 Gamma-Gamma model**

We assume that the average order value follows a Gamma distribution. The Gamma distribution is defined by two parameters: the shape parameter and the scale parameter. As the graph shows, the shape of the Gamma distribution can change substantially as these two parameters vary.
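A quick way to see this flexibility: the mean of a Gamma distribution is shape × scale, so different (shape, scale) pairs with the same mean can still produce very different shapes. The parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two arbitrary (shape, scale) pairs with the same mean (100),
# but very different spread and skew
for shape, scale in [(2.0, 50.0), (20.0, 5.0)]:
    draws = rng.gamma(shape, scale, size=100_000)
    print(shape, scale, round(draws.mean(), 1), round(draws.std(), 1))
```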

This diagram illustrates the graphical model in use. The model employs two Gamma distributions within a Bayesian hierarchical approach. The first Gamma distribution represents the “average order value” for each customer. Since this value differs among customers, the second Gamma distribution captures the variation in average order value across the entire customer base. The parameters p, q, and γ (gamma) for the prior distributions are given Half-flat priors.

**Useful CLV libraries**

Here, let me introduce two great OSS libraries for CLV modeling. The first is PyMC-Marketing and the second is CLVTools. Both libraries implement Buy-Till-You-Die modeling. The most significant difference is that PyMC-Marketing is a Python-based library, while CLVTools is R-based. PyMC-Marketing is built on PyMC, a popular Bayesian library. Previously, there was a well-known library called ‘Lifetimes’. However, ‘Lifetimes’ is now in maintenance mode, and it has transitioned into PyMC-Marketing.

**Full code**

The full code can be found on my GitHub below. My sample code is based on PyMC-Marketing’s official quick start.

**Code Walkthrough**

First, you need to import pymc_marketing and the other libraries.

```python
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
from arviz.labels import MapLabeller
from IPython.display import Image

from pymc_marketing import clv
```

You will need to download the “Online Retail” dataset from the UCI Machine Learning Repository. This dataset contains transactional data from a UK-based online retailer and is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

```python
import requests
import zipfile
import os

# Download the zip file
url = "https://archive.ics.uci.edu/static/public/352/online+retail.zip"
response = requests.get(url)
filename = "online_retail.zip"
with open(filename, 'wb') as file:
    file.write(response.content)

# Unzip the file
with zipfile.ZipFile(filename, 'r') as zip_ref:
    zip_ref.extractall("online_retail_data")

# Find the Excel file name
for file in os.listdir("online_retail_data"):
    if file.endswith(".xlsx"):
        excel_file = os.path.join("online_retail_data", file)
        break

# Load the Excel file into a DataFrame
data_raw = pd.read_excel(excel_file)
data_raw.head()
```

**Data Cleansing**

A quick data cleansing step is required. For instance, we need to handle return orders, filter out records without a customer ID, and create a ‘total sales’ column by multiplying the quantity and the unit price.

```python
# Handling return orders:
# extract rows where InvoiceNo starts with "C"
cancelled_orders = data_raw[data_raw['InvoiceNo'].astype(str).str.startswith("C")].copy()

# Create a temporary DataFrame with the columns we want to match on,
# and negate the 'Quantity' column
cancelled_orders['Quantity'] = -cancelled_orders['Quantity']

# Merge the original DataFrame with the temporary DataFrame on the columns we want to match
merged_data = pd.merge(data_raw, cancelled_orders[['CustomerID', 'StockCode', 'Quantity', 'UnitPrice']],
                       on=['CustomerID', 'StockCode', 'Quantity', 'UnitPrice'],
                       how='left', indicator=True)

# Keep rows where the merge found no match, and also drop the original return orders
data_raw = merged_data[(merged_data['_merge'] == 'left_only') & (~merged_data['InvoiceNo'].astype(str).str.startswith("C"))]

# Drop the indicator column
data_raw = data_raw.drop(columns=['_merge'])

# Select relevant features and calculate total sales
features = ['CustomerID', 'InvoiceNo', 'InvoiceDate', 'Quantity', 'UnitPrice', 'Country']
data = data_raw[features].copy()
data['TotalSales'] = data['Quantity'].multiply(data['UnitPrice'])

# Remove transactions with missing customer IDs, as they do not
# contribute to individual customer behavior
data = data[data['CustomerID'].notna()]
data['CustomerID'] = data['CustomerID'].astype(int).astype(str)
data.head()
```

Then, we need to create a summary table using the ‘clv_summary’ function. The function returns a dataframe in RFM-T format. RFM-T stands for Recency, Frequency, Monetary value, and Tenure of each customer. These metrics are popular in customer analysis.
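To make the RFM-T columns concrete, here is a toy computation on made-up transactions for a single customer, mirroring how these summary statistics are conventionally defined (in practice, clv_summary does this work for you):

```python
import pandas as pd

# Made-up transactions for one hypothetical customer
tx = pd.DataFrame({
    "customer_id": ["c1", "c1", "c1"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-15", "2024-02-01"]),
    "value": [50.0, 70.0, 60.0],
})
observation_end = pd.Timestamp("2024-03-01")

first, last = tx["date"].min(), tx["date"].max()
frequency = len(tx) - 1                       # repeat purchases only
recency = (last - first).days                 # customer's age at their last purchase
T = (observation_end - first).days            # tenure: time since first purchase
monetary_value = tx["value"].iloc[1:].mean()  # mean value of repeat purchases

print(frequency, recency, T, monetary_value)  # 2 31 60 65.0
```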

```python
data_summary_rfm = clv.utils.clv_summary(data, 'CustomerID', 'InvoiceDate', 'TotalSales')
data_summary_rfm = data_summary_rfm.rename(columns={'CustomerID': 'customer_id'})
data_summary_rfm.index = data_summary_rfm['customer_id']
data_summary_rfm.head()
```

**BG/NBD model**

The BG/NBD model is available as the BetaGeoModel class in this library. When you execute bgm.fit(), the model begins training.

When you execute bgm.fit_summary(), the system provides a statistical summary of the training process. For example, this table shows the mean, standard deviation, and High-Density Interval (HDI for short) for the parameters. We can also check the r_hat value, which helps assess whether a Markov Chain Monte Carlo (MCMC) simulation has converged. R-hat is considered acceptable if it’s 1.1 or less.

```python
bgm = clv.BetaGeoModel(
    data=data_summary_rfm,
)
bgm.build_model()
bgm.fit()
bgm.fit_summary()
```

The matrix below is called the Probability Alive Matrix. With this, we can infer which users are likely to return and which are unlikely to return. The x-axis represents the customer’s historical purchase frequency, and the y-axis represents the customer’s recency. The color shows the probability of being alive. Our new customers are in the bottom-left corner: low frequency and high recency. Those customers have a high probability of being alive. Our loyal customers are at the bottom-right: high-frequency and high-recency customers. If they don’t purchase for a long time, loyal customers become at-risk customers, who have a low probability of being alive.

`clv.plot_probability_alive_matrix(bgm);`

The next thing we can do is predict future transactions for each customer. You can use the expected_num_purchases function. Having fit the model, we can ask what the expected number of purchases is over the next period.

```python
num_purchases = bgm.expected_num_purchases(
    customer_id=data_summary_rfm["customer_id"],
    t=365,
    frequency=data_summary_rfm["frequency"],
    recency=data_summary_rfm["recency"],
    T=data_summary_rfm["T"]
)

sdata = data_summary_rfm.copy()
sdata["expected_purchases"] = num_purchases.mean(("chain", "draw")).values
sdata.sort_values(by="expected_purchases").tail(4)
```

**Gamma-Gamma model**

Next, we’ll move on to the Gamma-Gamma model to predict the average order value. We can predict the expected average order value with the expected_customer_spend function.

```python
nonzero_data = data_summary_rfm.query("frequency>0")

dataset = pd.DataFrame({
    'customer_id': nonzero_data.customer_id,
    'mean_transaction_value': nonzero_data["monetary_value"],
    'frequency': nonzero_data["frequency"],
})

gg = clv.GammaGammaModel(
    data=dataset
)
gg.build_model()
gg.fit()

expected_spend = gg.expected_customer_spend(
    customer_id=data_summary_rfm["customer_id"],
    mean_transaction_value=data_summary_rfm["monetary_value"],
    frequency=data_summary_rfm["frequency"],
)
```

The graph below shows the expected average order value of five customers. The average order value of two of these customers is more than $500, while that of the other three is around $350.

```python
labeller = MapLabeller(var_name_map={"x": "customer"})
az.plot_forest(expected_spend.isel(customer_id=(range(5))), combined=True, labeller=labeller)
plt.xlabel("Expected average order value");
```

**Outcomes**

Finally, we can combine the two sub-models to estimate the CLV of each customer. One thing I want to mention here is the discount_rate parameter. This function uses the DCF method, short for “discounted cash flow.” With a monthly discount rate of 1%, $100 received one month from now is worth about $99 today.
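The discounting itself is one line of arithmetic. Using the same 1% monthly rate:

```python
# Present value of $100 received t months from now, at a 1% monthly discount rate
monthly_rate = 0.01
for t in [1, 12, 120]:
    pv = 100 / (1 + monthly_rate) ** t
    print(t, round(pv, 2))
# 1 99.01
# 12 88.74
# 120 30.3
```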

```python
clv_estimate = gg.expected_customer_lifetime_value(
    transaction_model=bgm,
    customer_id=data_summary_rfm['customer_id'],
    mean_transaction_value=data_summary_rfm["monetary_value"],
    frequency=data_summary_rfm["frequency"],
    recency=data_summary_rfm["recency"],
    T=data_summary_rfm["T"],
    time=120,  # 120 months = 10 years
    discount_rate=0.01,
    freq="D",
)

clv_df = az.summary(clv_estimate, kind="stats").reset_index()
clv_df['customer_id'] = clv_df['index'].str.extract(r'(\d+)')[0]
clv_df = clv_df[['customer_id', 'mean', 'hdi_3%', 'hdi_97%']]
clv_df.rename(columns={'mean': 'clv_estimate', 'hdi_3%': 'clv_estimate_hdi_3%', 'hdi_97%': 'clv_estimate_hdi_97%'}, inplace=True)

monetary_values = data_summary_rfm.set_index('customer_id').loc[clv_df['customer_id'], 'monetary_value']
clv_df['monetary_value'] = monetary_values.values
clv_df.to_csv('clv_estimates_output.csv', index=False)
```

Now, I’m going to show you how we can improve our marketing actions. The graph below shows the estimated CLV by country.

```python
# Calculate total sales per transaction
data['TotalSales'] = data['Quantity'] * data['UnitPrice']
customer_sales = data.groupby('CustomerID').agg({
    'TotalSales': 'sum',
    'Country': 'first'  # Assuming a customer is associated with only one country
})

customer_countries = customer_sales.reset_index()[['CustomerID', 'Country']]
clv_with_country = pd.merge(clv_df, customer_countries, left_on='customer_id', right_on='CustomerID', how='left')

average_clv_by_country = clv_with_country.groupby('Country')['clv_estimate'].mean()
customer_count_by_country = data.groupby('Country')['CustomerID'].nunique()
country_clv_summary = pd.DataFrame({
    'AverageCLV': average_clv_by_country,
    'CustomerCount': customer_count_by_country,
})

# Calculate the average lower and upper bounds of the CLV estimates by country
average_clv_lower_by_country = clv_with_country.groupby('Country')['clv_estimate_hdi_3%'].mean()
average_clv_upper_by_country = clv_with_country.groupby('Country')['clv_estimate_hdi_97%'].mean()

# Add these averages to the country_clv_summary dataframe
country_clv_summary['AverageCLVLower'] = average_clv_lower_by_country
country_clv_summary['AverageCLVUpper'] = average_clv_upper_by_country

# Keep countries with at least 20 customers
filtered_countries = country_clv_summary[country_clv_summary['CustomerCount'] >= 20]

# Sort in descending order by AverageCLV
sorted_countries = filtered_countries.sort_values(by='AverageCLV', ascending=False)

# Prepare the data for error bars
lower_error = sorted_countries['AverageCLV'] - sorted_countries['AverageCLVLower']
upper_error = sorted_countries['AverageCLVUpper'] - sorted_countries['AverageCLV']
asymmetric_error = [lower_error, upper_error]

# Create a new figure with a specified size
plt.figure(figsize=(12, 8))

# Plot the average CLV with error bars indicating the confidence intervals
# (convert the index to a regular list to avoid issues with matplotlib's
# handling of pandas Index objects)
plt.errorbar(x=sorted_countries['AverageCLV'], y=sorted_countries.index.tolist(),
             xerr=asymmetric_error, fmt='o', color='black', ecolor='lightgray', capsize=5, markeredgewidth=2)

# Set labels and title
plt.xlabel('Average CLV')
plt.ylabel('Country')
plt.title('Average Customer Lifetime Value (CLV) by Country with Confidence Intervals')

# Display countries from top down
plt.gca().invert_yaxis()

# Show the grid lines
plt.grid(True, linestyle='--', alpha=0.7)

# Display the plot
plt.show()
```

Customers in France tend to have a high CLV. On the other hand, customers in Belgium tend to have a lower CLV. From this output, I recommend increasing the marketing budget for acquiring customers in France and reducing the budget for acquiring customers in Belgium. If we did the modeling with U.S.-based data, we could use states instead of countries.

You might be wondering:

- Can we utilize additional types of data, such as access logs?
- Is it possible to incorporate more features, like demographic information or marketing activity, into the model?

Basically, the BTYD model only requires transaction data. If you want to use other data sources or features, an ML approach might be an option. You can then assess the performance of both the Bayesian and ML models, choosing the one that offers better accuracy and interpretability.

The flowchart below shows a guideline for better CLV modeling.

First, consider your data size. If your data isn’t large enough, or you only have transaction data, BTYD modeling with PyMC-Marketing might be the best choice. Even if your data is large enough, I think a good approach is to start with a BTYD model and, if it underperforms, try a different approach. Specifically, if your priority is accuracy over interpretability, neural networks, XGBoost, LightGBM, or ensemble techniques could be useful. If interpretability is still important to you, consider methods like Random Forest or explainable AI approaches.

In summary, starting with PyMC-Marketing is a great first step in any case!

Here are some key takeaways.

- Customer Lifetime Value (CLV) is the total net profit a company can expect from a single customer throughout their relationship.
- We can build a probabilistic (BTYD) model using the BG/NBD model and the Gamma-Gamma model.
- If you are familiar with Python, PyMC-Marketing is a good place to start.

Thanks for reading! If you have any questions or suggestions, feel free to contact me on LinkedIn! Also, I would be happy if you follow me on Towards Data Science.