Home Artificial Intelligence Understanding Bayesian Marketing Mix Modeling: A Deep Dive into Prior Specifications Data LightweightMMM Model Specification Conclusion

Understanding Bayesian Marketing Mix Modeling: A Deep Dive into Prior Specifications Data LightweightMMM Model Specification Conclusion

0
Understanding Bayesian Marketing Mix Modeling: A Deep Dive into Prior Specifications
Data
LightweightMMM Model Specification
Conclusion

Exploring model specification with Google’s LightweightMMM

Towards Data Science
Photo by Pawel Czerwinski on Unsplash

Bayesian marketing mix modeling has been receiving increasingly attention, especially with the recent releases of open source tools like LightweightMMM (Google) or PyMC Marketing (PyMC Labs). Although these frameworks simplify the complexities of Bayesian modeling, it continues to be crucial for the user to have an understanding of fundamental Bayesian concepts and have the ability to grasp the model specification.

In this text, I take Google’s LightweightMMM as a practical example and show the intuition and meaning of the prior specifications of this framework. I exhibit the simulation of prior samples using Python and the scipy library.

I take advantage of the info made available by Robyn under MIT Licence.

The dataset consists of 208 weeks of revenue (from 2015–11–23 to 2019–11–11) having:

  • 5 media spend channels: tv_S, ooh_S, print_S, facebook_S, search_S
  • 2 media channels which have also the exposure information (Impression, Clicks): facebook_I, search_clicks_P
  • Organic media without spend: newsletter
  • Control variables: events, holidays, competitor sales (competitor_sales_B)

The specification of the LightweightMMM model is defined as follows:

LMMM Model Specification (image by the creator)

This specification represents an additive linear regression model that explains the worth of a response (goal variable) at a particular time point t.

Let’s break down each component within the equation:

  • α: This component represents the intercept or the baseline value of the response. It’s the expected value of the response when all other aspects are zero.
  • trend: This component captures the increasing or decreasing trend of the response over time.
  • seasonality: This component represents periodic fluctuations within the response.
  • media_channels: This component accounts for the influence of media channels (television, radio, online ads) on the response.
  • other_factors: This component encompasses another variables which have influence on the response reminiscent of weather, economic indicators or competitor activities.

Below, I am going through each of the components intimately and explain methods to interpret the prior specifications. As a reminder, a previous distribution is an assumed distribution of some parameter with none knowledge of the underlying data.

Intercept

Intercept prior specification (image by the creator)

The intercept is defined to follow a half-normal distribution with a typical deviation of two. A half-normal distribution is a continuous probability distribution that resembles a standard distribution but is restricted to positive values only. The distribution is characterised by a single parameter, the usual deviation (scale). Half-normal distribution implies that the intercept can get only positive values.

The next code generates samples from the prior distribution of the intercept and visualizes the probability density function (PDF) for a half-normal distribution with a scale of two. For visualizations of other components, please discuss with the accompanying source code within the Github repo.

from scipy import stats

scale = 2
halfnormal_dist = stats.halfnorm(scale=scale)
samples = halfnormal_dist.rvs(size=1000)

plt.figure(figsize=(20, 6))
sns.histplot(samples, bins=50, kde=False, stat='density', alpha=0.5)
sns.lineplot(x=np.linspace(0, 6, 100),
y=halfnormal_dist.pdf(np.linspace(0, 6, 100)), color='r')

plt.title(f"Half-Normal Distribution with scale={scale}")
plt.xlabel('x')
plt.ylabel('P(X=x)')
plt.show()

Half Normal Distribution (image by the creator)

Trend

Trend specification (image by the creator)

The trend is defined as a power-law relationship between time t and the trend value. The parameter μ represents the amplitude or magnitude of the trend, while k controls the steepness or curvature of the trend.

The parameter μ is drawn from a standard distribution with a mean of 0 and a typical deviation of 1. This suggests that μ follows a typical normal distribution, centered around 0, with standard deviation of 1. The traditional distribution allows for positive and negative values of μ, representing upward or downward trends, respectively.

The parameter k is drawn from a uniform distribution between 0.5 and 1.5. The uniform distribution ensures that k takes values that end in an affordable and meaningful curvature for the trend.

The plot below depicts separate components obtained from the prior distributions: a sample of the intercept and trend, each represented individually.

Trend and Intercept (image by the creator)

Seasonality

Seasonality specification (image by the creator)

Each component γ is drawn from a standard distribution with a mean of 0 and a typical deviation of 1.

By combining the cosine and sine functions with different γ, cyclic patterns can modeled to capture the seasonality present in the info. The cosine and sine functions represent the oscillating behavior observed over the period of 52 units (weeks).

The plot below illustrates a sample of the seasonality, intercept and trend obtained from the prior distributions.

Seasonality, Trend and Intercept (image by the creator)

Other aspects (control variables)

Other Aspects specification (image by the creator)

Each factor coefficient λ​ is drawn from a standard distribution with a mean of 0 and a typical deviation of 1, which suggests that λ can take positive or negative values​, representing the direction and magnitude of the influence each factor has on the end result.

The plot below depicts separate components obtained from the prior distributions: a sample of the intercept, trend, seasonality and control variables (competitor_sales_B, newsletter, holidays and events) each represented individually.

Other aspects (combined) (image by the creator)

Media Channels

Media Channels prior specification (image by the creator)

The distribution for β​ coefficient of a media channel m is specified as a half-normal distribution, where the usual deviation parameter v​ is decided by the sum of the entire cost related to media channel m. The whole cost reflects the investment or resources allocated to that individual media channel.

Media Transformations

Adstock and Hill Saturation Specification (image by the creator)

In these equations, we’re modeling the media channels’ behavior using a series of transformations, reminiscent of adstock and Hill saturation.

The variable media channels represents the transformed media channels at time point t. It’s obtained by applying a change to the raw media channel value x​. The Hill transformation is controlled by the parameters K​ a half saturation point (0 < k ≤ 1), and shape S​ controlling the steepness of the curve (s > 0).

The variable x∗​ represents the transformed media channels value at time t after undergoing the adstock transformation. It’s calculated by adding the present raw media channel value to the product of the previous transformed value and the adstock decay parameter λ​.

Parameters K​ and S​ follow gamma distributions with shape and scale parameters each set to 1, while λ​ follows a beta distribution with shape parameters 2 and 1.

The probability density function of the Hill Saturation parameters K​ and S are illustrated within the plot below:

shape = 1
scale = 1

gamma_dist = stats.gamma(a=shape, scale=scale)
samples = gamma_dist.rvs(size=1000)

plt.figure(figsize=(20, 6))
sns.histplot(samples, bins=50, kde=False, stat='density', alpha=0.5)
sns.lineplot(x=np.linspace(0, 6, 100), y=gamma_dist.pdf(np.linspace(0, 6, 100)), color='r')

plt.title(f"Gamma Distribution for $K_m$ and $S_m$ with shape={shape} and scale={scale}")
plt.xlabel('x')
plt.ylabel('P(X=x)')

# Show the plot
plt.show()python

Gamma distribution (image by the creator)

The probability density function of the adstock parameter λ is shown within the plot below:

Beta distribution (image by the creator)

A Note on the specification of the adstock parameter λ:

The probability density function of the Beta(α = 2, β = 1) distribution exhibits a positive trend, indicating that higher values have the next probability density. In media evaluation, different industries and media activities may exhibit various decay rates, with most media channels typically exhibiting small decay rates. As an example, Robyn suggests the next ranges of λ decay for common media channels: TV (0.3–0.8), OOH/Print/Radio (0.1–0.4), and digital (0–0.3).

Within the context of the Beta(α = 2, β = 1) distribution, higher probabilities are assigned to λ values closer to 1, while lower probabilities are assigned to values closer to 0. Consequently, outcomes or observations near the upper end of the interval [0, 1] usually tend to occur in comparison with outcomes near the lower end.

Alternatively, within the Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects, the decay parameter is defined as Beta(α = 3, β = 3), whose probability density function is illustrated below. This distribution is symmetric around 0.5, indicating an equal likelihood of observing outcomes at each extremes and near the middle of the interval [0, 1].

Beta(3,3) (image by the creator)

The plot below depicts separate components obtained from the prior distributions: a sample of the intercept, trend, seasonality, control variables and media channels, each represented individually.

All model components (image by the creator)

Combining all components

As mentioned earlier, LightweightMMM models an additive linear regression by combining various components reminiscent of intercept, trend, seasonality, media channels, and other aspects sampled from their prior distributions to acquire the predictive response. The plot below visualizes the true response and the expected response sampled from the prior predictive distribution.

Visualizing a single sample against the true response value allows us to look at how the model’s prediction compares to the actual end result for a particular set of parameter values. It could actually provide an intuitive understanding of how the model performs in that individual instance.

Revenue: True vs. Prior (image by the creator)

Prior predictive check

So as get more robust insights, it is usually advisable to sample multiple times from the prior predictive distribution and measure the uncertainty. The prior predictive check helps assess the adequacy of the chosen model and evaluate whether the model’s predictions align with our expectations, before observing any actual data.

The plot depicted below visualizes the prior predictive distribution by showing the expected revenue (mean) at each point, together with measures of uncertainty. We are able to see that the true revenue falls inside the range of the usual deviation, indicating that the model specification is suitable for the observed data.

Prior predictive check (image by the creator)

Bayesian marketing mix modeling may take considerable time to master. I hope that this text helped you to reinforce your understanding of prior distributions and Bayesian marketing model specifications.

The whole code may be downloaded from my Github repo

Thanks for reading!

LEAVE A REPLY

Please enter your comment!
Please enter your name here