SynthDiD 101: A Beginner’s Guide to Synthetic Difference-in-Differences

Title image generated by the author using Nightcafe.

In this blog post, I give a quick introduction to the Synthetic Difference-in-Differences (SynthDiD) method and discuss its relation to the standard Difference-in-Differences (DiD) and Synthetic Control Method (SCM). SynthDiD is a generalized version of SCM and DiD that combines the strengths of both methods. It enables causal inference with large panels, even with a short pretreatment period. I discuss the advantages and disadvantages of the method while demonstrating it with the synthdid package in R. I also provide bullet points for a quick overview.

Synthetic Control Method vs. Synthetic Difference-in-Differences

The synthetic control method and the synthetic difference-in-differences method are closely related, but they differ in how they estimate causal effects. The synthetic control method is a statistical technique that creates a “synthetic” control group by combining multiple control units that are similar to the treated unit in all relevant characteristics. The synthetic control group is constructed to match the pre-treatment outcomes of the treated unit as closely as possible. The treatment effect is then estimated by comparing the post-treatment outcomes of the treated unit to those of the synthetic control group.

Synthetic DiD, on the other hand, combines the synthetic control method with the difference-in-differences approach [1]. In this method, a synthetic control group is constructed using the same approach as in the synthetic control method. However, the treatment effect is estimated by comparing the change in outcomes between the treated unit and the synthetic control group before and after the treatment is introduced. This approach allows for a more robust estimation of the treatment effect by accounting for pre-existing differences between the treatment and control groups.

In summary, while both methods use a synthetic control group, the synthetic control method estimates treatment effects by comparing the post-treatment outcomes of the treated unit to those of the synthetic control group, whereas synthetic DiD estimates treatment effects by comparing the change in outcomes between the treated unit and the synthetic control group before and after the treatment is introduced.
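To make the contrast concrete, here is a minimal base R sketch with made-up numbers (they are not from the simulation used later in this post). It assumes the synthetic control has already been built and that all pre-treatment periods get equal weight, which is a simplification. I also left a level gap of 1 between the two series on purpose, so that the two estimates differ. All variable names are my own.

```r
# Hypothetical outcome paths: 4 pre-treatment and 3 post-treatment periods
# for a treated unit and an already-constructed synthetic control.
treated   <- c(10.0, 10.5, 11.0, 11.5, 10.2, 10.6, 11.1)
synthetic <- c( 9.0,  9.5, 10.0, 10.5, 11.0, 11.5, 12.0)
pre  <- 1:4
post <- 5:7

# Average gap between treated and synthetic control in each window.
gap_pre  <- mean(treated[pre]  - synthetic[pre])
gap_post <- mean(treated[post] - synthetic[post])

# SCM-style estimate: the post-treatment gap only.
sc_style <- gap_post

# SynthDiD-style estimate: the change in the gap from pre to post,
# which removes the level difference that existed before treatment.
sdid_style <- gap_post - gap_pre

print(c(sc_style = sc_style, sdid_style = sdid_style))
```

The SynthDiD-style number is larger in magnitude here because it does not attribute the pre-existing level gap to the treatment.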

Synthetic DiD in bullet points:

  • SynthDiD is a generalized version of SCM and DiD.
  • It borrows strengths from both the DiD method and the synthetic control method [2][3].
  • It constructs a counterfactual for the treated group by optimally weighting the control units to minimize the difference between the treated and control groups in the pretreatment period, as in SCM.
  • Then, the treatment effect is estimated by comparing the change in outcomes of the treated unit and the synthetic control group from pre- to post-intervention, as in DiD.
  • SynthDiD accounts for unit-level shifts in the outcome, as in DiD [4].
  • It enables inference in large panels, even when the pretreatment period is short, which sets it apart from the synthetic control method (SCM requires a long pretreatment period).
  • As in SCM, the units become the “variables” and we represent the outcome as a weighted average of the units (i.e., the synthetic control).
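The bullet points above can be written as three closely related regression problems. The notation is an assumption of this sketch and follows, as far as I can reconstruct it, the SynthDiD paper: Y_{it} is the outcome of unit i in period t, W_{it} is the treatment indicator, \alpha_i and \beta_t are unit and time fixed effects, \omega_i are unit weights, and \lambda_t are time weights.

```latex
% DiD: unweighted two-way fixed effects regression
(\hat{\tau}^{did}, \hat{\mu}, \hat{\alpha}, \hat{\beta})
  = \arg\min_{\tau, \mu, \alpha, \beta}
    \sum_{i=1}^{N} \sum_{t=1}^{T}
    \left( Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\tau \right)^2

% SCM: unit weights, but no unit fixed effects and no time weights
(\hat{\tau}^{sc}, \hat{\mu}, \hat{\beta})
  = \arg\min_{\tau, \mu, \beta}
    \sum_{i=1}^{N} \sum_{t=1}^{T}
    \left( Y_{it} - \mu - \beta_t - W_{it}\tau \right)^2 \hat{\omega}_i^{sc}

% SynthDiD: unit fixed effects as in DiD, plus unit and time weights
(\hat{\tau}^{sdid}, \hat{\mu}, \hat{\alpha}, \hat{\beta})
  = \arg\min_{\tau, \mu, \alpha, \beta}
    \sum_{i=1}^{N} \sum_{t=1}^{T}
    \left( Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\tau \right)^2
    \hat{\omega}_i^{sdid} \hat{\lambda}_t^{sdid}
```

Read this way, DiD is the special case with equal weights, and SCM is the special case without unit fixed effects and without time weights.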

Example

Suppose that we are a company that sells plant-based food products, such as soy milk or soy yogurt, and we operate in multiple countries. Some countries implement new legislation that prohibits us from marketing our plant-based products as ‘milk’ or ‘yogurt’, since it is claimed that only animal products can be marketed as ‘milk’ or ‘yogurt’ (thanks to one of my former students for the inspiration for this example :). Thus, due to this new regulation in some countries, we have to market soy milk as soy drink instead of soy milk, etc. We want to know the impact of this legislation on our revenue, as this might help guide our lobbying efforts and marketing activities in different countries.

I simulated a balanced panel dataset that shows the revenue of our company in 30 different countries for 30 periods. Three of the countries implement this legislation in period 20. In the figure below, you can see a snapshot of the data. treat is a dummy variable indicating whether a country has implemented the legislation in a given period. revenue is the revenue in millions of EUR. You can find the simulation and estimation code in this Gist.
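If you do not have the Gist at hand, a panel with the same shape can be sketched in base R as follows. This is a hypothetical stand-in, not my sim_data() function: the function name sim_panel_sketch, the effect size, the trend, and the noise levels are all assumptions chosen for illustration.

```r
# Hypothetical stand-in for sim_data(); every number here is an
# illustrative assumption, not a value from the original simulation.
sim_panel_sketch <- function(n_units = 30, n_periods = 30, n_treated = 3,
                             treat_start = 20, effect = -0.5, seed = 1) {
  set.seed(seed)
  # One row per country-period combination (balanced panel).
  panel <- expand.grid(period = seq_len(n_periods),
                       country = paste0("country_", seq_len(n_units)),
                       stringsAsFactors = FALSE)
  unit_id <- match(panel$country, unique(panel$country))

  # Country-specific revenue level plus a common linear time trend.
  unit_level <- rnorm(n_units, mean = 10, sd = 2)[unit_id]
  time_trend <- 0.1 * panel$period

  # The last n_treated countries are treated from treat_start onwards.
  treated_unit <- unit_id > (n_units - n_treated)
  panel$treat <- as.integer(treated_unit & panel$period >= treat_start)

  panel$revenue <- unit_level + time_trend + effect * panel$treat +
    rnorm(nrow(panel), sd = 0.3)
  panel[, c("country", "period", "treat", "revenue")]
}

panel <- sim_panel_sketch()
head(panel)
```

The result has one row per country and period, with treat switching to 1 for the three treated countries from period 20 onwards.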

# Install and load the required packages
# devtools::install_github("synth-inference/synthdid")
library(synthdid)
library(ggplot2)
library(data.table)

# Set seed for reproducibility
set.seed(12345)

source('sim_data.R') # Import simulation function and a few utilities

dt <- sim_data()
head(dt)

Snapshot of the data, image by the author.

Next, we convert our panel data into the matrix format required by the synthdid package. Given the outcome matrix, the number of control units, and the number of pretreatment periods, the synthdid_estimate function constructs a synthetic control and estimates the treatment effect. To make inference, we also need to calculate the standard errors. I use the jackknife method because I have multiple treated units; the placebo method is the only option if you have a single treated unit. Given the standard errors, I also calculate the 95% confidence interval for the treatment effect. I report these in the figure below.

# Convert the data into a matrix
setup = panel.matrices(dt, unit = 'country', time = 'period',
                       outcome = 'revenue', treatment = 'treat')

# Estimate treatment effect using SynthDiD
tau.hat = synthdid_estimate(setup$Y, setup$N0, setup$T0)

# Calculate standard errors
se = sqrt(vcov(tau.hat, method='jackknife'))
te_est <- sprintf('Point estimate for the treatment effect: %1.2f', tau.hat)
CI <- sprintf('95%% CI (%1.2f, %1.2f)', tau.hat - 1.96 * se, tau.hat + 1.96 * se)
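To see what the conversion step produces, here is a small base R sketch that builds the same kind of objects by hand: an outcome matrix Y with units in rows and periods in columns (control units first), the number of control units N0, and the number of pretreatment periods T0. This is my own illustration on a tiny hypothetical panel and only mimics the idea; it is not the package's implementation.

```r
# Tiny hypothetical long panel: 3 units x 4 periods,
# unit "c" is treated from period 3 onwards.
long <- data.frame(
  unit  = rep(c("a", "b", "c"), each = 4),
  time  = rep(1:4, times = 3),
  y     = c(1, 2, 3, 4,  2, 3, 4, 5,  3, 4, 4, 5),
  treat = c(0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 1, 1)
)

# Reshape to wide matrices: units in rows, periods in columns.
Y <- tapply(long$y,     list(long$unit, long$time), sum)
W <- tapply(long$treat, list(long$unit, long$time), sum)

# Order rows so that never-treated units come first, treated units last.
ever_treated <- rowSums(W) > 0
row_order <- order(ever_treated)
Y <- Y[row_order, , drop = FALSE]
W <- W[row_order, , drop = FALSE]

N0 <- sum(!ever_treated)    # number of control units
T0 <- sum(colSums(W) == 0)  # number of pretreatment periods

print(Y)
```

With this layout, the treated block of the matrix is the bottom-right corner: rows after N0 and columns after T0.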

Let’s also plot the results with some more information on the data.

# Check the number of treatment and control countries to report
num_treated <- length(unique(dt[treat==1]$country))
num_control <- length(unique(dt$country))-num_treated

# Create spaghetti plot with top 10 control units
top.controls = synthdid_controls(tau.hat)[1:10, , drop=FALSE]
plot(tau.hat, spaghetti.units = rownames(top.controls),
     trajectory.linetype = 1, line.width = .75,
     trajectory.alpha = .9, effect.alpha = .9,
     diagram.alpha = 1, onset.alpha = .9, ci.alpha = .3, spaghetti.line.alpha = .2,
     spaghetti.label.alpha = .1, overlay = 1) +
  labs(x = 'Period', y = 'Revenue', title = 'Estimation Results',
       subtitle = paste0(te_est, ', ', CI, '.'),
       caption = paste0('The number of treatment and control units: ', num_treated, ' and ', num_control, '.'))

In the image below, the estimation results are displayed. Observe how the treated countries and the synthetic control exhibit fairly parallel trends on average (they may not look perfectly parallel, but that is not necessary for the purposes of this example). The average for the treated countries is more variable, primarily because there are only three such countries, which leads to less smooth trends. Transparent gray lines represent different control countries. Following the treatment in period 20, a decline in revenue is observed in the treated countries, estimated to be 0.51 million EUR as indicated in the graph. This means that the new regulation has a negative impact on our company’s revenues, and the necessary actions should be taken to prevent further declines.

Results, image by the author.

Let’s plot the weights used to estimate the synthetic control.

# Plot control unit contributions
synthdid_units_plot(tau.hat, se.method='jackknife') +
labs(x = 'Country', y = 'Treatment effect',
caption = 'The black horizontal line shows the actual effect;
the grey ones show the endpoints of a 95% confidence interval.')

In the image below, you can observe how each country is weighted to construct the synthetic control. The plot also shows how the estimated treatment effect differs depending on which untreated country serves as the control unit.

Country weights, image by the author.

Now that we understand more about SynthDiD, let’s talk about the pros and cons of this method. Like every method, SynthDiD has advantages and disadvantages. Here are some to keep in mind when getting started.

Advantages of the SynthDiD method:

  • The synthetic control method is usually used with a few treated and control units and needs a long, balanced pre-treatment period. SynthDiD, on the other hand, works well even with a short pre-treatment period [4].
  • The method is often preferred because, unlike DiD, it does not require a strict parallel trends assumption (PTA).
  • SynthDiD ensures a suitable number of control units, accounts for possible pre-intervention patterns, and may accommodate a degree of endogenous treatment timing [4].

Disadvantages of the SynthDiD method:

  • It can be computationally expensive (even with only one treated group/block).
  • It requires a balanced panel (i.e., you can only use units observed in all time periods) and identical treatment timing for all treated units.
  • It requires enough pre-treatment periods for good estimation, so if you do not have enough pre-treatment periods, it might be better to use regular DiD.
  • Computing and comparing the average treatment effects for subgroups is difficult. One option is to split the sample into subgroups and compute the average treatment effect for each subgroup.
  • Implementing SynthDiD when the treatment timing varies can be tricky. In the case of staggered treatment timing, one solution is to estimate the average treatment effect for each treatment cohort and then aggregate the cohort-specific average treatment effects into an overall average treatment effect.
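The aggregation step in the last bullet can be sketched in a few lines of base R. The cohort-level estimates below are made-up numbers, and weighting each cohort by its number of treated unit-period observations is one reasonable choice that I assume here, not the only one.

```r
# Hypothetical cohort-level results (made-up numbers for illustration).
cohorts <- data.frame(
  cohort_start   = c(15, 20, 25),            # first treated period per cohort
  att            = c(-0.40, -0.55, -0.30),   # cohort-specific estimates
  n_treated      = c(2, 3, 1),               # treated units per cohort
  n_post_periods = c(16, 11, 6)              # post-treatment periods per cohort
)

# Weight each cohort by its share of treated unit-period observations.
cohorts$weight <- cohorts$n_treated * cohorts$n_post_periods
cohorts$weight <- cohorts$weight / sum(cohorts$weight)

# Overall effect: weighted average of the cohort-specific effects.
overall_att <- sum(cohorts$weight * cohorts$att)
print(overall_att)
```

Each cohort-specific estimate would come from a separate SynthDiD run that uses only that cohort's treated units together with the never-treated controls.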

Here are some other points that you might want to know when getting started.

Things to note:

  • SynthDiD uses ridge (L2) regularization when estimating the unit weights, while constraining the resulting weights to sum to 1.
  • In the process of pretreatment matching, SynthDiD tries to determine the average treatment effect across the entire sample. This approach might cause individual time period estimates to be less precise. However, the overall average yields an unbiased estimate.
  • The standard errors for the treatment effects are estimated with the jackknife method or, if a cohort has only one treated unit, with the placebo method.
  • The estimator is considered consistent and asymptotically normal, provided that the combination of the number of control units and pretreatment periods is sufficiently large relative to the combination of the number of treated units and posttreatment periods.
  • In practice, pre-treatment variables play a minor role in synthetic DiD, as lagged outcomes hold more predictive power, making the treatment of these variables less critical.
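For readers who want to see the first point as a formula, the unit weights can be sketched as the solution to a penalized least squares problem. The notation is an assumption of this sketch and follows, as far as I can reconstruct it, the SynthDiD paper: N_{co} and N_{tr} are the numbers of control and treated units, T_{pre} is the number of pretreatment periods, \omega_0 is an intercept, and \zeta is the regularization parameter.

```latex
(\hat{\omega}_0, \hat{\omega}^{sdid})
  = \arg\min_{\omega_0 \in \mathbb{R},\; \omega \in \Omega}
    \sum_{t=1}^{T_{pre}}
    \left( \omega_0 + \sum_{i=1}^{N_{co}} \omega_i Y_{it}
           - \frac{1}{N_{tr}} \sum_{i=N_{co}+1}^{N} Y_{it} \right)^2
    + \zeta^2 \, T_{pre} \, \lVert \omega \rVert_2^2,
\qquad
\Omega = \Big\{ \omega : \omega_i \ge 0, \; \sum_{i=1}^{N_{co}} \omega_i = 1 \Big\}
```

The intercept \omega_0 is what lets the synthetic control follow the treated average up to a constant level shift, and the ridge penalty spreads the weight across more control units than an unpenalized fit would.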

Conclusion

In this blog post, I introduced the SynthDiD method and discussed its relationship with traditional DiD and SCM. SynthDiD combines the strengths of both SCM and DiD, allowing for causal inference with large panels even when the pretreatment period is short. I demonstrated the method using the synthdid package in R. Although it has several advantages, such as not requiring a strict parallel trends assumption, it also has drawbacks, such as being computationally expensive and requiring a balanced panel. Overall, SynthDiD is a valuable tool for researchers interested in estimating causal effects using observational data, providing an alternative to traditional DiD and SCM methods.
