Propensity Score Matching Methods for causal inference

A Comprehensive Tutorial to Propensity Score Matching in R

Author

Zahid Asghar

Published

June 14, 2024

Introduction

Causal inference is a critical aspect of research in fields such as economics, public health, and social sciences. One of the primary challenges in causal inference is estimating the effect of a treatment or intervention on an outcome variable while accounting for potential confounding variables. Conducting Randomized Controlled Trials (RCTs) is considered the gold standard for estimating causal effects. However, in many cases, researchers rely on observational data, where treatment assignment is not random, leading to potential bias in the estimated treatment effect. Propensity score matching is a widely used method to address this issue by balancing covariates between treatment and control groups, thereby reducing bias and improving the validity of causal inference. Propensity score matching (PSM) is a statistical technique designed to address this challenge by accounting for covariates that predict receiving the treatment, making it particularly useful in observational studies where random assignment is not feasible. In this tutorial, I will demonstrate how to perform propensity score matching using various methods in R, highlight the advantages of entropic balancing, and provide a step-by-step guide to assessing balance diagnostics and estimating treatment effects with matched data. This comprehensive guide aims to equip researchers with the necessary tools to conduct robust causal inference using observational data.

I will demonstrate how to perform propensity score matching in R using the MatchIt package and others. Additionally, we will highlight entropic balancing, a useful tool that matches data without losing sample observations. The goal is to minimize the entropy of the covariates in the treated and control groups, thereby creating balance between the groups.

I will also demonstrate how to use the ebal package to perform entropy balancing in R, assess balance diagnostics, and estimate the treatment effect using the matched data. Various matching methods, including nearest neighbor matching, kernel matching, radius matching, propensity score matching, caliper matching, exact matching and entropy balancing will be covered. Additionally, I will demonstrate how to use the cobalt package to assess balance diagnostics and the WeightIt package to perform weighting to estimate the treatment effect.

Load packages

First, we need to load the required R packages for this tutorial. If you haven’t installed these packages yet, you can do so by running the following code:

Code
# Install pacman if not already installed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")

# Use pacman to install and load the required packages
pacman::p_load(
  MatchIt,
  cobalt,
  WeightIt,
  survey,
  haven,
  lmtest,
  tidyverse,
  patchwork,
  Matching,
  ebal
)

Load data

We will work with the sample dataset lalonde from the MatchIt package, which contains a treatment group from the National Supported Work Demonstration (NSW) program from the mid-1970s designed to help disadvantaged individuals find employment and improve their economic situation. The data includes information on individuals who participated in the program (treat = 1) and those who did not (treat = 0). We will use propensity score matching to estimate the effect of the program on the earnings of the participants.

The key question we want to answer is:

Did participation in the NSW demonstration program result in higher incomes?

Code
data(lalonde, package = "MatchIt") 
lalonde$age2 <- lalonde$age^2 # Create a new variable age2
lalonde |> slice(0) |> glimpse()
Rows: 0
Columns: 10
$ treat    <int> 
$ age      <int> 
$ educ     <int> 
$ race     <fct> 
$ married  <int> 
$ nodegree <int> 
$ re74     <dbl> 
$ re75     <dbl> 
$ re78     <dbl> 
$ age2     <dbl> 

This data set contains the following variables:

  • treat: A binary variable indicating whether the individual participated in the program (1) or not (0).
  • age: The age of the individual(in years)
  • educ: The number of years of education.
  • race: A factor variable indicating black, hispanic or white.
  • married: A binary variable indicating whether the individual is married (1) or not (0).
  • nodegree: A binary variable indicating whether the individual has a degree (1) or not (0).
  • re74: Earnings in 1974.
  • re75: Earnings in 1975.
  • re78: Earnings in 1978.

We have total observations 614 and 10 variables in the dataset. After insepection of data and understanding the variables, we can proceed further for analyzing impact of treatment on earnings in 1978.

t-Test

We can start by performing a simple t-test to compare the earnings in 1978 between the treatment and control groups. There are 185 individuals in the treatment group and 429 individuals in the control group.

Code
t.test(re78 ~ treat, data = lalonde) 

    Welch Two Sample t-test

data:  re78 by treat
t = 0.93773, df = 326.41, p-value = 0.3491
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -697.192 1967.244
sample estimates:
mean in group 0 mean in group 1 
       6984.170        6349.144 

Results indicate that 6349.14 is the average earnings in 1978 for the treatment group and 6984.17 is the average earnings in 1978 for the control group. Difference in means is -635.03 is a bad sign that NSW program is not effective. But this data set is built from two different data sets NSW and PSID and hence we need to perform propensity score matching to get the correct estimate of treatment effect.

Matching methods

There are various methods to perform propensity score matching. We will start with exact matching and then move to propensity score matching using different methods.

Exact Matching

Exact matching is a method of matching treatment and control units based on exact values of covariates. In this method, treatment and control units are matched if they have the same values on the covariates. This method is useful when there are a small number of covariates and the sample size is large enough to allow for exact matching.

Code
# Perform exact matching
exact_match <- matchit(treat ~ age +age2 + educ + race + married + nodegree,
                       exact = ~ age +age2 + educ + race + married + nodegree,data = lalonde)
Warning: Fewer control units than treated units in some `exact` strata; not all
treated units will get a match.
Code
exact_match
A matchit object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score
             - estimated with logistic regression
 - number of obs.: 614 (original), 90 (matched)
 - target estimand: ATT
 - covariates: age, age2, educ, race, married, nodegree

Summary of exact_match is given as follows:

Code
# Summary of the matching
summary(exact_match)

Call:
matchit(formula = treat ~ age + age2 + educ + race + married + 
    nodegree, data = lalonde, exact = ~age + age2 + educ + race + 
    married + nodegree)

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.6204        0.1637          1.7929     1.3755    0.3992
age              25.8162       28.0303         -0.3094     0.4400    0.0813
age2            717.3946      901.7786         -0.4276     0.3627    0.0813
educ             10.3459       10.2354          0.0550     0.4959    0.0347
raceblack         0.8432        0.2028          1.7615          .    0.6404
racehispan        0.0595        0.1422         -0.3498          .    0.0827
racewhite         0.0973        0.6550         -1.8819          .    0.5577
married           0.1892        0.5128         -0.8263          .    0.3236
nodegree          0.7081        0.5967          0.2450          .    0.1114
           eCDF Max
distance     0.6752
age          0.1577
age2         0.1577
educ         0.1114
raceblack    0.6404
racehispan   0.0827
racewhite    0.5577
married      0.3236
nodegree     0.1114

Summary of Balance for Matched Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.4837        0.4837               0          1         0
age              20.2444       20.2444               0          1         0
age2            423.5778      423.5778               0          1         0
educ             10.4000       10.4000               0          1         0
raceblack         0.7333        0.7333               0          .         0
racehispan        0.0444        0.0444               0          .         0
racewhite         0.2222        0.2222               0          .         0
married           0.0444        0.0444               0          .         0
nodegree          0.7111        0.7111               0          .         0
           eCDF Max Std. Pair Dist.
distance          0               0
age               0               0
age2              0               0
educ              0               0
raceblack         0               0
racehispan        0               0
racewhite         0               0
married           0               0
nodegree          0               0

Sample Sizes:
          Control Treated
All           429     185
Matched        45      45
Unmatched     384     140
Discarded       0       0

From this summary we can see that there are matched pairs in the data. We can also see the distribution of the covariates in the matched data. As matched data contain only 45 observations from each treatment and control group, so we lose lot of information and hence we can’t rely on this method. Therefore, we will use other matching methods to estimate the treatment effect, called propensity score matching.

Exact matching diagnostics

love.plot function from cobalt package is used to visualize the diagnostics of the matching. The function plots the standardized mean differences before and after matching for each covariate. The goal is to achieve balance in the covariates between the treatment and control groups after matching.

Code
love.plot(exact_match, drop.distance = FALSE) 
Warning: Standardized mean differences and raw mean differences are present in the same plot. 
Use the `stars` argument to distinguish between them and appropriately label the x-axis.
Figure 1

Red dots represent the standardized mean differences before matching, and blue dots represent the standardized mean differences after matching. Figure 1 indicates that all adjusted difference are exactly equal to zero. This is because we have performed exact matching, which means that the treatment and control units are matched exactly on the covariates.

Propensity Score Matching

Is a white race person with 12 years of education, 39 years of age and married not same as another person with 12 years of education, 40 years of age and married?

Here, comes issue of PSM as exact match is not there but these two persons seem to be similar. Propensity score matching is a method used to estimate the average treatment effect by matching treatment and control units based on their propensity scores. The propensity score is the probability of receiving the treatment given the observed covariates.

Code
ps_match <- matchit(treat ~ age+ age2 + educ + race + married + nodegree,data = lalonde)
ps_match
A matchit object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score
             - estimated with logistic regression
 - number of obs.: 614 (original), 370 (matched)
 - target estimand: ATT
 - covariates: age, age2, educ, race, married, nodegree

So now instead of 45 matches, we have 185 matches from each group based on 5 covariates. We can see the summary of the matching as follows:

Code
summary(ps_match)

Call:
matchit(formula = treat ~ age + age2 + educ + race + married + 
    nodegree, data = lalonde)

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.6204        0.1637          1.7929     1.3755    0.3992
age              25.8162       28.0303         -0.3094     0.4400    0.0813
age2            717.3946      901.7786         -0.4276     0.3627    0.0813
educ             10.3459       10.2354          0.0550     0.4959    0.0347
raceblack         0.8432        0.2028          1.7615          .    0.6404
racehispan        0.0595        0.1422         -0.3498          .    0.0827
racewhite         0.0973        0.6550         -1.8819          .    0.5577
married           0.1892        0.5128         -0.8263          .    0.3236
nodegree          0.7081        0.5967          0.2450          .    0.1114
           eCDF Max
distance     0.6752
age          0.1577
age2         0.1577
educ         0.1114
raceblack    0.6404
racehispan   0.0827
racewhite    0.5577
married      0.3236
nodegree     0.1114

Summary of Balance for Matched Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.6204        0.3300          1.1401     1.0782    0.1687
age              25.8162       26.4216         -0.0846     0.7654    0.0381
age2            717.3946      764.6270         -0.1095     0.7823    0.0381
educ             10.3459       10.3135          0.0161     0.5352    0.0284
raceblack         0.8432        0.4486          1.0853          .    0.3946
racehispan        0.0595        0.2054         -0.6172          .    0.1459
racewhite         0.0973        0.3459         -0.8390          .    0.2486
married           0.1892        0.3243         -0.3450          .    0.1351
nodegree          0.7081        0.6486          0.1308          .    0.0595
           eCDF Max Std. Pair Dist.
distance     0.4811          1.1401
age          0.1135          1.2375
age2         0.1135          1.1531
educ         0.0649          1.1990
raceblack    0.3946          1.0853
racehispan   0.1459          0.9829
racewhite    0.2486          0.9120
married      0.1351          0.8419
nodegree     0.0595          0.8679

Sample Sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0

Propensity score matching diagnostics

Code
love.plot(ps_match, drop.distance = FALSE)
Warning: Standardized mean differences and raw mean differences are present in the same plot. 
Use the `stars` argument to distinguish between them and appropriately label the x-axis.
Figure 2: Diagnostics of propensity score matching

Figure 2 indicates that the standardized mean differences are close to zero for covariates other than race after propensity score matching, which indicates that the treatment and control groups are well balanced on the covariates.

Nearest neighbor matching with replacement

The method argument is set to “nearest” to perform nearest neighbor matching. By default, the matchit function performs nearest neighbor matching with replacement, which means that a control unit can be matched to multiple treatment units.

Code
# Perform propensity score matching with nearest neighbor matching without replacement
match_model <- matchit(treat ~ age + age2 + educ+ race + married + re74 + re75, 
                       data = lalonde, method = "nearest", replace = TRUE)

# Summary of the matching
summary(match_model)

Call:
matchit(formula = treat ~ age + age2 + educ + race + married + 
    re74 + re75, data = lalonde, method = "nearest", replace = TRUE)

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.6203        0.1638          1.8245     1.2861    0.3988
age              25.8162       28.0303         -0.3094     0.4400    0.0813
age2            717.3946      901.7786         -0.4276     0.3627    0.0813
educ             10.3459       10.2354          0.0550     0.4959    0.0347
raceblack         0.8432        0.2028          1.7615          .    0.6404
racehispan        0.0595        0.1422         -0.3498          .    0.0827
racewhite         0.0973        0.6550         -1.8819          .    0.5577
married           0.1892        0.5128         -0.8263          .    0.3236
re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248
re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342
           eCDF Max
distance     0.6760
age          0.1577
age2         0.1577
educ         0.1114
raceblack    0.6404
racehispan   0.0827
racewhite    0.5577
married      0.3236
re74         0.4470
re75         0.2876

Summary of Balance for Matched Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.6203        0.6197          0.0023     0.9859    0.0028
age              25.8162       25.4649          0.0491     0.9766    0.0391
age2            717.3946      699.6486          0.0411     1.1716    0.0391
educ             10.3459       10.1081          0.1183     0.4193    0.0381
raceblack         0.8432        0.8270          0.0446          .    0.0162
racehispan        0.0595        0.0811         -0.0914          .    0.0216
racewhite         0.0973        0.0919          0.0182          .    0.0054
married           0.1892        0.1568          0.0828          .    0.0324
re74           2095.5737     1742.7413          0.0722     1.7125    0.0396
re75           1532.0553     1442.0516          0.0280     1.5596    0.0278
           eCDF Max Std. Pair Dist.
distance     0.0486          0.0146
age          0.1135          0.8167
age2         0.1135          0.7885
educ         0.1622          1.1936
raceblack    0.0162          0.1933
racehispan   0.0216          0.4572
racewhite    0.0054          0.2371
married      0.0324          0.6625
re74         0.2162          0.5364
re75         0.1297          0.6045

Sample Sizes:
              Control Treated
All            429.       185
Matched (ESS)   42.52     185
Matched         75.       185
Unmatched      354.         0
Discarded        0.         0
Code
# Extract matched data
matched_data <- match.data(match_model)

# Balance diagnostics
balance <- bal.tab(match_model, un = TRUE)

# Print balance diagnostics
print(balance)
Balance Measures
                Type Diff.Un Diff.Adj
distance    Distance  1.8245   0.0023
age          Contin. -0.3094   0.0491
age2         Contin. -0.4276   0.0411
educ         Contin.  0.0550   0.1183
race_black    Binary  0.6404   0.0162
race_hispan   Binary -0.0827  -0.0216
race_white    Binary -0.5577   0.0054
married       Binary -0.3236   0.0324
re74         Contin. -0.7211   0.0722
re75         Contin. -0.2903   0.0280

Sample sizes
                     Control Treated
All                   429.       185
Matched (ESS)          42.52     185
Matched (Unweighted)   75.       185
Unmatched             354.         0
Code
# Summarize balance diagnostics
summary(balance)
             Length Class      Mode
Balance      3      data.frame list
Observations 2      data.frame list
call         5      -none-     call

Nearest neighbor matching diagnostics

Code
# Plot balance diagnostics
love.plot(balance, var.order = "unadjusted", threshold = 0.1)
Warning: Standardized mean differences and raw mean differences are present in the same plot. 
Use the `stars` argument to distinguish between them and appropriately label the x-axis.
Figure 3: Balance diagnostics for nearest neighbor matching without replacement

Nearest neighbor matching without replacement

To perform nearest neighbor matching without replacement, we can set the replace argument to FALSE. This ensures that each control unit is matched to at most one treatment unit.

Code
# Perform propensity score matching
match_model_wor <- matchit(treat ~ age + age2 + educ + race + married + re74 + re75, data = lalonde, method = "nearest", replace = FALSE)


# Summary of the matching
summary(match_model_wor)

Call:
matchit(formula = treat ~ age + age2 + educ + race + married + 
    re74 + re75, data = lalonde, method = "nearest", replace = FALSE)

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.6203        0.1638          1.8245     1.2861    0.3988
age              25.8162       28.0303         -0.3094     0.4400    0.0813
age2            717.3946      901.7786         -0.4276     0.3627    0.0813
educ             10.3459       10.2354          0.0550     0.4959    0.0347
raceblack         0.8432        0.2028          1.7615          .    0.6404
racehispan        0.0595        0.1422         -0.3498          .    0.0827
racewhite         0.0973        0.6550         -1.8819          .    0.5577
married           0.1892        0.5128         -0.8263          .    0.3236
re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248
re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342
           eCDF Max
distance     0.6760
age          0.1577
age2         0.1577
educ         0.1114
raceblack    0.6404
racehispan   0.0827
racewhite    0.5577
married      0.3236
re74         0.4470
re75         0.2876

Summary of Balance for Matched Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.6203        0.3337          1.1452     1.0188    0.1608
age              25.8162       25.7946          0.0030     0.8198    0.0305
age2            717.3946      727.4703         -0.0234     0.8493    0.0305
educ             10.3459       10.2270          0.0591     0.4109    0.0461
raceblack         0.8432        0.4432          1.1002          .    0.4000
racehispan        0.0595        0.2216         -0.6857          .    0.1622
racewhite         0.0973        0.3351         -0.8025          .    0.2378
married           0.1892        0.3135         -0.3174          .    0.1243
re74           2095.5737     3038.1026         -0.1929     1.1840    0.1010
re75           1532.0553     2026.3935         -0.1536     1.1539    0.0826
           eCDF Max Std. Pair Dist.
distance     0.4703          1.1452
age          0.1027          1.1362
age2         0.1027          1.0324
educ         0.1243          1.3603
raceblack    0.4000          1.1002
racehispan   0.1622          1.0514
racewhite    0.2378          0.9484
married      0.1243          0.9523
re74         0.3568          0.7503
re75         0.2000          0.8552

Sample Sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0
Code
# Extract matched data
matched_data_wor <- match.data(match_model_wor)

# Balance diagnostics
balance_wor <- bal.tab(match_model_wor, un = TRUE)

# Print balance diagnostics
print(balance_wor)
Balance Measures
                Type Diff.Un Diff.Adj
distance    Distance  1.8245   1.1452
age          Contin. -0.3094   0.0030
age2         Contin. -0.4276  -0.0234
educ         Contin.  0.0550   0.0591
race_black    Binary  0.6404   0.4000
race_hispan   Binary -0.0827  -0.1622
race_white    Binary -0.5577  -0.2378
married       Binary -0.3236  -0.1243
re74         Contin. -0.7211  -0.1929
re75         Contin. -0.2903  -0.1536

Sample sizes
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0

Nearest neighbor matching without replacement diagnostics

Code
love.plot(balance_wor, var.order = "unadjusted", threshold = 0.1)
Warning: Standardized mean differences and raw mean differences are present in the same plot. 
Use the `stars` argument to distinguish between them and appropriately label the x-axis.
Figure 4: Balance diagnostics for nearest neighbor matching without replacement

This Figure 4 plot shows the standardized mean differences for each covariate before and after matching. The standardized mean differences are almost zero other than race and re74 variables which are little away (blue dots), indicating that the treatment and control groups are well balanced on the covariates.

Matching with caliper

The caliper argument specifies the maximum distance within which a control unit can be matched to a treatment unit. This is useful for ensuring that the matched units are similar to each other.

Code
match_caliper <- matchit(treat ~ age + age2 + educ + race + married + re74 + re75, data = lalonde, method = "nearest", caliper = 0.01)
bal.tab(match_caliper)
Balance Measures
                Type Diff.Adj
distance    Distance   0.0016
age          Contin.   0.0945
age2         Contin.   0.0966
educ         Contin.   0.2048
race_black    Binary   0.0294
race_hispan   Binary   0.0000
race_white    Binary  -0.0294
married       Binary   0.0000
re74         Contin.   0.1912
re75         Contin.  -0.0282

Sample sizes
          Control Treated
All           429     185
Matched        68      68
Unmatched     361     117

Kernel matching

Code
library(Matching)
match_kernel <- matchit(treat ~ age + age2 + educ + race+ married + re74 + re75, data = lalonde, method = "optimal")
bal.tab(match_kernel)
Balance Measures
                Type Diff.Adj
distance    Distance   1.1452
age          Contin.  -0.0121
age2         Contin.  -0.0403
educ         Contin.   0.0511
race_black    Binary   0.4000
race_hispan   Binary  -0.1622
race_white    Binary  -0.2378
married       Binary  -0.1243
re74         Contin.  -0.1778
re75         Contin.  -0.1397

Sample sizes
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0

Radius matching

Code
match_radius <- matchit(treat ~ age + age2 + educ +race +  married + re74 + re75, data = lalonde, method = "nearest", caliper = 0.01)
bal.tab(match_radius)
Balance Measures
                Type Diff.Adj
distance    Distance   0.0016
age          Contin.   0.0945
age2         Contin.   0.0966
educ         Contin.   0.2048
race_black    Binary   0.0294
race_hispan   Binary   0.0000
race_white    Binary  -0.0294
married       Binary   0.0000
re74         Contin.   0.1912
re75         Contin.  -0.0282

Sample sizes
          Control Treated
All           429     185
Matched        68      68
Unmatched     361     117

Mahalanobis matching

Code
match_mahal <- matchit(treat ~ age + age2 + educ + race+ married + re74 + re75, data = lalonde, method = "nearest", distance = "mahalanobis")
bal.tab(match_mahal)
Balance Measures
               Type Diff.Adj
age         Contin.   0.0914
age2        Contin.   0.0269
educ        Contin.  -0.2285
race_black   Binary   0.3892
race_hispan  Binary   0.0000
race_white   Binary  -0.3892
married      Binary  -0.0595
re74        Contin.  -0.3539
re75        Contin.  -0.2009

Sample sizes
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Code
# Outcome analysis on the matched data
outcome_model <- lm(re78 ~ treat, data = matched_data)
summary(outcome_model)

Call:
lm(formula = re78 ~ treat, data = matched_data)

Residuals:
   Min     1Q Median     3Q    Max 
 -6349  -5644  -2185   3317  53959 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5644.0      866.4   6.514 3.81e-10 ***
treat          705.1     1027.2   0.686    0.493    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7504 on 258 degrees of freedom
Multiple R-squared:  0.001823,  Adjusted R-squared:  -0.002046 
F-statistic: 0.4712 on 1 and 258 DF,  p-value: 0.493

Propensity Score Matching with Logistic Regression

Fit a logistic regression model to estimate propensity scores by including predictors: age, age squared, education, race (black and hispanic), marital status, and prior earnings (re74, re75)

Code
ps_model <- glm(treat ~ age + age2 + educ+race + married + re74 + re75, 
                data = lalonde, family = binomial)

# Calculate the propensity scores using the fitted model
lalonde$pscore <- predict(ps_model, type = "response")

# Perform propensity score matching with nearest neighbor matching without replacement
match_model_logit <- matchit(treat ~ pscore, data = lalonde, method = "nearest", replace = FALSE)

# Print the summary of the matching
summary(match_model_logit)

Call:
matchit(formula = treat ~ pscore, data = lalonde, method = "nearest", 
    replace = FALSE)

Summary of Balance for All Data:
         Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance        0.6181        0.1647          1.6832     1.5921    0.3988
pscore          0.6203        0.1638          1.8245     1.2861    0.3988
         eCDF Max
distance    0.676
pscore      0.676

Summary of Balance for Matched Data:
         Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance        0.6181        0.3076          1.1528     1.0388    0.1608
pscore          0.6203        0.3337          1.1452     1.0188    0.1608
         eCDF Max Std. Pair Dist.
distance   0.4703          1.1528
pscore     0.4703          1.1452

Sample Sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0
Code
matched_data_logit <- match.data(match_model_logit)

Perform balance diagnostics to check if the covariates are balanced after matching

Code
balance_logit <- bal.tab(match_model_logit, un = TRUE)

# Print the balance diagnostics
print(balance_logit)
Balance Measures
             Type Diff.Un Diff.Adj
distance Distance  1.6832   1.1528
pscore    Contin.  1.8245   1.1452

Sample sizes
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0

This indicates that a difference of 1.15 is adjusted for the covariates.

Conduct outcome analysis on the matched data using a linear model

Code
outcome_model <- lm(re78 ~ treat, data = matched_data_logit)
summary(outcome_model)

Call:
lm(formula = re78 ~ treat, data = matched_data_logit)

Residuals:
   Min     1Q Median     3Q    Max 
 -6349  -5656  -2093   3263  53959 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5656.0      524.1  10.792   <2e-16 ***
treat          693.1      741.2   0.935     0.35    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7129 on 368 degrees of freedom
Multiple R-squared:  0.002371,  Adjusted R-squared:  -0.0003402 
F-statistic: 0.8745 on 1 and 368 DF,  p-value: 0.3503

Entropic balancing

Entropy balancing is a method used in causal inference and observational studies to achieve balance between treated and control groups by reweighting the data. It ensures that the covariate distributions in the treated and control groups are as similar as possible, thus reducing bias when estimating causal effects.

What is Entropic Balancing?

Entropic balancing is based on the principle of entropy maximization. The idea is to find weights for each observation in the dataset such that the weighted empirical distributions of the covariates in the treated and control groups are as similar as possible. This is done by solving an optimization problem where the objective is to maximize entropy, subject to constraints that ensure balance on the covariates.

The process involves:

  • Defining a set of covariates that need to be balanced.
  • Specifying moment conditions that the weighted covariates should satisfy (e.g., mean, variance).
  • Solving a convex optimization problem to find the weights that maximize the entropy while satisfying the moment conditions.
Code
#Balancing covariates between treatment groups (binary)
(ebal <- weightit( treat ~ age + age2+ educ + race+ married + re74+ re75, data = lalonde, method = "ebal", estimand = "ATT"))
A weightit object
 - method: "ebal" (entropy balancing)
 - number of obs.: 614
 - sampling weights: none
 - treatment: 2-category
 - estimand: ATT (focal: 1)
 - covariates: age, age2, educ, race, married, re74, re75
Code
summary(ebal)
                  Summary of weights

- Weight ranges:

           Min                                   Max
treated 1.0000   ||                           1.0000
control 0.0002 |---------------------------| 18.4549

- Units with the 5 most extreme weights by group:
                                              
               5       4     3       2       1
 treated       1       1     1       1       1
             584     374   608     601     541
 control 16.4637 16.8175 17.93 18.3239 18.4549

- Weight statistics:

        Coef of Var   MAD Entropy # Zeros
treated       0.000 0.000   0.000       0
control       2.488 1.339   1.374       0

- Effective Sample Sizes:

           Control Treated
Unweighted  429.       185
Weighted     59.77     185
Code
cobalt::bal.tab(ebal)
Balance Measures
               Type Diff.Adj
age         Contin.       -0
age2        Contin.        0
educ        Contin.        0
race_black   Binary        0
race_hispan  Binary        0
race_white   Binary       -0
married      Binary       -0
re74        Contin.       -0
re75        Contin.        0

Effective sample sizes
           Control Treated
Unadjusted  429.       185
Adjusted     59.77     185
Code
# Add weights to the dataset
lalonde$weights <- ebal$weights

# Create a survey design object
design <- svydesign(ids = ~1, weights = ~weights, data = lalonde)

# Weighted regression analysis
outcome_model_ebal <- svyglm(re78 ~ treat, design = design)

# Summary of the model
summary(outcome_model_ebal)

Call:
svyglm(formula = re78 ~ treat, design = design)

Survey design:
svydesign(ids = ~1, weights = ~weights, data = lalonde)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5039.0      679.9   7.411 4.18e-13 ***
treat         1310.1      892.0   1.469    0.142    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 43389364)

Number of Fisher Scoring iterations: 2

From these results one can see that entropy balancing has been successful in balancing the covariates between the treatment and control groups. The standardized mean differences are exactly zero for all covariates, indicating that the treatment and control groups are well balanced on the covariates.

One can also see it graphically through distribution plots of the covariates before and after weighting.

Covariate plots from Entropic balancing

We can also visualize the distribution of covariates before and after matching using the bal.plot function from the cobalt package. This function creates a plot that shows the distribution of covariates before and after matching for the treatment and control groups.

Code
# Function to create covariate distribution plots with custom colors
create_bal_plot <- function(ebal_obj, var_name) {
  bal.plot(ebal_obj, var.name = var_name, which = "both",
           sample.names = c("Unmatched", "Matched")) +
    scale_fill_manual(values = c("#17becf", "#8c564b"), 
                      labels = c("Control", "Treatment")) +
    labs(title = paste(var_name, "Distribution"), x = var_name) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),
          plot.title = element_text(size = 14))
}
Code
# Creating the plots with the specific function
plot.age <- create_bal_plot(ebal, "age")
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Code
plot.educ <- create_bal_plot(ebal, "educ")
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Code
plot.race <- create_bal_plot(ebal, "race")
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Code
plot.married <- create_bal_plot(ebal, "married")
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Code
library(patchwork)

distribution_graphs <- (plot.age + plot.educ)/(plot.race + plot.married)
Code
distribution_graphs +
  ggtitle("Distribution Balance in Unmatched and Matched Samples") +
  theme(plot.title = element_text(size = 16, hjust = 0.5))
Figure 5: Covariates distribution before and after weighting

Figure 5 indicates the distribution of covariates before and after weighting. The plots show that the covariates are well balanced between the treatment and control groups after weighting. Entropic balancing has successfully reduced the differences in covariate distributions between the two groups without having reduction in sample size.

Conclusion

In this tutorial, we explored various methods for propensity score matching (PSM) to estimate the causal effect of a treatment on an outcome variable using observational data. We began with an introduction to the importance of causal inference in research and the challenges of accounting for confounding variables. Propensity score matching was highlighted as a key method for addressing these challenges, closely approximating Randomized Control Trials (RCTs).

We demonstrated how to load and inspect the lalonde dataset, which includes information on individuals who participated in the National Supported Work Demonstration (NSW) program. Using this dataset, we performed a t-test to compare earnings between the treatment and control groups, revealing the need for more sophisticated matching techniques to obtain accurate treatment effect estimates.

We covered several matching methods:

  • Exact Matching: Matching treatment and control units based on exact covariate values, which provided perfect balance but resulted in a small matched sample size.
  • Propensity Score Matching: Estimating the propensity score and matching units based on these scores, significantly improving balance while retaining more observations.
  • Nearest Neighbor Matching: Performing nearest neighbor matching with and without replacement to achieve balance, showing how different configurations affect the results.
  • Caliper, Kernel, Radius, and Mahalanobis Matching: Exploring these methods to further refine the matching process and balance covariates between the treatment and control groups.

Additionally, we introduced Entropic Balancing, a powerful technique that reweights the data to balance covariate distributions between treated and control groups by maximizing entropy. This method achieved excellent balance without losing sample size.

Through balance diagnostics and outcome analysis, we demonstrated the effectiveness of these methods in achieving covariate balance and estimating the treatment effect. The diagnostics plots and statistical summaries provided clear evidence of improved balance and more reliable estimates of the treatment effect.

In conclusion, propensity score matching and entropic balancing are essential tools for causal inference in observational studies. By carefully selecting and applying these methods, researchers can obtain more accurate and credible estimates of treatment effects, even in the absence of random assignment. This tutorial provided a comprehensive overview of these techniques, illustrating their implementation in R and their impact on the analysis of observational data.