misc

Performance

fastdid is magnitudes faster than did, and 15x faster than the fastest alternative DiDforBigData (dfbd for short) for large dataset. fastdid also uses less memory. Here is a comparison of run time for fastdid, did, and dfbd using a panel of 10 periods and varying samples sizes.

Unfortunately, the Author’s computer fails to run did at 1 million units. For a rough idea, DiDforBigData is about 100x faster than did in Bradley Setzler’s benchmark. Other staggered DiD implementations are even slower than did.

For memory:

For the benchmark, a baseline group-time ATT is estimated with no covariates control, no bootstrap, and no explicit parallelization. Computing time is measured by microbenchmark and peak RAM by peakRAM.

Validitiy

Before each release, we conduct tests to ensure the validity of estimates from fastdid.

Basics: comparison with did

For features included in CS, fastdid maintains a maximum of 1% difference from results from the did package. This margin of error is mostly for bootstrapped results due to its inherent randomess. For point estimates, the difference is smaller than 1e-12, and is most likely the result of floating-point error. The relevant test files are compare_est.R.

Extensions: coverage test

For features not included in CS, fastdid maintains that the 95% confidence intervals have a coverage rate between 94% and 96%.

The coverage rate is calculated by running 200 iterations. In each iteration, we test whether the confidence interval estimated covers the group-truth values. We then average the rate across iterations. Due to the randomness of coverage, the realized coverage fall outside of the thresholds in about 1% of the time. The relevant test file is coverage.R.

Experimental Features: not tested

As an attempt to balance the validity and flexibility of fastdid, “experimental features” is introduced in version 0.9.4. These features will be less tested and documented, and it is generally advised to not use them unless the user know what they and the package are doing. These experimental features can be accessed via the exper argument. For example, to use the filtervar feature, call fastdid(..., exper = list(filtervar = "FF")).

The current list of experimental features are

  • max_control_cohort_diff: limit the max cohort difference between treated and control group
  • filtervar, filtervar_post: limit the units being used as treated and control group with a potentially-time-varying variable in base (post) period
  • only_balance_2by2: only require observations to have non-NA values within each 2 by 2 DiD, instead of throughout all time periods. Can be an alternative way of dealing with unbalanced panel by filling the missing periods with NAs. Not recommended as CS only have allow_unbalance_panel, which uses a repeated cross-section 2 by 2 DiD estimator.
  • custom_scheme: aggregate to user-defined parameters

Comparison with did

As the name suggests, fastdid’s goal is to be fast did. Besides performance, here are some comparisons between the two packages.

Estimator

fastdid’s estimators is identical to did’s. As the performance gains mostly come from efficient data manipulation, the key estimation implementations are analogous. For example, 2x2 DiD (estimate_did.R and DRDID::std_ipw_did_panel), influence function from weights (aggregate_gt.R/get_weight_influence, compute.aggte.R/wif), and multiplier bootstrap (get_se.R and mboot.R).

Interface

fastdid should feel similar to att_gt. But there are a few differences:

Control group option:

fastdid did control group used
both notyettreated never-treated + not-yet-but-eventually-treated
never nevertreated never-treated
notyet not-yet-but-eventually-treated

Aggregated parameters: fastdid aggregates in the same function.

fastdid did
group_time no aggregation
dynamic dynamic
time calendar
group group
simple simple

Other

  1. fastdid only offers inverse probability weights estimators for controlling for covariates when allowing for unbalanced panels.
  2. fastdid use universal base periods as default.