Shared Flashcard Set

Details

Program Evaluation Final
40
Political Studies
Graduate
07/26/2013

Cards

Term
What is the role of the control variables in cross-sectional multiple regression designs for OIE?
Definition
Control variables capture the factors that may affect Y independently from the program and that may be differently distributed between the treated and non-treated units. Once inserted in a cross-sectional multiple regression design, control variables enable the analysis to separate the impact on Y due to the program from the impact on Y due to other factors that differentiate the treated units from the non-treated units.
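
A minimal sketch of this idea in Python, on made-up data. The data, the variable names, and the helper functions `solve` and `ols` are assumptions for illustration, not part of the course material. Y is built so that the true program impact is 2, while a control variable X that is higher among treated units adds 3 per unit. A plain comparison of group means mixes the two effects; adding X to the regression separates them.

```python
# Hypothetical data: X is a control variable (e.g. initial distress) that is
# higher among treated units and also raises Y on its own.
T = [0, 0, 0, 0, 1, 1, 1, 1]
X = [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0]
# Y is built so that the true program impact is 2 and each unit of X adds 3.
Y = [2.0 * t + 3.0 * x for t, x in zip(T, X)]


def solve(a, b):
    """Solve the linear system a * beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


ones = [1.0] * len(Y)
treated_mean = sum(y for y, t in zip(Y, T) if t == 1) / T.count(1)
control_mean = sum(y for y, t in zip(Y, T) if t == 0) / T.count(0)
naive_impact = treated_mean - control_mean  # mixes program and X effects
intercept, impact, x_effect = ols([ones, [float(t) for t in T], X], Y)
print(naive_impact, impact, x_effect)
```

On this data the raw difference in means is 8, while the coefficient on T once X is controlled for is approximately 2 (up to floating-point error).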
Term
The control variables should be selected for a multiple regression model using what criteria in OIE?
Definition
a) They have to represent factors that may potentially affect the outcome Y of the analysis independently from the program. b) They have to be distributed differently between the treated and non-treated groups.
Term
What is the meaning of the term: “sensitivity of impact estimates to different functional forms of the control variables”?
Definition
It means that the impact estimates change drastically depending on the functional form (dummy, continuous, categorical, etc.) used to insert the control variables in the estimation model. Such high volatility of the impact estimates is a problem because it does not allow meaningful policy recommendations to be drawn from the results of the analysis.
Term
What possible solutions can be adopted when results are sensitive to the different functional forms of the control variables?
Definition
1. Run an extensive sensitivity analysis by replicating the model estimation under all possible combinations of alternative functional form choices for the control variables.
2. Implement the analysis with a Propensity score matching technique (through the balancing property test, propensity score matching does allow the analysis to be implemented with a validated functional form for the control variables)
Term
Explain the role of the propensity score in OIE.
Definition
The propensity score (PS) is a parameter (ranging from 0 to 1) that summarizes all the control variables adopted in the analysis. For programs targeting units of observation in need of assistance, the PS can be read intuitively as the degree (%) of initial distress of the units of observation (PS values close to 0 represent low distress, PS values close to 1 represent high distress). PS values are then used to match treated units with comparable non-treated units through different propensity score matching techniques.
Term
When should you use Cross-Sectional Multiple Regression?
Definition
When treated and non-treated units do not have identical characteristics and panel data are not available.
Term
What form of bias do multiple regression control variables combat?
Definition
Selection bias. Inserting the characteristics that differ between treated and non-treated units as “control variables” in a multiple regression model helps reduce selection bias.
Term
How do you choose the correct functional form of a variable in multiple regression?
Definition
Unfortunately there are no certain criteria to choose “correct” functional forms.
Possible solutions:
1) extensive sensitivity analysis to check whether or not results are stable to different functional form options;
2) using PSM (propensity score matching) to run the analysis instead of multiple regression models.
Term
How does Propensity Score Matching avoid the problems that arise with differing functional forms in regression?
Definition
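This is due to the PSM “balancing property”, which makes it possible to test whether or not a given functional form of the control variables is appropriate.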
Term
In what way is using propensity scores like regression?
Definition
The propensity score is basically estimated with a logit regression.

Apart from a couple of technical details, propensity scores work very similarly to multiple regression. The model is only as good as the control variables.
Term
How is the output of the propensity score interpreted?
Definition
It is the output of a logit regression (a value from 0 to 1):

- close to 1 = highly disadvantaged initial conditions, more likely to receive the intervention (e.g. areas with high crime, low income, …)
- close to 0 = favorable initial conditions, unlikely to receive the intervention
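
As a rough illustration of how a logit model turns control variables into a score between 0 and 1, here is a small Python sketch. The function name `propensity_score`, the two control variables (crime rate and income), and all coefficient values are hypothetical; in practice the coefficients would be estimated from the data.

```python
import math


def propensity_score(x, coefficients, intercept):
    """Logistic transform of a linear index of the control variables."""
    index = intercept + sum(b * v for b, v in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-index))


# Hypothetical coefficients: higher crime raises the score, higher income lowers it.
coefficients = [0.8, -0.5]
intercept = -1.0

ps_distressed = propensity_score([4.0, 1.0], coefficients, intercept)  # high crime, low income
ps_favorable = propensity_score([1.0, 4.0], coefficients, intercept)   # low crime, high income
print(ps_distressed, ps_favorable)
```

With these made-up numbers the distressed area gets a score well above 0.5 and the favorable area a score close to 0.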
Term
How do we put into practice the "balancing property" for Propensity score matching?
Definition
1) The propensity score, P(T=1) = B0 + B1 + B2 + B3 (with B1, B2, B3 standing for the control variable terms), is estimated based on a specific functional form for each variable

2) All units (both the treated and the non-treated) are sorted according to their Propensity Score value
Term
Under the balancing property of propensity score matching, how do you know whether the functional form of your variables is validated?
Definition
The functional form is validated if:
1) The entire sample can be stratified into contiguous strata, each containing at least one treated and one non-treated unit. (We need to divide the data into different strata. In our class data, for example, the first acceptable stratum would be 1-36, since 37 is the first treated unit and units 1-36 are all T=0.)

2) Within each stratum, the mean value of each control variable (B1, B2, B3, etc.) shows no statistically significant difference between the treated and the non-treated units
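
The within-stratum check can be sketched in Python as follows. This is only an illustration under several assumptions: the function names and the data are made up, the comparison uses a Welch t-statistic with a rough |t| < 2 cutoff instead of a proper p-value, and it requires at least two units per group so that a variance can be computed.

```python
from statistics import mean, variance


def welch_t(a, b):
    """t-statistic for the difference in means of two samples (Welch form)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / se


def stratum_is_balanced(units, variables, cutoff=2.0):
    """units: list of dicts with a treatment flag "T" and control variables.

    Returns True when no control variable shows a large treated vs
    non-treated difference in means within this stratum.
    """
    treated = [u for u in units if u["T"] == 1]
    controls = [u for u in units if u["T"] == 0]
    if len(treated) < 2 or len(controls) < 2:
        return False  # this sketch needs 2+ units per group to get a variance
    return all(
        abs(welch_t([u[v] for u in treated], [u[v] for u in controls])) < cutoff
        for v in variables
    )


# Two hypothetical strata, each holding units with similar propensity scores.
balanced = ([{"T": 1, "income": x} for x in (10, 12, 11)]
            + [{"T": 0, "income": x} for x in (11, 10, 12)])
unbalanced = ([{"T": 1, "income": x} for x in (20, 21, 22)]
              + [{"T": 0, "income": x} for x in (10, 11, 12)])
print(stratum_is_balanced(balanced, ["income"]))
print(stratum_is_balanced(unbalanced, ["income"]))
```

The first stratum passes because treated and non-treated incomes have the same mean; the second fails because the means are far apart relative to their spread.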
Term
How does "Nearest Available" Propensity Score Matching work?
Definition
You can only use each unit (e.g. each district) once, so units are removed in pairs. The further down the list you go, the worse the matches get, as fewer suitable comparison units remain.

1) The NT treated units (NT being the number of treated units) are listed in a separate file and sorted based on their PS value (or in a random order)

2) The first of the NT treated units is matched with the non-treated unit having the most similar PS

3) The two matched units are removed from the original lists and placed in a third file. The steps are repeated for each of the NT treated units.
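
The steps above can be sketched as a greedy loop in Python. The function name `nearest_available_match`, the unit ids, and the PS values are hypothetical.

```python
def nearest_available_match(treated, non_treated):
    """Greedy one-to-one matching without replacement.

    treated, non_treated: dicts mapping unit id -> propensity score.
    Returns a list of (treated_id, non_treated_id) pairs.
    """
    pool = dict(non_treated)  # non-treated units still available
    pairs = []
    # Step 1: go through the treated units sorted by PS.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not pool:
            break  # no non-treated units left to match
        # Step 2: pick the available non-treated unit with the closest PS.
        c_id = min(pool, key=lambda cid: abs(pool[cid] - t_ps))
        # Step 3: record the pair and remove the matched non-treated unit.
        pairs.append((t_id, c_id))
        del pool[c_id]
    return pairs


treated = {"A": 0.60, "B": 0.62, "C": 0.90}
non_treated = {"u": 0.61, "v": 0.58, "w": 0.30, "x": 0.70}
pairs = nearest_available_match(treated, non_treated)
print(pairs)
```

Here unit B would be closest to "u", but "u" has already been used for A, so B is matched with "v"; the last treated unit, C, ends up with a fairly distant match ("x").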
Term
Impact Estimate for "Nearest Match" Propensity Score
Definition
1) summing the single ΔY (difference in the dependent variable) between each treated unit and its matched unit

2) computing the weighted average of the single ΔY between each treated unit and its matched unit, with weights Wi
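
A small Python sketch of this computation, assuming the weighted average takes the form sum(Wi * ΔYi) / sum(Wi). The exact formula on the original card is an image that is not available here, so this form, the function name, and the numbers are assumptions.

```python
def matched_impact(treated_y, matched_y, weights=None):
    """Weighted average of the treated-minus-matched outcome differences."""
    deltas = [yt - yc for yt, yc in zip(treated_y, matched_y)]
    if weights is None:
        weights = [1.0] * len(deltas)  # equal weights give a simple mean
    return sum(w * d for w, d in zip(weights, deltas)) / sum(weights)


treated_y = [10.0, 12.0, 15.0]  # outcomes of the treated units
matched_y = [8.0, 11.0, 11.0]   # outcomes of their matched non-treated units
equal_weight_impact = matched_impact(treated_y, matched_y)
weighted_impact = matched_impact(treated_y, matched_y, weights=[1.0, 1.0, 2.0])
print(equal_weight_impact, weighted_impact)
```

The three differences are 2, 1 and 4, so the equal-weight estimate is 7/3 and the estimate with weights 1, 1, 2 is 2.75.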
Term
Impact estimate for "Nearest Match" Propensity Score - formula.
Definition
[image]
Term
[image]
Definition
"Nearest Match" Propensity Score Matching
Term
What are trade-offs involved with choosing the tolerance in radius (PS) matching estimators?
Definition
On the one hand, it would be an advantage to enlarge the radius in order to obtain a larger estimation sample (improving the statistical efficiency of the impact estimates). On the other hand, to limit selection bias it would be an advantage to choose a radius as small as possible.
Term
What are advantages and disadvantages of PS matching with replacement versus nearest available PS matching?
Definition
Advantages: it reduces the risk of having to match the last treated units with non-treated units with too distant PS.
Disadvantages: it is more sensitive to measurement errors in the data affecting PS values of non-treated units (i.e. one non-treated unit with high PS can be matched to a large number of treated units, amplifying the effects of the possible measurement errors on the impact estimates).
Term
What is the main advantage of PS matching over multiple regression models?
Definition
PS matching can exploit the “balancing property” to test whether or not a given functional form of the control variables is appropriate. As a consequence, PS matching does not suffer from possible sensitivity of impact estimates to different functional forms of the control variables.
Term
What is the additional advantage of PS matching versus multiple regression models when Y data have to be retrieved through primary data collection?
Definition
Using a PS matching procedure (except for kernel matching) reduces the number of units used in the analysis, and therefore the number of units for which data collection has to take place.
This reduces the cost of primary data collection for the Y data (in cases in which Y data are not available from statistical offices).
Term
Radius Matching
Definition
Radius matching allows each treated unit to be matched with more than one non-treated unit.

A “tolerance radius” is established, so that each non-treated unit with a PS within the tolerance is selected to be part of the comparison group for a given treated unit.

Radius matching is usually implemented with replacement: the same non-treated unit can be included in the comparison group of more than one treated unit.
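
A minimal Python sketch of radius matching with replacement. The function name `radius_match`, the unit ids, and the PS values are hypothetical.

```python
def radius_match(treated, non_treated, radius):
    """For each treated unit, list every non-treated unit within the radius.

    treated, non_treated: dicts mapping unit id -> propensity score.
    Matching is with replacement: a non-treated unit may appear in the
    comparison group of several treated units.
    """
    return {
        t_id: [c_id for c_id, c_ps in non_treated.items()
               if abs(c_ps - t_ps) <= radius]
        for t_id, t_ps in treated.items()
    }


treated = {"A": 0.50, "B": 0.55}
non_treated = {"u": 0.48, "v": 0.53, "w": 0.70}
wide = radius_match(treated, non_treated, radius=0.05)
narrow = radius_match(treated, non_treated, radius=0.01)
print(wide)
print(narrow)
```

With a radius of 0.05, "v" falls in the comparison groups of both A and B. With a radius of 0.01 both comparison groups are empty, so that radius would be too small for this data.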
Term
How to choose the tolerance radius in radius matching PS?
Definition
Trade-off to be balanced:

• To obtain a larger estimation sample (improving the statistical efficiency of the impact estimates), the radius should not be too small

• To limit selection bias, the radius should be chosen as small as possible.

In all cases, once the units outside the common support have been eliminated, the minimum radius has to be chosen so that each treated unit has a non-empty comparison group.
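
One way to read the last condition: the smallest usable radius is the largest of the treated units' distances to their closest non-treated unit. A short Python sketch under that reading, with a hypothetical function name and made-up PS values:

```python
def smallest_feasible_radius(treated_ps, non_treated_ps):
    """Smallest radius that leaves no treated unit without a comparison unit."""
    return max(
        min(abs(c_ps - t_ps) for c_ps in non_treated_ps)
        for t_ps in treated_ps
    )


treated_ps = [0.50, 0.60]
non_treated_ps = [0.48, 0.53, 0.70]
min_radius = smallest_feasible_radius(treated_ps, non_treated_ps)
print(min_radius)
```

Here the treated unit at 0.50 has a neighbor 0.02 away, but the one at 0.60 has its closest neighbor 0.07 away, so any radius below about 0.07 would leave it unmatched.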
Term
[image]
Definition
Radius Matching
Term
Kernel Matching Details
Definition
Most advanced statistical matching procedure in this class

With Kernel matching the outcome (YT) of each treated unit is compared to a weighted average of the outcomes (YNT) of all non-treated units

Weights are inversely proportional to the distance between PS of the given treated unit (i) and the PS of the non-treated units
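
A sketch of the weighting idea in Python. As an assumption, it uses a Gaussian kernel with a hypothetical bandwidth parameter, so the weights shrink as the PS distance grows rather than being literally inversely proportional to it; the function names and the data are also made up.

```python
import math


def kernel_weights(t_ps, non_treated_ps, bandwidth):
    """Normalized Gaussian-kernel weights: closer PS values weigh more."""
    raw = [math.exp(-0.5 * ((c_ps - t_ps) / bandwidth) ** 2)
           for c_ps in non_treated_ps]
    total = sum(raw)
    return [w / total for w in raw]


def kernel_counterfactual(t_ps, non_treated, bandwidth):
    """Weighted average outcome of ALL non-treated units for one treated unit.

    non_treated: list of (propensity_score, outcome) tuples.
    """
    weights = kernel_weights(t_ps, [ps for ps, _ in non_treated], bandwidth)
    return sum(w * y for w, (_, y) in zip(weights, non_treated))


non_treated = [(0.48, 8.0), (0.60, 10.0), (0.90, 20.0)]
weights = kernel_weights(0.50, [ps for ps, _ in non_treated], bandwidth=0.1)
counterfactual = kernel_counterfactual(0.50, non_treated, bandwidth=0.1)
print(weights, counterfactual)
```

The impact for the treated unit is then its own outcome minus this counterfactual; the non-treated unit at 0.90 contributes almost nothing because its PS is far from 0.50.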
Term
[image]
Definition
Kernel Matching
Term
When is Conditional Matching (Kernel Matching) the appropriate matching form?
Definition
Conditional matching is recommended when it is suspected that relevant unobservable characteristics may be differently distributed between treated and non-treated units.

For example, when non-treated units:
I) Are located in different regions; or

II) Operate in different sectors (for analyses with firm-level outcomes); or

III) Have different sizes.
Term
How do you conduct conditional matching?
Definition
Treated and non-treated units are separately sorted into the categories defined by location, business sector, or size.

One matching procedure is then applied separately for each category, so that the treated units are matched only with the non-treated units belonging to the same category (and therefore possessing the same unobserved characteristics as the treated unit).
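
A compact Python sketch of matching within categories. For brevity it uses nearest-PS matching with replacement inside each category; that choice, the function name, and the data are assumptions, and any of the matching procedures could be applied per category instead.

```python
def conditional_nearest_match(treated, non_treated):
    """Each unit is a tuple (unit_id, category, propensity_score).

    For every treated unit, return the non-treated unit with the closest
    propensity score among those in the same category only.
    """
    matches = {}
    for t_id, t_cat, t_ps in treated:
        same_category = [(c_id, c_ps) for c_id, c_cat, c_ps in non_treated
                         if c_cat == t_cat]
        if same_category:  # skip treated units with no same-category candidate
            best = min(same_category, key=lambda c: abs(c[1] - t_ps))
            matches[t_id] = best[0]
    return matches


treated = [("A", "north", 0.60), ("B", "south", 0.60)]
non_treated = [("u", "north", 0.90), ("v", "south", 0.61)]
matches = conditional_nearest_match(treated, non_treated)
print(matches)
```

Unit A is matched with "u" even though "v" has a much closer PS, because "v" belongs to a different category.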
Term
The Importance of using Panel Data
Definition
Simply comparing treated with non-treated units is not useful for inferring the impact of the program when the two groups have different initial characteristics.
Term
Difference in Difference Estimator
Definition
a^=E(Ypost - Ypre | Ti=1) - E(Ypost - Ypre|Ti=0)

DD estimators require the availability of both pre- and post-intervention data

DD allows the analysis to control for some unobservable differences between the treated and the non-treated units
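
The DD formula can be sketched in Python as follows, with group means standing in for the expectations. The function name, the field names, and the data are hypothetical.

```python
from statistics import mean


def dd_estimate(units):
    """Difference in differences: mean pre-post change of the treated
    minus mean pre-post change of the non-treated.

    units: list of dicts with keys "T", "y_pre" and "y_post".
    """
    def mean_change(group):
        return mean(u["y_post"] - u["y_pre"] for u in units if u["T"] == group)

    return mean_change(1) - mean_change(0)


units = [
    {"T": 1, "y_pre": 10, "y_post": 16},
    {"T": 1, "y_pre": 12, "y_post": 18},
    {"T": 0, "y_pre": 5, "y_post": 7},
    {"T": 0, "y_pre": 7, "y_post": 9},
]
dd = dd_estimate(units)
print(dd)
```

The treated units change by 6 on average and the non-treated by 2, so the DD estimate is 4. A comparison of post-intervention levels alone (17 versus 8) would give 9, because it also picks up the groups' different starting points.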
Term
What is the major advantage of panel data?
Definition
Every time we compare outcomes between the treated and the non-treated we face a certain danger: the two groups may not really be comparable, so the comparison suffers from selection bias. The treated units might simply have different characteristics than the non-treated units. The major advantage of panel data is that when you compare not just the outcomes but the pre-post change between T=1 and T=0, any initial characteristic that is potentially different between T=1 and T=0 (as long as it is a fixed effect) does not have to be controlled for with a control variable. At the very least, you don't need AS MANY control variables.
Term
How does adding additional pre-intervention data make impact estimates more reliable?
Definition
The additional data (e.g. 1995) make it possible to estimate whether or not the pre-intervention trend of Y was different between the treated and non-treated units. Any difference that is detected between treated and non-treated units is incorporated in the analysis as a factor used to adjust the initial estimate of the counterfactual trend.
Term
Difference in Difference in Difference impact formula
Definition
a^ = E[(Y2005 - Y2000) - (Y2000 - Y1995)|Ti=1] - E[(Y2005 - Y2000) - (Y2000 - Y1995)|Ti=0]
Term
With a DDD model, in which way is the counter-factual estimated?
Definition
The counterfactual is estimated as the pre-post intervention change of Y recorded in the non-treated units, corrected by the pre-intervention differential change of Y between the treated and non-treated units.
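
A Python sketch of the DDD calculation with the 1995, 2000 and 2005 observations, using group means for the expectations. The function name, the field names, and the data are hypothetical.

```python
from statistics import mean


def ddd_estimate(units):
    """Post-period change minus pre-period change, treated minus non-treated.

    units: list of dicts with keys "T", "y1995", "y2000" and "y2005".
    """
    def mean_trend_change(group):
        return mean(
            (u["y2005"] - u["y2000"]) - (u["y2000"] - u["y1995"])
            for u in units if u["T"] == group
        )

    return mean_trend_change(1) - mean_trend_change(0)


units = [
    {"T": 1, "y1995": 10, "y2000": 13, "y2005": 20},
    {"T": 1, "y1995": 12, "y2000": 15, "y2005": 22},
    {"T": 0, "y1995": 5, "y2000": 6, "y2005": 8},
    {"T": 0, "y1995": 7, "y2000": 8, "y2005": 10},
]
ddd = ddd_estimate(units)
print(ddd)
```

In this example the non-treated units change by 2 between 2000 and 2005, and before the intervention the treated units were already growing 2 units faster (3 versus 1). The counterfactual change for the treated is therefore 2 + 2 = 4, and the impact is the observed change of 7 minus 4, which is 3. A plain DD on 2000 and 2005 would give 5.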
Term
What is the advantage of combining a DD scheme with Multiple regression or PS matching compared to multiple regression or PS matching without a DD scheme?
Definition
When panel data are available, combining a DD scheme with multiple regression or PS matching reduces the need to include observable control variables in the analysis. This is because all factors that can be assumed to be fixed effects are removed by the pre-post differencing and no longer have to be controlled for.
Term
How does conditional Difference in Difference with Propensity Score Matching work?
Definition
1) Based on an appropriate set of control variables, a PS variable is estimated
2) A nearest available (with or without replacement) PS matching, or a radius matching procedure is implemented
3) The impact estimates are obtained by comparing the pre-post intervention difference of Y between the treated and the matched non-treated units (i.e. a^ = E(Ypost - Ypre|Ti=1) - E(Ypost - Ypre|Ti=0))
Term
What is the advantage of Conditional DD with multiple regression models (and of Conditional DD with PS matching models) compared to pure DD models?
Definition
Compared to a pure DD model, CDD with MR models do not require assuming that the observable control variables X are fixed effects in order to obtain unbiased results.
Term
What is the advantage of Conditional DD with PS matching models compared to Conditional DD with multiple regression models?
Definition
Compared to Conditional DD with multiple regression models, CDD with PS matching offers the following advantages:
-it solves the issue of sensitivity of impact estimates to different functional forms of the control variables
-it reduces costs for data-collection if Y data has to be collected for the evaluation
Term
With Conditional DD with multiple regression models (and Conditional DD with PS matching), at what time do you have to measure the control variables? Why is this the case?
Definition
Typically you have to measure control variables at the pre-intervention time. This is to reduce the risk of the control variables becoming endogenous to the treatment (i.e. the control variables becoming affected by the treatment itself)
Term
The Endogeneity Problem
Definition
Very often control variables cannot be included if they are measured during the period of the program intervention. This is because of “endogeneity” problems.

For example, if EZ incentives work very well, they could lower crime rates in the years during the program intervention. Crime rate changes during the program intervention would then not be something to control for; they would be a secondary outcome of the program intervention.