Shared Flashcard Set

Details

Title: Cross Sectional Data Analysis-Section 1
Description: Important terms and concepts from CSDA Fall 2012
Total Cards: 34
Subject: Other
Level: Graduate
Created: 09/29/2012


Cards

Term
What does regression analysis actually do? 
Definition
Regression analysis studies the dependence of one (dependent) variable on one or more explanatory (independent) variables in order to estimate the conditional population mean of the dependent variable as accurately as possible with the information given
Term
Stochastic Error Term
Definition
The deviation of a specific value from the conditional mean; it is an unobserved random variable/error (can be positive or negative)
Term
How do we interpret the intercept (ß0)?
Definition
The intercept is the predicted value of the dependent variable when the independent variable takes a value of zero
Term
How do we interpret the slope (ß1)?
Definition
The slope tells us by how much (in its unit of measurement) the dependent variable increases or decreases when the independent variable is changed by one unit
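
For illustration, with hypothetical estimates: if the fitted line is wage = 2.0 + 0.5*educ (wage in €/hour, educ in years), then ß0 = 2.0 is the predicted hourly wage at zero years of education and ß1 = 0.5 means each additional year of education raises the predicted wage by €0.50/hour.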
Term
Assumptions of Linear Regression Model 
Definition

A1: The population regression function is linear in its parameters

A2: The values of the regressor x are fixed (not stochastic/random)

A3: The mean (conditional expectation) of the stochastic error term ui is always zero for any given value of x (the errors cancel out)

A4: For any given value of x, the conditional variance of the error term ui is constant across observations (homoscedasticity)

A5: For any xi and xj, the correlation between the errors ui and uj is zero (no autocorrelation)

A6: The regressor and the error term are not correlated (no covariance between the error term and the regressor)

A7: The number of observations (n) must be larger than the number of parameters (k)

A8: The variance of x must be positive and finite

A9: The model must be correctly specified in all respects

A10: There can be no perfect multicollinearity between the regressors

Term
Equation for the intercept
Definition

[image]
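
(Presumably the standard bivariate OLS estimator: ß0hat = ybar - ß1hat*xbar, i.e. the mean of y minus the estimated slope times the mean of x.)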

 

Term
Equation for slope
Definition
[image]
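
(Presumably the standard bivariate OLS estimator: ß1hat = ∑(xi - xbar)(yi - ybar) / ∑(xi - xbar)², i.e. the sample covariance of x and y divided by the sample variance of x.)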
Term
In multivariable regression what does intercept represent?
Definition
intercept is the predicted value of y if all the independent variables equal zero
Term
How can we tell the accuracy of the estimator?
Definition
Standard Error (simply is the standard deviation of its sampling distribution)
Term
What do we need before calculating the standard error?
Definition

We must first calculate the variance of the error term:

Var(u|x) = σ²

Term
What is the unbiased estimator of the error variance for multiple OLS regression?
Definition
σhat² = RSS/(n-k-1)
Term

Standard Error of the Regression

(also called root mean squared error/RMSE)

Definition

σhat = sqrt[∑ui² / (n-k-1)]

 

  • used as a "goodness of fit" measure 
  • want it to be as small as possible (the smaller it is, the better the model fits the data)
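
A minimal sketch of this calculation in Python/numpy, using hypothetical data and a bivariate fit (so k = 1); all names are illustrative:

import numpy as np

# hypothetical data and a bivariate OLS fit (k = 1 regressor)
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
b1, b0 = np.polyfit(x, y, 1)        # slope, intercept

u = y - (b0 + b1 * x)               # residuals
n, k = len(y), 1
rss = np.sum(u**2)                  # residual sum of squares
sigma2_hat = rss / (n - k - 1)      # unbiased estimate of the error variance
rmse = np.sqrt(sigma2_hat)          # standard error of the regression (RMSE)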
Term
What are the assumptions of the distribution of the error term?
Definition
  • ui's must have an expected value of zero
  • must not be correlated to each other
  • assume constant variance
  • each error term must be normally distributed 
Term
When do we use the t-test?
Definition
  • To test the null hypothesis H0: ßj = ßj* (a hypothesized value)

(looking for a significant difference between the estimate and the hypothesized value)

  • must calculate the t statistic to determine that

t ≡ (ßj - ßj*) / se(ßj) ~ t(n-k-1)


Stat programs by default test whether the regressor has any influence on y. Thus H0: ßj = 0 and the t-stat = ßj / se(ßj)
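
A minimal sketch of that default test in Python/scipy, with hypothetical values for the coefficient, its standard error, and the residual degrees of freedom:

from scipy import stats

b_j, se_bj, df_resid = 0.50, 0.20, 40             # hypothetical values
t_stat = b_j / se_bj                              # tests H0: ßj = 0
p_value = 2 * stats.t.sf(abs(t_stat), df_resid)   # two-sided p-value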

 

Term
How to test for differences in parameter values?
Definition

Again, use the t-test:

t ≡ (ßj - ßl) / se(ßj - ßl) ~ t(n-k-1)


se(ßj - ßl) = sqrt[Var(ßj) + Var(ßl) - 2Cov(ßj, ßl)]
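
Worked example with hypothetical values: if Var(ßj) = 0.04, Var(ßl) = 0.09 and Cov(ßj, ßl) = 0.01, then se(ßj - ßl) = sqrt(0.04 + 0.09 - 0.02) = sqrt(0.11) ≈ 0.33.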

Term
What do y, yhat, ybar look like graphically?
Definition
[image]
Term
When do we use the F-test?
Definition

To test whether all the regressors taken together possess any explanatory power


H0: B1=...=Bk=0


 

The null is rejected if the calculated F exceeds the critical F value from the table


 [image]
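
(The pictured statistic is presumably F = (ESS/k) / (RSS/(n-k-1)), which follows an F distribution with k and n-k-1 degrees of freedom under H0; this matches the (MSS/k)/(RSS/(n-k-1)) formula in the STATA cards below, since MSS is the explained/model sum of squares.)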

Term
How do we test the "goodness-of-fit"?
Definition

Coefficient of Determination (aka R2)


[image] 
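
(Presumably R2 = ESS/TSS = 1 - RSS/TSS, the share of the total variation in y explained by the model; in the STATA cards below it appears as MSS/TSS.)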


 

 

Term
What is the explained variance of the linear model?
Definition

Explained Sum of Squares (ESS)


ESS = ∑(yhat - ybar)²

Term
What is the unexplained variance of OLS model?
Definition

Residual Sum of Squares


RSS = ∑ui²

 

Term
How do we quantify the total variance of y?
Definition
  • If we had no regressors, the best estimate of y would be its mean

--> Therefore, the total variance of y is the 

Total Sum of Squares (TSS)


TSS = ∑(yi - ybar)²

TSS=ESS+RSS
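
A minimal numpy check of this decomposition, using hypothetical data and a bivariate OLS fit (the identity holds exactly only when the model includes an intercept):

import numpy as np

x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
b1, b0 = np.polyfit(x, y, 1)
yhat = b0 + b1 * x

tss = np.sum((y - y.mean())**2)      # total sum of squares
ess = np.sum((yhat - y.mean())**2)   # explained sum of squares
rss = np.sum((y - yhat)**2)          # residual sum of squares
print(np.isclose(tss, ess + rss))    # True: TSS = ESS + RSS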

Term

What does R2=1 mean?

And R2=0?

Definition

R2=1 indicates that all observations lie on the regression line (perfect description/modelling of data)


R2=0 means that the regression model has no explanatory power with regards to y


R2 will never decrease if the number of regressors increases even if the newly added regressors have no real explanatory power

 

Term
What is Adjusted R2?  Why do we need it?
Definition

Adj. R2 corrects for degrees of freedom (it penalizes adding unnecessary regressors just to inflate R2)
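
(The standard formula, consistent with the STATA card below: Adj. R2 = 1 - [(RSS/(n-k-1)) / (TSS/(n-1))], equivalently 1 - (1-R2)(n-1)/(n-k-1); unlike R2 it can fall when a regressor with little explanatory power is added.)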


 

Term
Items asked to solve from STATA output
(Assignment No.1) 
Definition

Std. Error for regressor

t-stats for regressor and constant

p-values for regressor and constant

Confidence intervals 

F value for the model

 

Term

Items asked to solve from STATA output

(Final Exam)

Definition

Model Mean Squared Error (MMS)

Number of df for residuals

R2

Root Mean Squared Error (RMSE)

Confidence Intervals

Coefficients

Term
With STATA output: How to solve for R2?
Definition
R2 = MSS/TSS (model sum of squares divided by total sum of squares)
Term

With STATA output: SE of coefficient

(ß1 in the bivariate case)

Definition
RMSE / sqrt[∑(xi - xbar)²]  <- the denominator will be given (or enough info to derive it)
Term
With STATA output: t-stat
Definition

ß1/se(ß1)


*easy because it is the ratio of the two values listed just before it in the same row

Term
With STATA output: p-value
Definition

reverse cumulative (upper-tail) Student's t distribution, i.e. Stata's ttail() function

 

syntax:

1 - ttail(residual df, -t-stat) + ttail(residual df, +t-stat)

(equivalently, the two-sided p-value is 2*ttail(residual df, abs(t-stat)))

Term
with STATA output: F-value
Definition
(MSS/k) / (RSS/(n-k-1))
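
A scipy sketch of the comparison with the table value, using hypothetical F, k, and n:

from scipy import stats

F, k, n = 12.4, 3, 100                      # hypothetical values
F_crit = stats.f.ppf(0.95, k, n - k - 1)    # 5% critical value "from the table"
p_value = stats.f.sf(F, k, n - k - 1)       # P(F with k, n-k-1 df exceeds F)
reject_h0 = F > F_crit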
Term
with STATA output: RMSE
Definition
RMSE = sqrt[RSS/residual df]
Term
with STATA output: Adjusted R2
Definition
Adjusted R2 = 1 - [(RSS/res. df) / (TSS/(n-1))]
Term
with STATA output: Confidence Intervals
Definition

CI: [ß1 - tvalue*se(ß1), ß1 + tvalue*se(ß1)]

 

critical t value retrieved from the table for the chosen confidence level

(≈1.96 for 95% when the residual df is large)
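
A scipy sketch for the exact critical value and interval, with hypothetical values for the estimate, its SE, and the residual df:

from scipy import stats

b1, se_b1, df_resid = 0.50, 0.20, 40      # hypothetical values
t_crit = stats.t.ppf(0.975, df_resid)     # two-sided 95% critical value (≈2.02 here, not 1.96)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)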

Term

What is meant by a "strong" effect?

What is meant by a "powerful" effect?

Definition

An effect is strong if the coefficient, compared to the other coefficients, takes on a large value (magnitude)


An effect is powerful/pronounced if the coefficient clearly exceeds its standard error

(that would be the ratio of coefficient to SE which equals the t-value)
