Term
adjusted R-squared
|
Definition
a goodness-of-fit measure in multiple regression analysis that penalizes additional explanatory variables by using a degrees of freedom adjustment in estimating the error variance |
|
|
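A minimal Python sketch of the degrees-of-freedom adjustment (numpy assumed; the function name and arrays are illustrative, not from the source deck):

```python
import numpy as np

def adjusted_r2(y, yhat, k):
    """Adjusted R-squared for a regression with k slope parameters."""
    n = len(y)
    ssr = np.sum((y - yhat) ** 2)        # sum of squared residuals
    sst = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    # Error variance is estimated with n - k - 1 df, total variance with n - 1
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))
```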
Term
alternative hypothesis
|
Definition
the hypothesis against which the null hypothesis is tested |
|
|
Term
average
|
Definition
the sum of n numbers divided by n |
|
|
Term
base group
|
Definition
the group represented by the overall intercept in a multiple regression model that includes dummy explanatory variables |
|
|
Term
causal effect
|
Definition
a ceteris paribus change in one variable has an effect on another variable |
|
|
Term
bias
|
Definition
the difference between the expected value of an estimator and the population value it is supposed to be estimating |
|
|
Term
biased estimator
|
Definition
an estimator whose expectation, or sampling mean, is different from the population value it is supposed to be estimating |
|
|
Term
Breusch-Pagan test
|
Definition
a test for heteroskedasticity where the squared OLS residuals are regressed on the explanatory variables in the model |
|
|
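A sketch of the test using statsmodels' het_breuschpagan on simulated data (the data-generating process and variable names are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
u = rng.normal(size=n) * (1 + 0.5 * np.abs(x[:, 0]))  # heteroskedastic error
y = 1 + x @ np.array([0.5, -0.3]) + u

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
# The LM version regresses the squared OLS residuals on the explanatory variables
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(f"LM = {lm_stat:.2f}, p = {lm_pval:.4f}")
```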
Term
ceteris paribus
|
Definition
all other relevant factors are held fixed |
|
|
Term
chi-square distribution
|
Definition
a probability distribution obtained by adding the squares of independent standard normal random variables. The number of terms in the sum equals the degrees of freedom in the distribution |
|
|
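A quick numpy simulation of the definition (illustrative): summing df squared independent standard normals yields a chi-square draw with df degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)
df = 5
# Each row: df independent standard normals, squared and summed
draws = (rng.standard_normal((100_000, df)) ** 2).sum(axis=1)
print(draws.mean(), draws.var())  # approximately df and 2*df
```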
Term
classical linear model
|
Definition
the multiple linear regression model under the first set of classical linear model assumptions |
|
|
Term
cluster sample
|
Definition
a sample of natural clusters or groups that usually consist of people |
|
|
Term
confidence interval (CI)
|
Definition
a rule used to construct a random interval so that a certain percentage of all data sets, determined by the confidence level, yields an interval that contains the population value |
|
|
Term
confidence level
|
Definition
the percentage of samples in which we want our confidence interval to contain the population value; 95% is the most common confidence level, but 90% and 99% are also used |
|
|
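A simulation sketch of what the confidence level means (numpy/scipy assumed; the population mean, spread, and sample size are arbitrary): roughly 95% of intervals constructed this way contain the population value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, n, reps = 10.0, 50, 2000
tcrit = stats.t.ppf(0.975, n - 1)  # critical value for a 95% CI

covered = 0
for _ in range(reps):
    s = rng.normal(mu, 3.0, size=n)
    half = tcrit * s.std(ddof=1) / np.sqrt(n)
    covered += (s.mean() - half <= mu <= s.mean() + half)

print(covered / reps)  # close to 0.95
```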
Term
consistency
|
Definition
an estimator converges in probability to the correct population value as the sample size grows |
|
|
Term
covariance
|
Definition
a measure of linear dependence between two random variables |
|
|
Term
critical value
|
Definition
in hypothesis testing, the value against which the test statistic is compared to determine whether or not the null hypothesis is rejected |
|
|
Term
data frequency
|
Definition
the interval at which time series data are collected. Yearly, quarterly, and monthly are the most common data frequencies |
|
|
Term
degrees of freedom (df)
|
Definition
in multiple regression analysis, the number of observations minus the number of estimated parameters |
|
|
Term
dependent variable
|
Definition
the variable to be explained in the multiple regression model |
|
|
Term
dummy (binary) variable
|
Definition
a variable that takes on the value of zero or one |
|
|
Term
dummy variable trap
|
Definition
the mistake of including too many dummy variables among the independent variables; it occurs when an overall intercept is in the model and a dummy variable is included for each group |
|
|
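A pandas sketch (hypothetical data) of how the trap is avoided: drop one dummy so the omitted category becomes the base group represented by the intercept.

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "west", "south", "north"]})
# With an overall intercept, a dummy for EVERY region would be perfectly
# collinear with the constant (the dummies sum to one in each row).
dummies = pd.get_dummies(df["region"], drop_first=True)  # "north" becomes the base group
print(dummies)
```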
Term
econometric model
|
Definition
an equation relating the dependent variable to a set of explanatory variables and unobserved disturbances, where unknown population parameters determine the ceteris paribus effect of each explanatory variable |
|
|
Term
economic model
|
Definition
a relationship derived from economic theory or less formal economic reasoning |
|
|
Term
elasticity
|
Definition
the percentage change in one variable given a 1% ceteris paribus increase in another variable |
|
|
Term
endogeneity
|
Definition
a term used to describe the presence of an endogenous explanatory variable |
|
|
Term
endogenous explanatory variable |
|
Definition
an explanatory variable in a multiple regression model that is correlated with the error term, either because of an omitted variable, measurement error, or simultaneity |
|
|
Term
endogenous variables
|
Definition
in simultaneous equations models, variables that are determined by the equations in the system |
|
|
Term
error term (disturbance)
|
Definition
the variable in a simple or multiple regression equation that contains unobserved factors that affect the dependent variable. The error term may also include measurement errors in the observed dependent or independent variables |
|
|
Term
error variance
|
Definition
the variance of the error term in a multiple regression model |
|
|
Term
estimate
|
Definition
the numerical value taken on by an estimator for a particular sample of data |
|
|
Term
estimator
|
Definition
a rule for combining data to produce a numerical value for a population parameter; the form of the rule does not depend on the particular sample obtained |
|
|
Term
exogenous variable
|
Definition
any variable that is uncorrelated with the error term in the model of interest |
|
|
Term
expected value
|
Definition
a measure of central tendency in the distribution of a random variable, including an estimator |
|
|
Term
experiment
|
Definition
in probability, a general term used to denote an event whose outcome is uncertain. In econometric analysis, it denotes a situation where data are collected by randomly assigning individuals to control and treatment groups |
|
|
Term
explained sum of squares (SSE) |
|
Definition
the total sample variation of the fitted values in the multiple regression model |
|
|
Term
explanatory variable
|
Definition
in regression analysis, a variable that is used to explain variation in the dependent variable |
|
|
Term
exponential function
|
Definition
a mathematical function defined for all values that has an increasing slope but a constant proportionate change |
|
|
Term
F distribution
|
Definition
the probability distribution obtained by forming the ratio of two independent chi-square random variables, where each has been divided by its degrees of freedom |
|
|
Term
F statistic
|
Definition
a statistic used to test multiple hypotheses about the parameters in a multiple regression model |
|
|
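A statsmodels sketch of testing multiple hypotheses with an F statistic (simulated data; the two exclusion restrictions and default parameter names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = sm.add_constant(rng.normal(size=(n, 3)))
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

res = sm.OLS(y, X).fit()
# Jointly test H0: beta2 = 0 and beta3 = 0 (default names: const, x1, x2, x3)
print(res.f_test("(x2 = 0), (x3 = 0)"))
```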
Term
fitted values
|
Definition
the estimated values of the dependent variable when the values of the independent variables for each observation are plugged into the OLS regression line |
|
|
Term
Gauss-Markov assumptions
|
Definition
the set of assumptions under which OLS is BLUE (best linear unbiased estimator): 1) linear in parameters 2) random sampling 3) sample variation in the explanatory variable 4) zero conditional mean 5) homoskedasticity |
|
|
Term
Gauss-Markov theorem
|
Definition
the theorem that states that, under the five Gauss-Markov assumptions, the OLS estimator is BLUE (conditional on the sample values of the explanatory variables) |
|
|
Term
best linear unbiased estimator (BLUE) |
|
Definition
among all linear unbiased estimators, the estimator with the smallest variance. OLS is BLUE, conditional on the sample values of the explanatory variables, under the Gauss-Markov assumptions |
|
|
Term
goodness-of-fit measure
|
Definition
a statistic that summarizes how well a set of explanatory variables explains a dependent or response variable |
|
|
Term
heterogeneity bias
|
Definition
the bias in OLS due to omitted heterogeneity (or omitted variables) |
|
|
Term
heteroskedasticity
|
Definition
the variance of the error term, given the explanatory variables, is not constant |
|
|
Term
heteroskedasticity of unknown form |
|
Definition
heteroskedasticity that may depend on the explanatory variables in an unknown, arbitrary fashion |
|
|
Term
heteroskedasticity-robust F statistic
|
Definition
an F-type statistic that is (asymptotically) robust to heteroskedasticity of unknown form |
|
|
Term
heteroskedasticity-robust LM statistic
|
Definition
an LM statistic that is robust to heteroskedasticity of unknown form |
|
|
Term
heteroskedasticity-robust standard error
|
Definition
a standard error that is (asymptotically) robust to heteroskedasticity of unknown form |
|
|
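A sketch of requesting heteroskedasticity-robust standard errors in statsmodels (simulated heteroskedastic data; HC1 is one of several robust variants and is an illustrative choice):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
y = 1 + 0.5 * x + rng.normal(size=n) * (1 + np.abs(x))  # nonconstant error variance

X = sm.add_constant(x)
usual = sm.OLS(y, X).fit()                 # classical standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust
print(usual.bse.round(3), robust.bse.round(3))  # robust t stats use the latter
```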
Term
heteroskedasticity-robust t statistic
|
Definition
a t statistic that is (asymptotically) robust to heteroskedasticity of unknown form |
|
|
Term
asymptotic properties
|
Definition
properties of estimators and test statistics that apply when the sample size grows without bound |
|
|
Term
homoskedasticity
|
Definition
the errors in a regression model have constant variance conditional on the explanatory variables |
|
|
Term
Assumption SLR.1 (Linear in Parameters) |
|
Definition
In the population model, the dependent variable, y, is related to the independent variable, x, and the error (or disturbance), u, as y = β0 + β1x + u, where β0 and β1 are the population intercept and slope parameters, respectively. |
|
|
Term
Assumption SLR.2 (Random Sampling)
|
Definition
We have a random sample of size n, {(xi, yi): i = 1, 2, …, n}, following the population model y = β0 + β1x + u |
|
|
Term
Assumption SLR.3 (Sample Variation in the Explanatory Variable) |
|
Definition
The sample outcomes on x, namely {xi, i = 1, …, n}, are not all the same value |
|
|
Term
Assumption SLR.4 (Zero Conditional Mean) |
|
Definition
The error u has an expected value of zero given any value of the explanatory variable. In other words, E(u|x) = 0 |
|
|
Term
Assumption SLR.5 (Homoskedasticity) |
|
Definition
The error u has the same variance given any value of the explanatory variable. In other words, Var(u|x) = σ² |
|
|
Term
hypothesis test
|
Definition
a statistical test of the null, or maintained, hypothesis against an alternative hypothesis |
|
|
Term
inconsistency
|
Definition
the difference between the probability limit of an estimator and the parameter value |
|
|
Term
inconsistent estimator
|
Definition
an estimator that does not converge (in probability) to the correct population parameter as the sample size grows |
|
|
Term
interaction effect
|
Definition
in multiple regression, the partial effect of one explanatory variable depends on the value of a different explanatory variable |
|
|
Term
interaction term
|
Definition
an independent variable in a regression model that is the product of two explanatory variables |
|
|
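A formula-API sketch (hypothetical variables x1 and x2): the x1:x2 term is the product of the two explanatory variables, so the partial effect of x1 depends on the value of x2.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
d = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
d["y"] = 1 + d["x1"] + 0.5 * d["x2"] + 0.8 * d["x1"] * d["x2"] + rng.normal(size=n)

# x1:x2 is the interaction term (product of the two regressors)
res = smf.ols("y ~ x1 + x2 + x1:x2", data=d).fit()
print(res.params)  # the partial effect of x1 is b_x1 + b_interaction * x2
```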
Term
intercept
|
Definition
in the equation of a line, the value of the y variable when the x variable is 0 |
|
|
Term
intercept shift
|
Definition
the intercept in a regression model differs by group or time period |
|
|
Term
joint distribution
|
Definition
the probability distribution determining the probabilities of outcomes involving two or more random variables |
|
|
Term
joint hypotheses test
|
Definition
a test involving more than one restriction on the parameters in a model |
|
|
Term
jointly insignificant
|
Definition
failure to reject, using an F test at a specified significance level, that all coefficients for a group of explanatory variables are zero |
|
|
Term
jointly statistically significant |
|
Definition
the null hypothesis that two or more explanatory variables have zero population coefficients is rejected at the chosen significance level |
|
|
Term
Lagrange Multiplier (LM) Statistic |
|
Definition
a test statistic with large-sample justification that can be used to test for omitted variables, heteroskedasticity, and serial correlation, among other model specification problems |
|
|
Term
least squares estimator
|
Definition
an estimator that minimizes a sum of squared residuals |
|
|
Term
level-level model
|
Definition
a regression model where the dependent variable and the independent variables are in level (or original) form |
|
|
Term
level-log model
|
Definition
a regression model where the dependent variable is in level form and (at least one of) the independent variables are in logarithmic form |
|
|
Term
linear function
|
Definition
a function where the change in the dependent variable, given a one-unit change in an independent variable, is constant |
|
|
Term
logarithmic function (natural log)
|
Definition
a mathematical function, defined only for positive arguments with a positive but decreasing slope |
|
|
Term
log-level model
|
Definition
a regression model where the dependent variable is in logarithmic form and the independent variables are in level (or original) form |
|
|
Term
log-log model
|
Definition
a regression model where the dependent variable (and at least some of) the independent variables are in logarithmic form |
|
|
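A sketch of a log-log regression on simulated data (numpy/statsmodels assumed; the true elasticity of 0.8 is an arbitrary choice): the slope coefficient estimates the elasticity.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = np.exp(rng.normal(size=n))  # strictly positive, as logs require
y = np.exp(0.5 + 0.8 * np.log(x) + 0.1 * rng.normal(size=n))

res = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()
print(res.params)  # slope ~ 0.8 = % change in y for a 1% change in x
```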
Term
mean squared error (MSE)
|
Definition
the expected squared distance that an estimator is from the population value; it equals the variance plus the square of any bias |
|
|
Term
measurement error
|
Definition
the difference between an observed variable and the variable that belongs in a multiple regression equation |
|
|
Term
median
|
Definition
in a probability distribution, it is the value where there is 50% chance of being below the value and a 50% chance of being above it. In a sample of numbers, it is the middle value after the numbers have been ordered |
|
|
Term
missing data
|
Definition
a data problem that occurs when we do not observe values of some variables for certain observations (individuals, cities, time periods, and so on) |
|
|
Term
multicollinearity
|
Definition
a term that refers to the correlation among the independent variables in a multiple regression model; it is usually invoked when some correlations are "large," but an actual magnitude is not well-defined |
|
|
Term
multiple hypotheses test
|
Definition
a test of a null hypothesis involving more than one restriction on the parameters |
|
|
Term
Multiple Linear Regression (MLR) Model |
|
Definition
a model linear in its parameters, where the dependent variable is a function of independent variables plus an error term |
|
|
Term
multiple regression analysis |
|
Definition
a type of analysis that is used to describe estimation of and inference in the multiple linear regression model |
|
|
Term
nonlinear function
|
Definition
a function whose slope is not constant |
|
|
Term
nonnested models
|
Definition
two or more models where no model can be written as a special case of the other by imposing restrictions on the parameters |
|
|
Term
nonrandom sample
|
Definition
a sample obtained other than by sampling randomly from the population of interest |
|
|
Term
normal distribution
|
Definition
a probability distribution commonly used in statistics and econometrics for modeling a population. Its probability distribution has a bell shape. |
|
|
Term
normality assumption
|
Definition
the classical linear model assumption that states that the error (or dependent variable) has a normal distribution, conditional on the explanatory variables |
|
|
Term
null hypothesis
|
Definition
in classical hypothesis testing, we take this hypothesis as true and require the data to provide substantial evidence against it |
|
|
Term
omitted variable bias
|
Definition
the bias that arises in the OLS estimators when a relevant variable is omitted from the regression |
|
|
Term
one-sided alternative
|
Definition
an alternative hypothesis that states that the parameter is greater than (or less than) the value hypothesized under the null |
|
|
Term
one-tailed test
|
Definition
a hypothesis test against a one-sided alternative |
|
|
Term
ordinary least squares (OLS) |
|
Definition
a method for estimating the parameters of a multiple linear regression model. The OLS estimates are obtained by minimizing the sum of squared residuals |
|
|
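A numpy sketch of the mechanics (simulated data): the OLS estimates that minimize the sum of squared residuals solve the normal equations X'Xb = X'y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)  # first-order conditions of minimizing SSR
print(b, np.sum((y - X @ b) ** 2))     # estimates and the minimized SSR
```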
Term
outliers
|
Definition
observations in a data set that are substantially different from the bulk of the data, perhaps because of error or because some data are generated by a different model than most of the other data |
|
|
Term
overall significance of a regression |
|
Definition
a test of the joint significance of all explanatory variables appearing in a multiple regression equation |
|
|
Term
p-value
|
Definition
the smallest significance level at which the null hypothesis can be rejected |
|
|
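A scipy sketch of turning a t statistic into a two-sided p-value (the t statistic and degrees of freedom are made-up numbers for illustration):

```python
from scipy import stats

t_stat, df = 2.05, 60                # hypothetical test statistic and df
p = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value
print(p)  # reject H0 at the 5% level only if p < 0.05
```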
Term
parameter
|
Definition
an unknown value that describes a population relationship |
|
|
Term
partial effect
|
Definition
the effect of an explanatory variable on the dependent variable, holding other factors in the regression model fixed |
|
|
Term
percentage change
|
Definition
the proportionate change in a variable, multiplied by 100 |
|
|
Term
perfect collinearity
|
Definition
in multiple regression, one independent variable is an exact linear function of one or more other independent variables |
|
|
Term
Poisson distribution
|
Definition
a probability distribution for count variables |
|
|
Term
population
|
Definition
a well-defined group (of people, firms, cities, and so on) that is the focus of a statistical or econometric analysis |
|
|
Term
practical significance
|
Definition
the practical or economic importance of an estimate, which is measured by its sign and magnitude, as opposed to its statistical significance |
|
|
Term
prediction
|
Definition
the estimate of an outcome obtained by plugging specific values of the explanatory variables into an estimated model, usually a multiple regression model |
|
|
Term
quadratic form
|
Definition
a mathematical function where the vector argument both pre- and post-multiplies a square, symmetric matrix |
|
|
Term
quadratic functions
|
Definition
functions that contain squares of one or more explanatory variables; they capture diminishing or increasing effects on the dependent variable |
|
|
Term
R-squared
|
Definition
in a multiple regression model, the proportion of the total sample variation in the dependent variable that is explained by the independent variables |
|
|
Term
random sample
|
Definition
a sample obtained by sampling randomly from the specified population |
|
|
Term
Regression Specification Error Test (RESET) |
|
Definition
a general test for functional form in a multiple regression model; it is an F test of the joint significance of the squares, cubes, and perhaps higher powers of the fitted values from the initial OLS estimation |
|
|
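A sketch of RESET done by hand in statsmodels (simulated data): add the squares and cubes of the fitted values to the model and F-test their joint significance.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=(n, 2))
y = 1 + x @ np.array([0.5, -0.3]) + rng.normal(size=n)

X = sm.add_constant(x)
yhat = sm.OLS(y, X).fit().fittedvalues

X_aug = np.column_stack([X, yhat**2, yhat**3])
aug = sm.OLS(y, X_aug).fit()
R = np.eye(X_aug.shape[1])[-2:]  # H0: the two added coefficients are zero
print(aug.f_test(R))
```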
Term
Misspecification Analysis |
|
Definition
the process of determining likely biases that can arise from omitted variables, measurement error, simultaneity, and other kinds of model misspecification |
|
|
Term
residual
|
Definition
the difference between the actual value and the fitted (or predicted) value; there is a residual for each observation in the sample used to obtain the OLS regression line |
|
|
Term
restricted model
|
Definition
in hypothesis testing, the model obtained after imposing all of the restrictions required under the null |
|
|
Term
semi-elasticity
|
Definition
the percentage change in the dependent variable given a one-unit increase in an independent variable |
|
|
Term
significance level
|
Definition
the probability of type I error in hypothesis testing |
|
|
Term
slope
|
Definition
in the equation of a line, the change in the y variable when the x variable increases by 1 |
|
|
Term
slope parameter
|
Definition
the coefficient of an independent variable in a multiple regression model |
|
|
Term
spurious correlation
|
Definition
a correlation between two variables that is not due to causality, but perhaps to the dependence of the two variables on another unobserved factor |
|
|
Term
standard deviation
|
Definition
a common measure of spread in the distribution of a random variable |
|
|
Term
standard error
|
Definition
generically, an estimate of the standard deviation of an estimator |
|
|
Term
statistical inference
|
Definition
the act of testing hypotheses about population parameters |
|
|
Term
statistical significance
|
Definition
the importance of an estimate as measured by the size of a test statistic, usually a t statistic |
|
|
Term
sum of squared residuals (SSR) |
|
Definition
in multiple regression analysis, the sum of the squared OLS residuals across all observations |
|
|
Term
t distribution
|
Definition
the distribution of the ratio of a standard normal random variable and the square root of an independent chi-square random variable, where the chi-square random variable is first divided by its df |
|
|
Term
t statistic
|
Definition
the statistic used to test a single hypothesis about the parameters in an econometric model |
|
|
Term
time series data
|
Definition
data collected over time on one or more variables |
|
|
Term
Total Sum of Squares (SST) |
|
Definition
the total sample variation in a dependent variable about its sample average |
|
|
Term
true model
|
Definition
the actual population model relating the dependent variable to the relevant independent variables, plus a disturbance, where the zero conditional mean assumption holds |
|
|
Term
two-sided alternative
|
Definition
an alternative where the population parameter can be either less than or greater than the value stated under the null hypothesis |
|
|
Term
two-tailed test
|
Definition
a test against a two-sided alternative |
|
|
Term
Type I error
|
Definition
a rejection of the null hypothesis when it is true |
|
|
Term
Type II error
|
Definition
the failure to reject the null hypothesis when it is false |
|
|
Term
Type III error
|
Definition
when a null hypothesis is rejected in favor of a one-tailed alternative hypothesis but the test statistic has the opposite sign of what the alternative hypothesis claims |
|
|
Term
unbiased estimator
|
Definition
an estimator whose expected value (or mean of its sampling distribution) equals the population value (regardless of the population value) |
|
|
Term
variance
|
Definition
a measure of spread in the distribution of a random variable |
|
|
Term
Weighted Least Squares (WLS) Estimator |
|
Definition
an estimator used to adjust for a known form of heteroskedasticity, where each squared residual is weighted by the inverse of the (estimated) variance of the error |
|
|
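A statsmodels sketch for a known heteroskedasticity form, assuming Var(u|x) = σ²·x (the data and the variance function are illustrative): each observation is weighted by the inverse of its error variance.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 400
x = np.abs(rng.normal(size=n)) + 0.1
h = x                                        # assumed form: Var(u|x) = sigma^2 * x
y = 1 + 0.5 * x + np.sqrt(h) * rng.normal(size=n)

# Weight each observation by 1/h, the inverse of the error variance
res = sm.WLS(y, sm.add_constant(x), weights=1.0 / h).fit()
print(res.params)
```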
Term
White test
|
Definition
A test for heteroskedasticity that involves regressing the squared OLS residuals on the OLS fitted values and on the squares of the fitted values; in its most general form, the squared OLS residuals are regressed on the explanatory variables, the squares of the explanatory variables, and all the nonredundant interactions of the explanatory variables |
|
|
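A sketch using statsmodels' het_white on simulated data (the data-generating process is illustrative): the helper builds the levels, squares, and cross products of the regressors internally.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
u = rng.normal(size=n) * (1 + x[:, 1] ** 2)  # error variance depends on x2
y = 1 + x @ np.array([0.5, -0.3]) + u

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, sm.add_constant(x))
print(f"LM = {lm_stat:.2f}, p = {lm_pval:.4f}")
```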
Term
efficiency
|
Definition
of two estimators, one is more efficient than the other if it has a smaller variance |
|
|
Term
|
Definition
y is related to x and the error term in a linear function |
|
|
Term
|
Definition
allows us to predict y from x |
|
|
Term
|
Definition
a random sample of size n (each member of the population has an equal chance of being selected) |
|
|
Term
|
Definition
leads to sample statistics that approximate the values of the population parameters |
|
|
Term
|
Definition
1) none of the independent variables is constant 2) there are no exact linear relationships among the independent variables |
|
|
Term
|
Definition
lets us determine which explanatory variable has which effect |
|
|
Term
|
Definition
the slope and intercept estimates are not defined |
|
|
Term
|
Definition
u has an expected value of 0 given any value of the independent variable |
|
|
Term
|
Definition
allows us to derive statistical properties conditional on the values of x in a sample |
|
|
Term
|
Definition
an important variable has likely been omitted, so the explanatory power suffers; this is an example of bias due to misspecification |
|
|
Term
|
Definition
u has the same (constant) variance given any value of the independent variable |
|
|
Term
|
Definition
needed to justify t tests, F tests, and confidence intervals |
|
|
Term
|
Definition
heteroskedasticity (nonconstant variance) affects efficiency |
|
|
Term
|
Definition
u is independent of the explanatory variables and is normally distributed with a mean of zero |
|
|
Term
|
Definition
it makes statistical inference possible |
|
|
Term
|
Definition
creates problems with confidence intervals and significance tests because they are based on assumptions of normally distributed errors |
|
|
Term
difference between efficiency, consistency, and unbiasedness
|
Definition
1) efficiency: of two estimators, one is more efficient than the other if it has a smaller variance 2) consistency: as the sample size increases, the estimator converges (in probability) to the true population value 3) bias: the difference between the expected value of an estimator and the true population value |
|
|
Term
Type III error
|
Definition
Type III error occurs when a null hypothesis is rejected in favor of a one-tailed alternative hypothesis but the test statistic has the opposite sign of what the alternative hypothesis claims. |
|
|
Term
What is the difference between an F test and R squared? |
|
Definition
The F statistic provides a formal test of the model's overall significance, whereas the R-squared is only a descriptive measure of the share of the variance in the dependent variable that the model explains |
|
|
Term
Compare errors (measurement error, individual error, random error, population variance, sample variance, standard deviation, standard error, residual error, type I, type II, type III). |
|
Definition
1) individual error: the difference between the expected value and an individual observed value
2) random error: error due to random variability in an individual observation (part of individual error)
3) population variance (estimate): the sum of the squares of the amounts by which the observed values deviate from the mean, divided by the number in the population
4) sample variance: the sum of the squares of the amounts by which the observed values deviate from the mean (the individual errors), divided by the number of comparisons (sample size - 1); squaring the individual errors makes the negative signs disappear
5) standard deviation: a standardized unit of error that accounts for both the magnitude of the observed values and the number of observations in the sample. It is found by taking the square root of the sample variance. In a normally distributed sample, about 68% of the area under the distribution curve lies within one standard deviation of the mean, so there is a 68% chance of finding a value within one standard deviation of the mean
6) standard error: a narrower band about the mean than the standard deviation, calculated as the standard deviation divided by the square root of the number of observations. As the number of observations increases, the estimate of the sample mean gets closer to the "true" (population) mean. This is the error reported with mean values, and it is the error term used in the t statistic for hypothesis testing and in the 95% confidence interval
7) residual error: in regression analysis, the variation in the dependent variable not explained by the variation in the independent variables. Using the regression model and the observed values of x, one computes the fitted values of y; the residual error is the difference between the observed and fitted values
8) errors associated with hypothesis testing: a Type I error is rejecting a null hypothesis when it is really true; a Type II error is accepting the null hypothesis when it is really false; a Type III error is correctly rejecting the null hypothesis but incorrectly accepting a one-tailed alternative |
|
|
Term
What are dummy (binary), proxy, quadratic, interaction, and natural logarithms and when are they used? |
|
Definition
1) dummy variables are coded as 0 or 1 and are used to include qualitative data 2) proxy variables are used when the needed variable can't be measured, so you find something similar to replace it 3) quadratic terms are used when the data have a turning point 4) interaction terms are used when the partial effect of one variable depends on the value of another variable 5) logarithms are used to scale numbers down and reduce variability; they only work for positive numbers |
|
|
Term
How do we read a p-value in the absence of critical values? |
|
Definition
"the probability of committing a Type I error if we reject the null hypothesis that _____ is 4.8% (p value: 0.048)." |
|
|
Term
Five steps for checking a model |
|
Definition
1) F stat 2) P value associated with F stat 3) R squared 4) signs on coefficients 5) individual significance |
|
|
Term
What is the difference between reliability and validity? |
|
Definition
1) reliability (consistency): the degree to which measures yield the same result when applied under the same circumstances 2) validity: the effectiveness of a measuring instrument; the extent to which the instrument measures the phenomenon one wants to study |
|
|
Term
base group
|
Definition
the dummy variable with a value of zero |
|
|
Term
RESET
|
Definition
test for functional form misspecification |
|
|
Term
Type I error
|
Definition
null hypothesis is true, but we reject it |
|
|
Term
Type II error
|
Definition
null hypothesis is false, but we don't reject it |
|
|
Term
Type III error
|
Definition
statistically significant effect, but does not follow the hypothesis |
|
|
Term
|
Definition
the p-value associated with the F statistic, used to test the null hypothesis that all the model coefficients are zero |
|
|
Term
heteroskedasticity
|
Definition
the variance of the unobservable error, u, conditional on the explanatory variable, is not constant |
|
|
Term
intercept
|
Definition
the value of y when x equals 0 |
|
|
Term
interaction term
|
Definition
a variable that is the product of two variables |
|
|
Term
What are the five steps for checking a model? |
|
Definition
1. look at the F statistic 2. look at the p-value associated with the F statistic 3. see how much variation is explained by the model (R-squared) 4. look at the coefficients for correct signs 5. check whether the individual variables are significant |
|
|
Term
Explain confidence interval. |
|
Definition
we are 95% confident that the true value of the coefficient in the model that generated these data falls within this interval |
|
|
Term
How do we read P>|t| = 0.000 or P>|t| = 0.048?
|
Definition
the probability of committing a Type I error if we reject the null hypothesis (that the slope coefficient is zero) is essentially zero (0.000) or 4.8% (0.048) |
|
|
Term
confidence level
|
Definition
the % of our samples in which we want our confidence interval to contain the population value |
|
|
Term
consistency
|
Definition
as the sample size increases, the estimator converges (in probability) to the true population value |
|
|
Term
reliability
|
Definition
the degree to which measures yield the same results when applied under the same circumstances |
|
|
Term
validity
|
Definition
the effectiveness of the measuring instrument in the extent that the instrument measures the phenomenon one wants to study |
|
|