Shared Flashcard Set

Details

IDV 721 - Stats I
Terms and Measures
185
Mathematics
Graduate
03/12/2010

Cards

Term
adjusted r squared
Definition
a goodness-of-fit measure in multiple regression analysis that penalizes additional explanatory variables by using a degrees of freedom adjustment in estimating the error variance
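A minimal Python sketch of the degrees-of-freedom adjustment, assuming the usual textbook formula with n observations and k slope coefficients; the function name `adjusted_r_squared` is illustrative only.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a regression with n observations and k slope coefficients.

    Uses 1 - (1 - R^2) * (n - 1) / (n - k - 1): SSR and SST are each divided
    by their degrees of freedom before taking the ratio.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same R-squared, more regressors: the adjusted value falls.
print(round(adjusted_r_squared(0.30, n=50, k=2), 4))   # 0.2702
print(round(adjusted_r_squared(0.30, n=50, k=10), 4))  # 0.1205
```

Unlike R-squared, the adjusted version can fall when a regressor is added, because each extra slope coefficient uses up one degree of freedom.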
Term
alternative hypothesis
Definition
the hypothesis against which the null hypothesis is tested
Term
average
Definition
the sum of n numbers divided by n
Term
base group
Definition
the group represented by the overall intercept in a multiple regression model that includes dummy explanatory variables
Term
causal effect
Definition
a ceteris paribus change in one variable has an effect on another variable
Term
bias
Definition
the difference between the expected value of an estimator and the population value it is supposed to be estimating
Term
biased estimator
Definition
an estimator whose expectation, or sampling mean, is different from the population value it is supposed to be estimating
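The sample variance is the classic example. This simulation is a sketch with made-up settings (normal data with variance 1, n = 5, a fixed seed): it averages the divide-by-n and divide-by-(n - 1) variance estimators over many samples, and only the second centers on the true variance.

```python
import random
import statistics


def simulate_variance_estimators(n=5, reps=10_000, sigma=1.0, seed=1):
    """Average two variance estimators over many samples drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    divide_by_n, divide_by_n_minus_1 = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        divide_by_n.append(statistics.pvariance(sample))        # sum of squares / n
        divide_by_n_minus_1.append(statistics.variance(sample))  # sum of squares / (n - 1)
    return statistics.fmean(divide_by_n), statistics.fmean(divide_by_n_minus_1)


biased, unbiased = simulate_variance_estimators()
print(f"divide by n:     {biased:.3f}  (expected value is 0.8 when n = 5)")
print(f"divide by n - 1: {unbiased:.3f}  (expected value is 1.0)")
```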
Term
Breusch-Pagan Test
Definition
a test for heteroskedasticity where the squared OLS residuals are regressed on the explanatory variables in the model
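A plain-Python sketch of the LM version of the test with a single explanatory variable. The helper names and the simulated data (error standard deviation proportional to x) are assumptions for illustration. With one regressor in the auxiliary regression, the statistic n·R² is compared with a chi-squared distribution with 1 degree of freedom.

```python
import random
from statistics import NormalDist, fmean


def ols_simple(x, y):
    """Intercept and slope from a simple OLS regression of y on x."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def r_squared(x, y):
    """R-squared from the simple regression of y on x."""
    b0, b1 = ols_simple(x, y)
    ybar = fmean(y)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ssr / sst


def breusch_pagan_lm(x, y):
    """Regress squared OLS residuals on x and return (LM = n * R^2, p-value)."""
    b0, b1 = ols_simple(x, y)
    u2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    lm = len(y) * r_squared(x, u2)
    # For 1 df: P(chi2 > lm) = P(|Z| > sqrt(lm)) with Z standard normal.
    p_value = 2 * (1 - NormalDist().cdf(lm ** 0.5))
    return lm, p_value


# Hypothetical heteroskedastic data: sd(u | x) = x.
rng = random.Random(0)
x = [rng.uniform(1, 10) for _ in range(500)]
y = [2 + 0.5 * xi + rng.gauss(0, xi) for xi in x]
lm, p = breusch_pagan_lm(x, y)
print(f"LM = {lm:.1f}, p-value = {p:.4f}")
```

A small p-value leads us to reject the null hypothesis of homoskedasticity.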
Term
ceteris paribus
Definition
all other relevant factors are held fixed
Term
Chi-squared distribution
Definition
a probability distribution obtained by adding the squares of independent standard normal random variables. The number of terms in the sum equals the degrees of freedom in the distribution
Term
classical linear model
Definition
the multiple linear regression model under the full set of classical linear model assumptions
Term
cluster sample
Definition
a sample of natural clusters or groups that usually consist of people
Term
confidence interval
Definition
a rule used to construct a random interval so that a certain percentage of all data sets, determined by the confidence level, yields an interval that contains the population value
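The "certain percentage of all data sets" idea can be checked by simulation. This sketch assumes normal data, n = 100, and the large-sample interval ȳ ± z·s/√n, then counts how often the interval covers the true mean; all parameter values are made up.

```python
import random
from statistics import NormalDist, fmean, stdev


def coverage_rate(mu=5.0, sigma=2.0, n=100, reps=2_000, level=0.95, seed=42):
    """Share of simulated samples whose interval ybar +/- z * s / sqrt(n) contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = fmean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        hits += ybar - half_width <= mu <= ybar + half_width  # True counts as 1
    return hits / reps


rate = coverage_rate()
print(f"coverage: {rate:.3f}")  # should be close to 0.95
```

Any single interval either contains the population value or does not; the 95% describes the rule across repeated samples.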
Term
confidence level
Definition
the percentage of samples in which we want our confidence interval to contain the population value; 95% is the most common confidence level, but 90% and 99% are also used
Term
consistency
Definition
an estimator converges in probability to the correct population value as the sample size grows
Term
covariance
Definition
a measure of linear dependence between two random variables
Term
critical value
Definition
in hypothesis testing, the value against which the test statistic is compared to determine whether or not the null hypothesis is rejected
Term
data frequency
Definition
the interval at which time series data are collected. Yearly, quarterly, and monthly are the most common data frequencies
Term
degrees of freedom
Definition
in multiple regression analysis, the number of observations minus the number of estimated parameters
Term
dependent variable
Definition
the variable to be explained in the multiple regression model
Term
dummy variable
Definition
a variable that takes on the value of zero or one
Term
dummy variable trap
Definition
the mistake of including too many dummy variables among the independent variables; it occurs when an overall intercept is in the model and a dummy variable is included for each group
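A small illustration with three hypothetical regions. With an intercept and a dummy for every group, the dummy columns sum to the intercept column in every row (perfect collinearity); dropping one dummy, whose group then becomes the base group, breaks that exact relationship.

```python
# Hypothetical data: one categorical variable with three groups.
regions = ["north", "south", "west", "south", "north", "west"]
groups = ["north", "south", "west"]


def design_row(region, include_all_dummies):
    """Intercept column followed by 0/1 dummies; optionally drop 'north' as the base group."""
    dummies = groups if include_all_dummies else groups[1:]
    return [1.0] + [1.0 if region == g else 0.0 for g in dummies]


trap = [design_row(r, include_all_dummies=True) for r in regions]
safe = [design_row(r, include_all_dummies=False) for r in regions]

# Trap: every row's dummies add up to the intercept, an exact linear relationship.
print(all(row[0] == sum(row[1:]) for row in trap))  # True
# Safe: rows for the base group have all dummies equal to 0, so the relation fails.
print(all(row[0] == sum(row[1:]) for row in safe))  # False
```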
Term
econometric model
Definition
an equation relating the dependent variable to a set of explanatory variables and unobserved disturbances, where unknown population parameters determine the ceteris paribus effect of each explanatory variable
Term
economic model
Definition
a relationship derived from economic theory or less formal economic reasoning
Term
elasticity
Definition
the percentage change in one variable given a 1% ceteris paribus increase in another variable
Term
endogeneity
Definition
a term used to describe the presence of an endogenous explanatory variable
Term
endogenous explanatory variable
Definition
an explanatory variable in a multiple regression model that is correlated with the error term, either because of an omitted variable, measurement error, or simultaneity
Term
endogenous variables
Definition
in simultaneous equation models, variables that are determined by the equations in the system
Term
error term
Definition
the variable in a simple or multiple regression equation that contains unobserved factors that affect the dependent variable. The error term may also include measurement errors in the observed dependent or independent variables
Term
error variance
Definition
the variance of the error term in a multiple regression model
Term
estimate
Definition
the numerical value taken on by an estimator for a particular sample of data
Term
estimator
Definition
a rule for combining data to produce a numerical value for a population parameter; the form of the rule does not depend on the particular sample obtained
Term
exogenous variable
Definition
any variable that is uncorrelated with the error term in the model of interest
Term
expected value
Definition
a measure of central tendency in the distribution of a random variable, including an estimator
Term
experiment
Definition
in probability, a general term used to denote an event whose outcome is uncertain. In econometric analysis, it denotes a situation where data are collected by randomly assigning individuals to control and treatment groups
Term
explained sum of squares (SSE)
Definition
the total sample variation of the fitted values in the multiple regression model
Term
explanatory variable
Definition
in regression analysis, a variable that is used to explain variation in the dependent variable
Term
exponential function
Definition
a mathematical function defined for all values that has an increasing slope but a constant proportionate change
Term
F distribution
Definition
the probability distribution obtained by forming the ratio of two independent chi-square random variables, where each has been divided by its degrees of freedom
Term
F statistic
Definition
a statistic used to test multiple hypotheses about the parameters in a multiple regression model
Term
fitted values
Definition
the estimated values of the dependent variable when the values of the independent variables for each observation are plugged into the OLS regression line
Term
Gauss-Markov Assumptions
Definition
the set of assumptions under which OLS is BLUE (best linear unbiased estimator): 1) linear in parameters, 2) random sampling, 3) sample variation in the explanatory variable (no perfect collinearity, in multiple regression), 4) zero conditional mean, 5) homoskedasticity
Term
Gauss-Markov Theorem
Definition
the theorem that states that, under the five Gauss-Markov assumptions, the OLS estimator is BLUE (conditional on the sample values of the explanatory variables)
Term
best linear unbiased estimator (BLUE)
Definition
among all linear unbiased estimators, the estimator with the smallest variance. OLS is BLUE, conditional on the sample values of the explanatory variables, under the Gauss-Markov assumptions
Term
goodness-of-fit measure
Definition
a statistic that summarizes how well a set of explanatory variables explains a dependent or response variable
Term
heterogeneity bias
Definition
the bias in OLS due to omitted heterogeneity (or omitted variables)
Term
heteroskedasticity
Definition
the variance of the error term, given the explanatory variables, is not constant
Term
heteroskedasticity of unknown form
Definition
heteroskedasticity that may depend on the explanatory variables in an unknown, arbitrary fashion
Term
heteroskedasticity-robust F statistic
Definition
an F-type statistic that is (asymptotically) robust to heteroskedasticity of unknown form
Term
heteroskedasticity-robust LM statistic
Definition
an LM statistic that is robust to heteroskedasticity of unknown form
Term
Heteroskedasticity-Robust Standard Error
Definition
a standard error that is (asymptotically) robust to heteroskedasticity of unknown form
Term
Heteroskedasticity-Robust t statistic
Definition
a t statistic that is (asymptotically) robust to heteroskedasticity of unknown form
Term
asymptotic properties
Definition
properties of estimators and test statistics that apply when the sample size grows without bound
Term
homoskedasticity
Definition
the errors in a regression model have constant variance conditional on the explanatory variables
Term
Assumption SLR.1 (Linear in Parameters)
Definition
In the population model, the dependent variable, y, is related to the independent variable, x, and the error (or disturbance), u, as y = β₀ + β₁x + u, where β₀ and β₁ are the population intercept and slope parameters, respectively.
Term
Assumption SLR.2 (Random Sampling)
Definition
We have a random sample of size n, {(xᵢ, yᵢ): i = 1, 2, …, n}, following the population model y = β₀ + β₁x + u.
Term
Assumption SLR.3 (Sample Variation in the Explanatory Variable)
Definition
The sample outcomes on x, namely, {xᵢ: i = 1, …, n}, are not all the same value.
Term
Assumption SLR.4 (Zero Conditional Mean)
Definition
The error u has an expected value of zero given any value of the explanatory variable. In other words, E(u|x) = 0.
Term
Assumption SLR.5 (Homoskedasticity)
Definition
The error u has the same variance given any value of the explanatory variable. In other words, Var(u|x) = σ².
Term
hypothesis test
Definition
a statistical test of the null, or maintained, hypothesis against an alternative hypothesis
Term
inconsistency
Definition
the difference between the probability limit of an estimator and the parameter value
Term
inconsistent
Definition
an estimator does not converge (in probability) to the correct population parameter as the sample size grows
Term
interaction effect
Definition
in multiple regression, the partial effect of one explanatory variable depends on the value of a different explanatory variable
Term
interaction term
Definition
an independent variable in a regression model that is the product of two explanatory variables
Term
intercept
Definition
in the equation of a line, the value of the y variable when the x variable is 0
Term
intercept shift
Definition
the intercept in a regression model differs by group or time period
Term
joint distribution
Definition
the probability distribution determining the probabilities of outcomes involving two or more random variables
Term
joint hypothesis test
Definition
a test involving more than one restriction on the parameters in a model
Term
jointly insignificant
Definition
failure to reject, using an F test at a specified significance level, that all coefficients for a group of explanatory variables are zero
Term
jointly statistically significant
Definition
the null hypothesis that two or more explanatory variables have zero population coefficients is rejected at the chosen significance level
Term
Lagrange Multiplier (LM) Statistic
Definition
a test statistic with large-sample justification that can be used to test for omitted variables, heteroskedasticity, and serial correlation, among other model specification problems
Term
Least Squares Estimator
Definition
an estimator that minimizes a sum of squared residuals
Term
level-level model
Definition
a regression model where the dependent variable and the independent variables are in level (or original) form
Term
level-log model
Definition
a regression model where the dependent variable is in level form and (at least one of) the independent variables are in logarithmic form
Term
linear function
Definition
a function where the change in the dependent variable, given a one-unit change in an independent variable, is constant
Term
logarithmic function
Definition
a mathematical function, defined only for strictly positive arguments, with a positive but decreasing slope
Term
log function
Definition
a mathematical function, defined only for strictly positive arguments, with a positive but decreasing slope
Term
log-level model
Definition
a regression model where the dependent variable is in logarithmic form and the independent variables are in level (or original) form
Term
log-log model
Definition
a regression model where the dependent variable and (at least some of) the independent variables are in logarithmic form
Term
matrix
Definition
a rectangular array of numbers
Term
mean squared error
Definition
the expected squared distance that an estimator is from the population value; it equals the variance plus the square of any bias
Term
measurement error
Definition
the difference between an observed variable and a variable that belongs in a multiple regression equation
Term
median
Definition
in a probability distribution, it is the value where there is a 50% chance of being below the value and a 50% chance of being above it. In a sample of numbers, it is the middle value after the numbers have been ordered
Term
missing data
Definition
a data problem that occurs when we do not observe values of some variables for certain observations (individuals, cities, time periods, and so on)
Term
multicollinearity
Definition
a term that refers to the correlation among the independent variables in a multiple regression model; it is usually invoked when some correlations are "large," but an actual magnitude is not well-defined
Term
multiple hypothesis test
Definition
a test of a null hypothesis involving more than one restriction on the parameters
Term
Multiple Linear Regression (MLR) Model
Definition
a model linear in its parameters, where the dependent variable is a function of independent variables plus an error term
Term
multiple regression analysis
Definition
a type of analysis that is used to describe estimation of and inference in the multiple linear regression model
Term
nonlinear function
Definition
a function whose slope is not constant
Term
nonnested models
Definition
two or more models where no model can be written as a special case of the other by imposing restrictions on the parameters
Term
nonrandom sample
Definition
a sample obtained other than by sampling randomly from the population of interest
Term
normal distribution
Definition
a probability distribution commonly used in statistics and econometrics for modeling a population. Its probability density function has a bell shape.
Term
normality assumption
Definition
the classical linear model assumption that states that the error (or dependent variable) has a normal distribution, conditional on the explanatory variables
Term
null hypothesis
Definition
in classical hypothesis testing, we take this hypothesis as true and require the data to provide substantial evidence against it
Term
omitted variable bias
Definition
the bias that arises in the OLS estimators when a relevant variable is omitted from the regression
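A simulation sketch of omitted variable bias, with made-up coefficients (β₁ = 2, β₂ = 3) and x₂ partly driven by x₁. The slope from regressing y on x₁ alone lands near β₁ + β₂·δ, where δ is the slope from regressing x₂ on x₁, rather than near the true β₁.

```python
import random
from statistics import fmean


def simple_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(7)
n = 5_000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]                       # x2 correlated with x1
y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]  # true beta1 = 2, beta2 = 3

short_slope = simple_slope(x1, y)  # regression that omits x2
delta = simple_slope(x1, x2)       # slope of x2 on x1
print(f"slope on x1 omitting x2: {short_slope:.2f}")  # near 2 + 3 * 0.5 = 3.5, not 2
print(f"beta1 + beta2 * delta:   {2 + 3 * delta:.2f}")
```

The bias (about 1.5 here) is positive because β₂ > 0 and x₁ and x₂ are positively correlated.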
Term
one-sided alternative
Definition
an alternative hypothesis that states that the parameter is greater than (or less than) the value hypothesized under the null
Term
one-tailed test
Definition
a hypothesis test against a one-sided alternative
Term
ordinary least squares (OLS)
Definition
a method for estimating the parameters of a multiple linear regression model. The OLS estimates are obtained by minimizing the sum of squared residuals
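A minimal sketch of OLS with one explanatory variable, using the closed-form slope and intercept on made-up data. It also checks two algebraic properties of OLS: the residuals sum to zero, and SST = SSE + SSR, so R-squared = SSE/SST.

```python
from statistics import fmean


def ols_fit(x, y):
    """Simple-regression OLS: returns intercept, slope, fitted values, residuals."""
    xbar, ybar = fmean(x), fmean(y)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]
    return b0, b1, fitted, residuals


x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]  # made-up data for illustration
b0, b1, fitted, resid = ols_fit(x, y)

ybar = fmean(y)
sst = sum((b - ybar) ** 2 for b in y)       # total sum of squares
sse = sum((f - ybar) ** 2 for f in fitted)  # explained sum of squares
ssr = sum(r ** 2 for r in resid)            # sum of squared residuals
print(f"y-hat = {b0:.3f} + {b1:.3f} x, R^2 = {sse / sst:.4f}")
print(abs(sst - (sse + ssr)) < 1e-9, abs(sum(resid)) < 1e-9)  # True True
```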
Term
outliers
Definition
observations in a data set that are substantially different from the bulk of the data, perhaps because of error or because some data are generated by a different model than most of the other data
Term
overall significance of a regression
Definition
a test of the joint significance of all explanatory variables appearing in a multiple regression equation
Term
p-Value
Definition
the smallest significance level at which the null hypothesis can be rejected
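A sketch of turning a test statistic into a two-sided p-value, assuming the statistic is approximately standard normal under the null (for a t statistic this is a large-sample approximation; exact small-sample p-values use the t distribution). The function name is illustrative.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z: the two-sided p-value."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


for z in (1.0, 1.96, 3.0):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f}, reject H0 at 5%? {p < 0.05}")
```

Rejecting whenever p is below the chosen significance level is the same decision as comparing the statistic with the critical value.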
Term
parameter
Definition
an unknown value that describes a population relationship
Term
partial effect
Definition
the effect of an explanatory variable on the dependent variable, holding other factors in the regression model fixed
Term
percentage change
Definition
the proportionate change in a variable, multiplied by 100
Term
perfect collinearity
Definition
in multiple regression, one independent variable is an exact linear function of one or more other independent variables
Term
Poisson Distribution
Definition
a probability distribution for count variables
Term
population
Definition
a well-defined group (of people, firms, cities, and so on) that is the focus of a statistical or econometric analysis
Term
practical significance
Definition
the practical or economic importance of an estimate, which is measured by its sign and magnitude, as opposed to its statistical significance
Term
prediction
Definition
the estimate of an outcome obtained by plugging specific values of the explanatory variables into an estimated model, usually a multiple regression model
Term
quadratic form
Definition
a mathematical function where the vector argument both pre- and post-multiplies a square, symmetric matrix
Term
quadratic function
Definition
functions that contain squares of one or more explanatory variables; they capture increasing or decreasing marginal effects on the dependent variable
Term
R-squared
Definition
in a multiple regression model, the proportion of the total sample variation in the dependent variable that is explained by the independent variables
Term
random sample
Definition
a sample obtained by sampling randomly from the specified population
Term
Regression Specification Error Test (RESET)
Definition
a general test for functional form in a multiple regression model; it is an F test of joint significance of the squares, cubes, and perhaps higher powers of the fitted values from the initial OLS estimation
Term
Misspecification Analysis
Definition
the process of determining likely biases that can arise from omitted variables, measurement error, simultaneity, and other kinds of model misspecification
Term
residual
Definition
the difference between the actual value and the fitted (or predicted) value; there is a residual for each observation in a sample used to obtain the OLS regression line
Term
restricted model
Definition
in hypothesis testing, the model obtained after imposing all of the restrictions required under the null
Term
semi-elasticity
Definition
the percentage change in the dependent variable given a one-unit increase in an independent variable
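In a log-level model, log(y) = β₀ + β₁x + u, the semi-elasticity is approximately 100·β₁ percent per unit of x. This sketch compares that approximation with the exact change 100·(exp(β₁Δx) - 1); the function name is hypothetical.

```python
import math


def pct_change_log_level(beta, delta_x=1.0):
    """Percentage change in y when x rises by delta_x in log(y) = b0 + beta * x + u.

    Returns (approximation 100 * beta * delta_x, exact 100 * (exp(beta * delta_x) - 1)).
    """
    approx = 100 * beta * delta_x
    exact = 100 * (math.exp(beta * delta_x) - 1)
    return approx, exact


print(pct_change_log_level(0.05))  # both close to 5%: the approximation works for small beta
print(pct_change_log_level(0.40))  # about 40% vs about 49%: the gap grows with beta
```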
Term
significance level
Definition
the probability of type I error in hypothesis testing
Term
slope
Definition
in the equation of a line, the change in the y variable when the x variable increases by 1
Term
slope parameter
Definition
the coefficient of an independent variable in a multiple regression model
Term
spurious correlation
Definition
a correlation between two variables that is not due to causality, but perhaps to the dependence of the two variables on another unobserved factor
Term
standard deviation
Definition
a common measure of spread in the distribution of a random variable
Term
standard error
Definition
generically, an estimate of the standard deviation of an estimator
Term
statistical inference
Definition
the act of testing hypotheses about population parameters
Term
statistical significance
Definition
the importance of an estimate as measured by the size of a test statistic, usually a t statistic
Term
sum of squared residuals (SSR)
Definition
in multiple regression analysis, the sum of the squared OLS residuals across all observations
Term
t distribution
Definition
the distribution of the ratio of a standard normal random variable and the square root of an independent chi-square random variable, where the chi-square random variable is first divided by its df
Term
t statistic
Definition
the statistic used to test a single hypothesis about the parameters in an econometric model
Term
time series data
Definition
data collected over time on one or more variables
Term
Total Sum of Squares (SST)
Definition
the total sample variation in a dependent variable about its sample average
Term
true model
Definition
the actual population model relating the dependent variable to the relevant independent variables, plus a disturbance, where the zero conditional mean assumption holds
Term
two-sided alternative
Definition
an alternative where the population parameter can either be less than or greater than the value stated under the null hypothesis
Term
two-tailed test
Definition
a test against a two-sided alternative
Term
Type I error
Definition
a rejection of the null hypothesis when it is true
Term
Type II error
Definition
the failure to reject the null hypothesis when it is false
Term
Type III error
Definition
when a null hypothesis is rejected in favor of a one-tailed alternative hypothesis but the estimated statistic has the opposite sign from what the alternative hypothesis claims
Term
unbiased estimator
Definition
an estimator whose expected value (or mean of its sampling distribution) equals the population value (regardless of the population value)
Term
variance
Definition
a measure of spread in the distribution of a random variable
Term
Weighted Least Squares (WLS) Estimator
Definition
an estimator used to adjust for a known form of heteroskedasticity, where each squared residual is weighted by the inverse of the (estimated) variance of the error
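A sketch of WLS for one explanatory variable, assuming the variance function h(x) in Var(u|x) = σ²h(x) is known; the simulated data with h(x) = x are an assumption. Minimizing the sum of squared residuals weighted by 1/hᵢ gives the closed form below (weighted means in place of ordinary means).

```python
import random


def wls_simple(x, y, h):
    """Weighted least squares for y = b0 + b1*x + u with Var(u|x) = sigma^2 * h(x).

    Each squared residual is weighted by 1 / h_i, so observations with a
    larger error variance count less. h is assumed known here.
    """
    w = [1.0 / hi for hi in h]
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return yw - b1 * xw, b1


# Hypothetical data: Var(u|x) = x, so h(x) = x.
rng = random.Random(3)
x = [rng.uniform(1, 10) for _ in range(1_000)]
y = [1 + 2 * xi + rng.gauss(0, xi ** 0.5) for xi in x]
b0, b1 = wls_simple(x, y, h=x)
print(f"WLS: intercept = {b0:.2f}, slope = {b1:.2f}")  # should be near 1 and 2
```

When the variance function must itself be estimated, the same weighting is applied with estimated hᵢ (feasible GLS).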
Term
White Test
Definition
A test for heteroskedasticity that involves regressing the squared OLS residuals on the OLS fitted values and on the squares of the fitted values; in its most general form, the squared OLS residuals are regressed on the explanatory variables, the squares of the explanatory variables, and all the nonredundant interactions of the explanatory variables
Term
efficiency
Definition
of two estimators, one is more efficient than the other if it has a smaller variance
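A simulation sketch of relative efficiency, assuming normal data and made-up settings: the sample mean and the sample median both center on the population mean, but the median's sampling variance is roughly π/2 ≈ 1.57 times as large, so the mean is the more efficient estimator.

```python
import random
from statistics import fmean, median, pvariance


def sampling_variances(n=101, reps=4_000, seed=11):
    """Variance across simulated N(0, 1) samples of the sample mean and the sample median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(fmean(sample))
        medians.append(median(sample))
    return pvariance(means), pvariance(medians)


var_mean, var_median = sampling_variances()
print(f"Var(mean) = {var_mean:.5f}, Var(median) = {var_median:.5f}")
```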
Term
MLR1 Summary
Definition
y is related to x and the error term through a linear function (linear in the parameters)
Term
MLR1 Importance
Definition
allows us to predict y from x
Term
MLR1 Violation
Definition
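creates specification error: the estimated model misrepresents the relationship between y and x, so the estimates can be biased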
Term
MLR2 Summary
Definition
a random sample of size n (every member of the population has an equal chance of selection)
Term
MLR2 Importance
Definition
makes the sample statistics representative of the population, so they approximate the values of the population parameters
Term
MLR2 Violations
Definition
biased results
Term
MLR3 Summary
Definition
1) in the sample, none of the independent variables (IVs) is constant
2) there are no exact linear relationships among the IVs
Term
MLR3 Importance
Definition
lets us tell which explanatory variable is having which effect
Term
MLR3 violation
Definition
the OLS slope and intercept estimates cannot be computed (they are not uniquely defined)
Term
MLR4 summary
Definition
u has an expected value of 0 given any values of the independent variables
Term
MLR4 importance
Definition
allows us to show that OLS is unbiased, deriving its statistical properties conditional on the values of x in the sample
Term
MLR4 violation
Definition
often means an important variable has been omitted (or there is measurement error or simultaneity), so the OLS estimates are biased; this is an example of bias due to misspecification
Term
MLR5 summary
Definition
u has the same (constant) variance given any value of the independent variable
Term
MLR5 importance
Definition
needed to justify t tests, F tests, and confidence intervals
Term
MLR5 violation
Definition
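heteroskedasticity (nonconstant variance): OLS is still unbiased but is no longer efficient (BLUE), and the usual standard errors, t tests, and F tests are invalid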
Term
MLR6 summary
Definition
u is independent of the explanatory variables and is normally distributed with a mean of zero
Term
MLR6 importance
Definition
makes exact statistical inference (t tests, F tests, confidence intervals) possible in small samples
Term
MLR6 violation
Definition
creates problems with confidence intervals and significance tests because they are based on assumptions of normally distributed errors
Term
difference between efficiency, consistency, and unbiased
Definition
1) efficiency: of two estimators, one is more efficient than the other if it has a smaller variance
2) consistency: as the sample size increases, the estimator converges in probability to the true population value
3) bias: the difference between the expected value of an estimator and the true population value; an estimator is unbiased when this difference is zero
Term
What is the difference between an F test and R squared?
Definition
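R-squared is the goodness-of-fit measure: the proportion of the sample variation in y explained by the model. The F statistic for overall significance tests the null hypothesis that all slope coefficients are zero; it is a function of R-squared, F = (R²/k) / ((1 - R²)/(n - k - 1)), where k is the number of slope coefficients and n is the number of observations. A small R-squared can still be jointly significant in a large sample.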
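The link between the two can be written as a short function; a sketch assuming the standard overall-significance formula with k slope coefficients and n observations (the function name is illustrative).

```python
def overall_f_from_r2(r2: float, n: int, k: int) -> float:
    """F statistic for H0: all k slope coefficients are zero, computed from R-squared.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)); compare with an F(k, n - k - 1) critical value.
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))


print(round(overall_f_from_r2(0.50, n=103, k=2), 2))    # 50.0
print(round(overall_f_from_r2(0.05, n=1_000, k=3), 2))  # about 17.5 despite R^2 = 0.05
```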
Term
Compare errors (measurement error, individual error, random error, population variance, sample variance, standard deviation, standard error, residual error, type I, type II, type III).
Definition
1) individual error: the difference between the expected value and an individual observed value
2) random error: error due to random variability in an individual observation (part of the individual error)
3) population variance: the sum of the squared deviations of the observed values from the population mean, divided by the number of values in the population (N)
4) sample variance: the sum of the squared deviations of the observed values from the sample mean (the individual errors), divided by the degrees of freedom, n - 1. (Squaring the individual errors removes the negative signs.)
5) standard deviation: the square root of the variance; it expresses spread in the same units as the observed values. In a normal distribution, about 68% of the area under the curve lies within one standard deviation of the mean, so there is about a 68% chance of finding a value within one standard deviation of the mean.
6) standard error: the estimated standard deviation of an estimator, such as the sample mean. For the mean, it is the standard deviation divided by the square root of the number of observations, so it is smaller than the standard deviation and shrinks as the number of observations grows, because the sample mean gets closer to the "true mean" (the population mean). This is the error reported when giving mean values; it is also the denominator of the t statistic in hypothesis testing and the basis of the 95% confidence interval.
7) residual error: in regression analysis, the variation in the dependent variable not explained by the independent variables. It is found by computing the fitted values of y from the regression model and the observed values of x; each residual is the difference between an observed value and its fitted value.
8) errors associated with hypothesis testing: a Type I error rejects a null hypothesis that is actually true; a Type II error fails to reject a null hypothesis that is actually false; a Type III error correctly rejects the null hypothesis but accepts a one-tailed alternative in the wrong direction.
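The variance, standard deviation, and standard error formulas above map directly onto Python's `statistics` module; the data set is made up for illustration. The last line checks the 68% figure with the standard normal CDF.

```python
import math
import statistics
from statistics import NormalDist

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample for illustration
n = len(data)

pop_var = statistics.pvariance(data)  # squared deviations / n
samp_var = statistics.variance(data)  # squared deviations / (n - 1)
sd = statistics.stdev(data)           # square root of the sample variance
se_mean = sd / math.sqrt(n)           # standard error of the sample mean

print(pop_var, round(samp_var, 4), round(sd, 4), round(se_mean, 4))

# Share of a normal distribution within one standard deviation of the mean.
z = NormalDist()
print(round(z.cdf(1) - z.cdf(-1), 4))  # 0.6827
```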
Term
What are dummy (binary) variables, proxy variables, quadratic terms, interaction terms, and natural logarithms, and when are they used?
Definition
1) dummy variables are coded as 0 or 1 and are used to include qualitative information
2) proxy variables are used when the needed variable can't be measured, so you find a similar, measurable variable to stand in for it
3) quadratic terms are used when the relationship has a turning point (increasing or decreasing marginal effects)
4) interaction terms are used when the partial effect of one variable depends on the value of another variable
5) logarithms are used to compress large values, which often reduces skewness and heteroskedasticity, and they give coefficients a percentage (elasticity or semi-elasticity) interpretation; they are defined only for strictly positive values
Term
How do we read a p-value in the absence of critical values?
Definition
"the probability of committing a Type I error if we reject the null hypothesis that _____ is 4.8% (p value: 0.048)."
Term
Five steps for checking a model
Definition
1) F stat 2) P value associated with F stat 3) R squared 4) signs on coefficients 5) individual significance
Term
What is the difference between reliability and validity?
Definition
1) reliability (consistency): the degree to which measures yield the same result when applied under the same circumstances
2) validity: the extent to which a measuring instrument measures the phenomenon one wants to study
Term
base/benchmark variable
Definition
the group whose dummy variable is omitted, so all included dummy variables equal zero for it; the intercept represents this group
Term
RESET
Definition
test for functional form misspecification
Term
logistic model
Definition
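a model in which the dependent variable y is a dummy (binary) variable; the probability that y = 1 is modeled with the logistic function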
Term
What is Type I error?
Definition
null hypothesis is true, but we reject it
Term
What is Type II error?
Definition
null hypothesis is false, but we don't reject it
Term
What is Type III error?
Definition
a statistically significant effect, but in the opposite direction from the hypothesis
Term
Prob>F
Definition
p value associated with F used to test the null hypothesis that all the model coefficients are zero
Term
heteroskedasticity
Definition
the variance of the unobservable error, u, conditional on the explanatory variable, is not constant
Term
intercept
Definition
the value of y when x equals 0
Term
interaction term
Definition
variable that is the product of 2 variables
Term
What are the five steps for checking a model?
Definition
1. look at the F statistic 2. look at the p-value associated with the F statistic 3. see how much variation is explained by the model (R-squared) 4. look at coefficients for correct signs 5. check whether the variables are individually significant
Term
Explain confidence interval.
Definition
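if we repeated the sampling many times, about 95% of the intervals constructed this way would contain the true coefficient; informally, we are 95% confident that the true value of the coefficient in the model that generated this data falls within this interval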
Term
How do we read P>|t| = 0.000 or P>|t| = 0.048?
Definition
if we reject the null hypothesis (that the slope coefficient is zero), the probability of committing a Type I error is essentially zero (less than 1 in 1,000, displayed as 0.000); for P>|t| = 0.048, it is 4.8%
Term
confidence level
Definition
the % of our samples in which we want our confidence interval to contain the population value
Term
consistency
Definition
as the sample size increases, the estimator (for example, the OLS slope) converges in probability to the true population value
Term
reliability
Definition
the degree to which measures yield the same results when applied under the same circumstances
Term
validity
Definition
the extent to which the measuring instrument measures the phenomenon one wants to study