Term
The four parts of Statistics |
|
Definition
1) Define the problem 2) Collect data 3) Analyze and summarize data 4) Draw inferences from data |
|
|
Term
Distribution |
|
Definition
A list of the possible values of a variable along with how often each value occurs. |
|
|
Term
|
Definition
|
|
Term
|
Definition
|
|
Term
Histogram |
|
Definition
Graph of the distribution of one quantitative variable. Bars touch. |
|
|
Term
Bar graph |
|
Definition
Graph of the distribution of categorical data. Bars don't touch. |
|
|
Term
Stemplot |
|
Definition
Stems on the left, leaves on the right. Shows the distribution like a histogram, except that the number of leaves, not the height of a bar, gives the count. |
|
|
Term
Five-number summary |
|
Definition
Minimum, Q1 (25th percentile), Median, Q3 (75th percentile), Maximum. This is the summary to use if there are outliers. |
|
|
Term
Boxplot |
|
Definition
A visual summary of the five-number summary. Boxplots show less detail than histograms or stemplots and are used for side-by-side comparison of more than one distribution. |
|
|
Term
Quartiles (Q1 and Q3) |
|
Definition
Q1 is the median of the values to the left of the actual median. Q3 is the median of the values to the right of the median. |
|
|
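The median-of-halves rule on the quartiles card can be sketched in a few lines of Python. This is only an illustration: the function name `five_number_summary` and the data are made up, and statistical software often uses slightly different quartile rules, so results may differ from a calculator or package.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values to the left of the median
    upper = data[half + n % 2:]    # values to the right of the median
    return (data[0], median(lower), median(data), median(upper), data[-1])

scores = [2, 4, 5, 7, 8, 10, 12, 15, 40]   # made-up data
summary = five_number_summary(scores)
print(summary)
```

When n is odd, the middle observation is left out of both halves, which matches the "values to the left/right of the median" wording of the card.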
Term
Symmetric distribution |
|
Definition
One side of the distribution is the mirror image of the other side. The normal curve is the best-known example. |
|
|
Term
Skewed distribution |
|
Definition
A longer tail on the left or on the right; not symmetric. |
|
|
Term
Outlier |
|
Definition
A data point that is quite a bit removed from the rest of the data. Flagged when it is greater than Q3 + (1.5 × IQR) or less than Q1 - (1.5 × IQR). |
|
|
Term
Interquartile Range (IQR) |
|
Definition
Q3 - Q1. The spread of the middle 50% of the data. |
|
|
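The 1.5 × IQR outlier rule can be checked with a short Python sketch. The helper name `iqr_outliers` and the data are hypothetical, and the quartiles use the median-of-halves rule from the cards, which may differ slightly from what software reports.

```python
from statistics import median

def iqr_outliers(values):
    """Return (outliers, (low_fence, high_fence)) under the 1.5 * IQR rule."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])              # median of the lower half
    q3 = median(data[n // 2 + n % 2 :])      # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    flagged = [x for x in data if x < low_fence or x > high_fence]
    return flagged, (low_fence, high_fence)

scores = [2, 4, 5, 7, 8, 10, 12, 15, 40]     # made-up data
outliers, fences = iqr_outliers(scores)
print(outliers, fences)
```

Here Q1 = 4.5 and Q3 = 13.5, so IQR = 9 and only values below -9 or above 27 are flagged.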
Term
Mean |
|
Definition
Measures the center of the data; the balance point. Add up all the numbers and divide by n. Outliers affect the mean greatly. |
|
|
Term
Median |
|
Definition
Measures the center of the data. Cuts the ordered data in half. Order the data and find the middle observation. If n is even, the median is the average of the two middle observations. Outliers affect the median very little, if at all. |
|
|
Term
Standard deviation (SD) |
|
Definition
Measures the variability of the data around the mean; roughly the average distance of the data from the mean. Outliers make the standard deviation larger than it would otherwise be. |
|
|
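A small Python sketch shows how one outlier moves the mean and the SD while barely moving the median. The salaries are made-up numbers, and `stdev` here is the sample standard deviation (n - 1 in the denominator), which is an assumption about the convention the course uses.

```python
from statistics import mean, median, stdev

salaries = [30, 32, 35, 36, 37]        # made-up data, in $1000s
with_outlier = salaries + [200]        # same data plus one extreme value

for label, data in (("without outlier", salaries), ("with outlier", with_outlier)):
    print(label, "mean:", round(mean(data), 2),
          "median:", median(data),
          "SD:", round(stdev(data), 2))
```

The mean jumps from 34 to about 61.67, while the median only moves from 35 to 35.5.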
Term
When can you use a Normal distribution to model a data set? |
|
Definition
When the distribution of the data is approximately normal (a bell-shaped curve). |
|
|
Term
How do you obtain a proportion or probability from the Normal curve? |
|
Definition
Convert the value to a z-score and look up the z-score in Table A (Standard Normal Table). It will give you the "less than" percentage. If you want the "greater than" percentage, subtract that probability from 1. |
|
|
Term
What is a z-score? How do you obtain a z-score? |
|
Definition
A z-score tells us how many SDs a value is from the mean. z = (value - mean) / standard deviation. |
|
|
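The z-score and table lookup on the two cards above can be sketched in Python, with `statistics.NormalDist` standing in for Table A. The helper name `z_score` and the numbers (a value of 130 from a distribution with mean 100 and SD 15) are made up for illustration.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """How many standard deviations a value lies from the mean."""
    return (value - mean) / sd

z = z_score(130, 100, 15)             # hypothetical value, mean and SD
less_than = NormalDist().cdf(z)       # the "less than" proportion, as in Table A
greater_than = 1 - less_than          # subtract from 1 for "greater than"

print(z, round(less_than, 4), round(greater_than, 4))
```

A z of 2.0 gives a "less than" proportion of about 0.9772, so about 0.0228 of values are greater.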
Term
68-95-99.7 rule |
|
Definition
In a Normal distribution, 68% of observations fall within 1 SD of the mean, 95% within 2 SDs of the mean, and 99.7% within 3 SDs of the mean. |
|
|
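The three percentages can be checked against the standard Normal curve with a short Python sketch. The variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, SD 1

# Proportion of the curve within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, proportion in within.items():
    print(f"within {k} SD: {100 * proportion:.1f}%")
```

The exact values are roughly 68.3%, 95.4% and 99.7%, which the rule rounds for convenience.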
Term
Explanatory variable. In Regression? |
|
Definition
The variable we are assessing or testing in an experiment. In regression, this is the x, the variable that does the predicting. |
|
|
Term
response variable. In Regression? |
|
Definition
The measurement we take to assess the explanatory variable. In regression, this is the y, the variable we want to predict. |
|
|
Term
Correlation |
|
Definition
A measure of the linear relationship between x and y. |
|
|
Term
r |
|
Definition
Symbol for the correlation coefficient. The sign of r is the sign of the slope. Always between -1 and 1. Values close to 0 mean little or no linear relationship. Values close to -1 mean a strong negative relationship. Values close to +1 mean a strong positive relationship. Has no unit of measure. Measures only linear relationships, not every kind of relationship. The correlation between x and y equals the correlation between y and x. |
|
|
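Several properties on the r card (between -1 and 1, no units, same value with x and y swapped) can be seen in a hand-rolled Python sketch. The function name `correlation` and the hours/score data are made up for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]            # made-up study hours
score = [52, 55, 61, 64, 68]       # made-up exam scores

r = correlation(hours, score)
r_swapped = correlation(score, hours)                       # x and y exchanged
r_minutes = correlation([60 * h for h in hours], score)     # hours -> minutes

print(round(r, 4), round(r_swapped, 4), round(r_minutes, 4))
```

All three values agree (about 0.994), since swapping the variables or changing the units does not change r.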
Term
Least Squares Regression Line |
|
Definition
The line obtained by MINIMIZING the SUM of the SQUARED RESIDUALS. |
|
|
Term
Why do we use regression equations? |
|
Definition
Used to model relationships between quantitative variables and also for prediction. |
|
|
Term
How to make a prediction using the least squares regression line. |
|
Definition
Plug the given value of x into the equation and solve for y. If you are not given an equation, build it from the regression output: the first number in the first row is the y-intercept, and the first number in the second row is the slope. |
|
|
Term
Residual |
|
Definition
Observed y - Predicted y. |
|
|
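Fitting the least squares line, making a prediction, and computing residuals can be sketched together in Python. The helper name `fit_line` and the data are hypothetical; the slope and intercept use the standard closed-form least squares formulas rather than anything specific to the course software.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

hours = [1, 2, 3, 4, 5]            # made-up x values
score = [52, 55, 61, 64, 68]       # made-up y values

intercept, slope = fit_line(hours, score)

# Prediction: plug an x value into y-hat = intercept + slope * x.
predicted = intercept + slope * 3.5

# Residual = observed y - predicted y, one per data point.
residuals = [y - (intercept + slope * x) for x, y in zip(hours, score)]

print(round(intercept, 2), round(slope, 2), round(predicted, 2))
print([round(e, 2) for e in residuals])
```

With these numbers the line is roughly y-hat = 47.7 + 4.1x, and the residuals sum to zero (up to rounding), as they always do for a least squares line with an intercept.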
Term
How do you interpret slope? |
|
Definition
Slope tells us the average increase (or decrease, if the slope is negative) in y for every one-unit increase in x. |
|
|
Term
r² (r-squared) |
|
Definition
Tells us the percentage of the total variation in the y's that can be explained by the x's (the regression equation). |
|
|
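The "percentage of variation explained" idea can be sketched in Python as 1 minus the ratio of leftover variation (squared residuals) to total variation in y. The function name `r_squared` and the data are made up for illustration.

```python
def r_squared(xs, ys):
    """Fraction of the variation in ys explained by the least squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Variation left over after fitting the line (sum of squared residuals).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of the y's around their mean.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

hours = [1, 2, 3, 4, 5]            # made-up data
score = [52, 55, 61, 64, 68]

r2 = r_squared(hours, score)
print(f"{100 * r2:.1f}% of the variation in score is explained by hours")
```

For a straight-line fit this equals the square of the correlation coefficient r.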
Term
How do you interpret residual plots? |
|
Definition
Uniform scatter (shoe box) means everything is OK. Outliers mean normality is violated. A megaphone shape means equal variance is violated. A smile or frown means the relationship is not linear; it's curved. |
|
|
Term
What does standard deviation in regression output mean? |
|
Definition
The "s" in regression output measures the standard deviation of the y's about the regression line. |
|
|
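The "s" in regression output can be reproduced by hand in Python. This sketch assumes the usual simple-regression convention of dividing the sum of squared residuals by n - 2; the card does not state the formula, and the function name `regression_s` and the data are made up.

```python
from math import sqrt

def regression_s(xs, ys):
    """Standard deviation of the y's about the least squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (n - 2))     # assumed convention: n - 2 in the denominator

hours = [1, 2, 3, 4, 5]            # made-up data
score = [52, 55, 61, 64, 68]

s = regression_s(hours, score)
print(round(s, 3))
```

A small s means the observed points sit close to the line.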
Term
Why is extrapolation bad? |
|
Definition
Extrapolation is using an x value outside the range of the observed x's to predict y. Bad because the relationship outside the window of observation may be totally different from the relationship observed inside it. |
|
|
Term
Lurking variable |
|
Definition
A variable that affects the relationship between the response and the explanatory variable but is not part of the study. Bad because it can suggest relationships that don't really exist. |
|
|
Term
Why don't we say that a clear association between 2 variables establishes causation? |
|
Definition
Because a lurking variable, rather than x itself, may be responsible for the association. |
|
|
Term
How do we establish causation? |
|
Definition
With experimentation so that lurking variables can be controlled by randomization. |
|
|
Term
What is a marginal distribution for categorical data? |
|
Definition
Take the row (or column) totals and divide them by the table total. (Large) |
|
|
Term
What is a Conditional distribution for categorical data? |
|
Definition
Obtained by taking the cell counts in a row (or column) and dividing them by that row (or column) total. (Small) |
|
|
Term
What happens if all conditional distributions equal the corresponding marginal? |
|
Definition
The row variable and the column variable are NOT related. |
|
|
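Marginal and conditional distributions can be computed from a two-way table of counts with a short Python sketch. The table below (gender by yes/no answer) is entirely made up, and the variable names are illustrative.

```python
# Made-up two-way table of counts: rows = group, columns = answer.
counts = {
    "Female": {"Yes": 30, "No": 20},
    "Male": {"Yes": 15, "No": 35},
}

table_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of the answer: column totals / table total.
column_totals = {}
for row in counts.values():
    for answer, count in row.items():
        column_totals[answer] = column_totals.get(answer, 0) + count
marginal = {answer: total / table_total for answer, total in column_totals.items()}

# Conditional distribution of the answer within each row: cell count / row total.
conditional = {
    group: {answer: count / sum(row.values()) for answer, count in row.items()}
    for group, row in counts.items()
}

print("marginal:", marginal)
print("conditional:", conditional)
```

Here the conditional distributions (60% "Yes" for one row, 30% for the other) differ from the marginal (45% "Yes"), so in this made-up table the row and column variables are related.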
Term
What is a voluntary response sample? |
|
Definition
Samples obtained by having responders contact you instead of you contacting the responders (Dear Abby, 900 numbers, Ross Perot & TV Guide). Not a probability sample. Many potential responders aren't motivated to respond. |
|
|
Term
What is a convenience sample? |
|
Definition
Taking a sample that is not random because it is easily obtained (mall samples, classes on campus). Not a probability sample because many potential responders don't have a chance of being contacted. |
|
|
Term
What is the population of interest? |
|
Definition
The group of people the researcher wants info on. |
|
|
Term
What is the response variable? |
|
Definition
The observation recorded (measured) on each individual. |
|
|
Term
Sample |
|
Definition
The subgroup of individuals from the population about which the researcher actually obtains info. |
|
|
Term
Parameters |
|
Definition
Means and percentages for the population. |
|
|
Term
Statistics |
|
Definition
Means and percentages for the sample. |
|
|
Term
What is bias? How is it eliminated? |
|
Definition
The amount by which the sample systematically differs from what it should be. Eliminated by using probability samples, careful wording, etc. |
|
|
Term
Simple random sample (SRS) |
|
Definition
Random sampling from entire population |
|
|
Term
What is a stratified sample? |
|
Definition
Sampling from within groups of a population or sampling within different populations. |
|
|
Term
What is a multistage sample? |
|
Definition
First sampling groups, then sampling within those groups. |
|
|
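The three sampling schemes on the cards above can be contrasted in a small Python sketch. The population of 100 labeled students, the class-year strata, and the classroom groups are all invented for illustration, and the seed is fixed only so that the sketch is repeatable.

```python
import random

rng = random.Random(0)   # fixed seed so the sketch is repeatable

# Made-up population: 100 students, each tagged with a class year.
population = [(f"student{i}", "freshman" if i < 60 else "senior")
              for i in range(100)]

# Simple random sample: sample at random from the entire population.
srs = rng.sample(population, 10)

# Stratified sample: split into groups (strata), then sample within each group.
strata = {}
for person in population:
    strata.setdefault(person[1], []).append(person)
stratified = [p for group in strata.values() for p in rng.sample(group, 5)]

# Multistage sample: first sample groups (classrooms of 10), then sample
# individuals within the chosen groups.
classrooms = [population[i:i + 10] for i in range(0, 100, 10)]
chosen_rooms = rng.sample(classrooms, 3)
multistage = [p for room in chosen_rooms for p in rng.sample(room, 4)]

print(len(srs), len(stratified), len(multistage))
```

The stratified sample is guaranteed to contain 5 freshmen and 5 seniors, whereas the SRS could by chance contain very few of one group.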
Term
What is a probability sample? |
|
Definition
Multistage, stratified, SRS. Every member of the population has a known, non-zero chance of being selected. |
|
|
Term
What do you need to be aware of in sampling? |
|
Definition
undercoverage, non-response bias, lying, wording of questions... |
|
|
Term
|
Definition
|
|
Term
What is an observational study? |
|
Definition
Studies where info is gathered on the population but no treatment is imposed on the subjects (power lines). |
|
|
Term
|
Definition
|
|
Term
Are samples: observational studies or experiments? |
|
Definition
Observational studies. |
|
|
Term
Can observational studies establish causation? |
|
Definition
No. Lurking variables are not controlled, so causation requires an experiment. |
|
|
Term
What is the placebo effect? |
|
Definition
A patient's response to any (even fake) treatment is the placebo effect. |
|
|
Term
What is a control group? Why are they important? |
|
Definition
The patients or group that receives the placebo or gets no treatment. It eliminates the effect of lurking variables. |
|
|
Term
Replication |
|
Definition
Having more than one experimental unit per treatment. Necessary to obtain an SD. The more units, the more accurate we are. |
|
|
Term
What do we need to be cautious of in experiments? |
|
Definition
Hidden bias and lack of realism. We take care of the first by treating every experimental unit identically and using a double-blind design. |
|
|
Term
Double-blind experiment |
|
Definition
Neither the subjects nor the doctor know who is receiving the treatment and who is receiving the placebo. It removes bias! |
|
|
Term
What is a matched pairs design? |
|
Definition
Taking 2 measurements on each individual; the groups to be compared are related. If the order of treatments has to be randomized, it's a matched pairs design. |
|
|
Term
Why do we like blocked designs of experiments? |
|
Definition
More precise conclusions. Unwanted variation is removed from standard error. Or variation associated with blocking variable is removed from error. |
|
|