Term
What is the general formula to find a
Confidence Interval? |
|
Definition
statistic ± (critical value)(standard error of the statistic) |
|
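As a rough illustration of the confidence interval question above, many textbooks write the interval as estimate ± (critical value)(standard error). The sketch below assumes that form; the function name and the example numbers are hypothetical.

```python
def confidence_interval(estimate, critical_value, standard_error):
    """Return (lower, upper) for estimate ± critical_value * standard_error."""
    margin_of_error = critical_value * standard_error
    return estimate - margin_of_error, estimate + margin_of_error


# Hypothetical numbers: sample mean 50, z* = 1.96 (95% level), standard error 2.
lower, upper = confidence_interval(50, 1.96, 2)
print(f"({lower:.2f}, {upper:.2f})")
```

The same function works for any interval of this form; only the critical value and the standard error change with the procedure.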
Term
How would you find t* with 23 degrees of freedom for a
confidence level of 98%? |
|
Definition
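Use a t table (or an inverse-t calculator function) with df = 23 and an upper-tail area of 0.01 (the remaining 0.02 split between two tails): t* ≈ 2.500 |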
|
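In practice a t table or a calculator's inverse-t function answers the t* question above. Purely as a cross-check, the sketch below approximates the critical value with the standard library only: it integrates the t density numerically (Simpson's rule) and bisects for the point with the required area. The function names and the numerical settings (2000 steps, 60 bisections) are assumptions chosen for illustration.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_star(confidence, df):
    """Critical value with the given central area, found by bisection."""
    target = 1 - (1 - confidence) / 2  # e.g. 0.99 for a 98% level
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_star(0.98, 23), 3))
```

The key step is converting the 98% confidence level into a cumulative area of 0.99, because the leftover 2% is split between the two tails.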
Term
Significance Test Assumptions
T-Test (One Sample) |
|
Definition
1. Sample is random (SRS)
2. Standard deviation of the population is unknown
3. Population is normally distributed or sample is large |
|
|
Term
Significance Test Assumptions
T-Test (Two Samples) |
|
Definition
1. The samples are independent
2. Both samples are random (SRS)
3. Standard deviations of the populations are unknown
4. Both populations are normally distributed or both samples are large |
|
|
Term
Significance Test Assumptions
T-Test (Matched Pairs) |
|
Definition
1. Sample is random (SRS)
2. Standard deviation of the population of differences is unknown
3. Population of differences is normally distributed or the number of pairs is large |
|
|
Term
Significance Test Assumptions
Proportion-Test (One Sample) |
|
Definition
1. Sample is random (SRS)
2. np ≥ 10
3. n(1-p) ≥ 10
4. 10n ≤ N |
|
|
Term
Significance Test Assumptions
Proportion-Test (Two Samples) |
|
Definition
1. Samples are random (SRS)
2. np ≥ 10 for both
3. n(1-p) ≥ 10 for both
4. 10n ≤ N for both |
|
|
Term
Significance Test Assumptions
Chi-Square-Test |
|
Definition
1. Working with counts
2. All expected counts are at least 5 |
|
|
Term
Significance Test Assumptions
Slope T-Test |
|
Definition
1. The residual plot does not show a curved or fanned pattern
2. A histogram/stemplot/line plot of the residuals shows little skewness and no extreme outliers |
|
|
Term
Characteristics of Chi-Square Test
-GOF
-Homogeneity
-Independence/association |
|
Definition
1. GOF (goodness of fit): used to see if a single population follows a hypothesized distribution, e.g., uniform (all outcomes occur with equal frequency), normal, or the same as another population with known distribution.
2. Homogeneity: used to decide if two or more populations with unknown distributions have the same distribution as each other.
3. Independence/association: used to see if two variables are unrelated (independent) or related (dependent). |
|
|
Term
What is the meaning of p-value?
What is statistically significant? |
|
Definition
1. p-value - the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true
2. Statistically significant - an observed effect so large that it would rarely occur by chance alone
|
|
|
Term
What is a test statistic?
What is a parameter?
What is a statistic? |
|
Definition
1. Test Statistic - A value computed from the sample data that is used to decide whether to reject the null hypothesis.
2. Parameter - A number that describes the population.
3. Statistic - A number that can be computed from the sample data without making use of any unknown parameters. |
|
|
Term
What is the name for r?
What does it measure?
How to find r using b, sy, sx? |
|
Definition
1. r is the correlation coefficient
2. r measures the strength and direction of the linear relationship between two quantitative variables X and Y.
3. r = (b)(sx)/(sy) |
|
|
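The relationship r = (b)(sx)/(sy) from the card above can be checked numerically. This is a minimal sketch on a small made-up data set; the helper names are hypothetical, and the slope is computed with the usual least squares formula.

```python
from statistics import mean, stdev


def slope(xs, ys):
    """Least squares slope b of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def r_from_slope(xs, ys):
    """Correlation recovered from the slope: r = b * sx / sy."""
    return slope(xs, ys) * stdev(xs) / stdev(ys)


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = r_from_slope(xs, ys)
print(round(r, 4))
```

Because sx and sy are both positive, r always has the same sign as the slope b.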
Term
What is the name for r^2?
How to interpret r^2 in context? |
|
Definition
1. r^2 is the coefficient of determination
2. x% of the variation in y can be explained by the least squares regression line of y on x. |
|
|
Term
What are the considerations in deciding whether the Least Squares Regression Line is an appropriate model? |
|
Definition
1. The residual plot must show random scatter with no clear pattern.
2. r^2 must be sufficiently large. |
|
|
Term
State Central Limit Theorem in symbols and words. |
|
Definition
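In words: given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed.
In symbols: for a random sample of size n from a population with mean μ and standard deviation σ, the sample mean x̄ is approximately N(μ, σ/sqrt(n)) when n is large. |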
|
|
Term
Interpret a Confidence Interval in context |
|
Definition
I am x% confident that the true mean of the population is between # and #. |
|
|
Term
H0: The defendant is innocent.
What is a Type I error?
What is the probability of Type I error? How do you reduce it? |
|
Definition
1. H0 is true, but we reject it. In this case, the innocent defendant is found guilty.
2. Probability of a Type I error is alpha, the level of significance.
3. It can be reduced by lowering the level of significance value (alpha value). |
|
|
Term
H0: The defendant is innocent.
What is a Type II error?
What is the probability of Type II error? How do you reduce it? |
|
Definition
1. H0 is false, but we fail to reject it. In this case, the defendant is actually guilty, but we fail to reject that the defendant is innocent.
2. Probability of a Type II error is 1 - Power
3. It can be reduced by increasing the sample size. |
|
|
Term
H0: The defendant is innocent.
What is the Power of the test?
How do you find power, given the probabilities of Type I and Type II errors? |
|
Definition
1. It is the probability of rejecting H0 when H0 is false. In this case, the defendant is actually guilty, and we reject that the defendant is innocent.
2. Power = 1 - P(Type II error) |
|
|
Term
How do you increase Power? |
|
Definition
Increase the sample size. |
|
|
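To see why a larger sample raises power, the sketch below computes the power of a one-sided z test for a mean at several sample sizes. The setup is an assumption made for illustration (known σ, H0: μ = μ0 against Ha: μ > μ0, and a specific true mean μ1); the function name and all numbers are hypothetical.

```python
from statistics import NormalDist


def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the z test of H0: mu = mu0 vs Ha: mu > mu0 when the true mean is mu1."""
    standard_error = sigma / n ** 0.5
    # Reject H0 when the sample mean exceeds this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Power is the chance the sample mean lands above the cutoff under mu1.
    return 1 - NormalDist(mu1, standard_error).cdf(cutoff)


for n in (10, 25, 100):
    power = power_one_sided_z(mu0=100, mu1=103, sigma=10, n=n)
    print(n, "power:", round(power, 3), "P(Type II):", round(1 - power, 3))
```

As n grows the standard error shrinks, so the sampling distributions under H0 and under the true mean overlap less and the Type II error probability falls.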
Term
What is the level of significance? |
|
Definition
The probability of a false rejection of the null hypothesis in a statistical test. |
|
|
Term
What is the minimum sample size needed to achieve a given margin of error for a z confidence interval for a mean? |
|
Definition
n ≥ ((z*)(σ) / ME)^2, rounded up to the next whole number |
|
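For the sample size question above, a common approach is to solve ME = (z*)(σ)/sqrt(n) for n and round up. The sketch below assumes that formula and a known (or guessed) σ; the function name and example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(margin_of_error, sigma, confidence=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin_of_error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star * sigma / margin_of_error) ** 2)


# Hypothetical values: sigma = 15, desired margin of error 3, 95% confidence.
n_mean = sample_size_for_mean(margin_of_error=3, sigma=15, confidence=0.95)
print(n_mean)
```

Rounding up rather than to the nearest integer guarantees the margin of error is no larger than requested.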
Term
What is the minimum sample size needed to achieve a given margin of error for a z confidence interval for a proportion? |
|
Definition
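n ≥ (z*)^2 (p)(1-p) / ME^2, rounded up to the next whole number; use p = 0.5 if no estimate of p is available |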
|
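For the proportion version of the sample size question, the usual approach solves ME = z* sqrt(p(1-p)/n) for n. The sketch below assumes that formula and defaults to the conservative guess p = 0.5 when no prior estimate exists; the function name and numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, p_guess=0.5, confidence=0.95):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin_of_error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z_star ** 2 * p_guess * (1 - p_guess) / margin_of_error ** 2)


# Hypothetical values: margin of error 0.03 at 95% confidence, no prior estimate of p.
n_prop = sample_size_for_proportion(margin_of_error=0.03)
print(n_prop)
```

The guess p = 0.5 maximizes p(1-p), so it gives the largest (safest) required sample size.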
Term
What is the test statistic for Linear Regression significance test? |
|
Definition
t = b / SE_b, with df = n - 2 |
|
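For the regression test statistic question above, the standard slope test uses t = b / SE_b with n - 2 degrees of freedom, testing H0: slope = 0. The sketch below assumes that form and computes each piece from raw data; the function name and data are made up for illustration.

```python
from math import sqrt
from statistics import mean


def slope_t_statistic(xs, ys):
    """Return (t, df) for the test of H0: slope = 0 in simple linear regression."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # least squares slope
    a = y_bar - b * x_bar              # intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))            # standard deviation of the residuals
    se_b = s / sqrt(sxx)               # standard error of the slope
    return b / se_b, n - 2


# Made-up data for illustration only.
t_stat, df = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(t_stat, 4), df)
```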
Term
What are the characteristics of a distribution? |
|
Definition
1. Spread
2. Center
3. Gaps
4. Shape
5. Cluster |
|
|
Term
State the Empirical Rule. |
|
Definition
A statistical rule stating that for a normal distribution, almost all data will fall within three standard deviations of the mean. The empirical rule shows that 68% will fall within the first standard deviation, 95% within the first two standard deviations, and 99.7% will fall within the first three standard deviations of the mean. |
|
|
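The 68/95/99.7 figures in the Empirical Rule are rounded areas under the standard normal curve. A quick check with Python's standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(k, round(area, 4))
```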
Term
State the Law of Large Numbers |
|
Definition
As the number of observations in a sample increases, the sample mean tends to get closer and closer to the mean of the whole population. |
|
|
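A small simulation can illustrate the Law of Large Numbers. The sketch below rolls a fair die (population mean 3.5) and reports the sample mean at increasing sample sizes; the seed and sample sizes are arbitrary choices for illustration.

```python
import random

random.seed(1)  # arbitrary seed so the run is repeatable
rolls = [random.randint(1, 6) for _ in range(100_000)]


def mean_of_first(values, n):
    """Sample mean of the first n observations."""
    return sum(values[:n]) / n


for n in (10, 1_000, 100_000):
    print(n, round(mean_of_first(rolls, n), 3))
```

Small samples can land far from 3.5, while the mean of all 100,000 rolls should sit very close to it.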
Term
What is the criterion for finding Outliers? |
|
Definition
A value x is an outlier if x < Q1 - 1.5*IQR or x > Q3 + 1.5*IQR
IQR = Q3 - Q1 |
|
|
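The 1.5*IQR criterion above can be sketched as a pair of small functions. The quartiles are passed in directly because textbooks and software compute quartiles in slightly different ways; the function names and the example data are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Hypothetical quartiles Q1 = 10 and Q3 = 20, so IQR = 10.
print(outlier_fences(10, 20))
print(find_outliers([1, 12, 15, 18, 40], q1=10, q3=20))
```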
Term
Combining independent random variables.
- Rule for combining means.
- Rule for combining standard deviations. |
|
Definition
1. W = X + Y: μw = μx + μy
W = X - Y: μw = μx - μy
2. σw = sqrt(σx^2 + σy^2) for both W = X + Y and W = X - Y |
|
|
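A short sketch of the rules for combining independent random variables: means add or subtract, while variances always add. The function name and the numbers are hypothetical, and the standard deviation rule holds only when X and Y are independent.

```python
from math import sqrt


def combine_independent(mu_x, sigma_x, mu_y, sigma_y, difference=False):
    """Mean and standard deviation of X + Y (or X - Y) for independent X and Y."""
    mu_w = mu_x - mu_y if difference else mu_x + mu_y
    # Variances add for both sums and differences.
    sigma_w = sqrt(sigma_x ** 2 + sigma_y ** 2)
    return mu_w, sigma_w


print(combine_independent(10, 3, 4, 4))                   # X + Y
print(combine_independent(10, 3, 4, 4, difference=True))  # X - Y
```

Note that the standard deviations themselves do not add: 3 and 4 combine to 5, not 7.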
Term
State the principles of Experimental Design.
What is Matched Pairs design?
What is Randomized Block design? |
|
Definition
1. Principles of experimental design: control, randomization, and replication.
2. Matched Pairs: subjects are paired on relevant characteristics, e.g., age, gender, or height. Each subject in a pair is assigned to one group only, with the two treatments assigned at random within each pair.
3. Randomized Block: an experimental design in which the bt experimental units are divided into b blocks, each of size t units. Within each block, the t treatments under comparison are allocated at random. |
|
|
Term
State the sources of bias for sampling and surveys. |
|
Definition
1. Undercoverage
2. Nonresponse
3. Timing
4. Way the question is asked
5. Gender/age/features etc. of the person surveying (Response Bias)
6. Untruthful answers
7. Convenience Sampling
8. Voluntary Response |
|
|
Term
What is:
1. Simple random sample
2. Stratified sample |
|
Definition
1. A subset of a statistical population chosen so that each member of the population has an equal probability of being chosen and every possible sample of the same size is equally likely. A simple random sample is meant to be an unbiased representation of a group.
2. A method of sampling that involves the division of a population into smaller groups known as strata. The strata are formed based on members' shared attributes or characteristics. An SRS of each stratum is taken and then put together for the full sample.
|
|
|
Term
What are:
1. Residuals
2. Influential outliers
3. Outliers |
|
Definition
1. The difference between an observed value of the response variable and the value predicted by the regression line: residual = observed y - predicted y.
2. Any point that has a large effect on the slope of a regression line fitting the data. They are generally extreme values in the x direction.
3. A data point that diverges from an overall pattern in a sample. An outlier has a large residual. |
|
|
Term
What is confounding?
What is placebo effect?
What is blinding? |
|
Definition
1. The experimental controls do not allow the experimenter to reasonably eliminate plausible alternative explanations for an observed relationship between independent and dependent variables. (We cannot tell which of two possible causes is the reason something happens.)
2. A positive response to a placebo, similar to that of an active substance, brought about by a person's expectations of the placebo.
3. The practice of not telling subjects whether they are receiving a placebo or the actual treatment. |
|
|
Term
What is the setting for the Binomial Distribution?
What are the requirements before using the Normal Approximation? |
|
Definition
1. -Fixed number of trials (n)
-Each trial is independent
-Each trial has exactly 2 outcomes (success or failure)
-Probability of success (p) is the same for each trial
2. np ≥ 10
n(1-p) ≥ 10 |
|
|
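The binomial setting and the normal approximation check above can be sketched in a few lines. The binomial probability formula P(X = k) = C(n, k) p^k (1-p)^(n-k) is the standard one; the function names and example values are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial count with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def normal_approx_ok(n, p):
    """Check np >= 10 and n(1-p) >= 10 before using the normal approximation."""
    return n * p >= 10 and n * (1 - p) >= 10


print(binomial_pmf(2, 4, 0.5))    # exactly 2 heads in 4 fair coin flips
print(normal_approx_ok(100, 0.3))
print(normal_approx_ok(20, 0.1))
```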
Term
What is the setting for the Geometric Distribution?
What is the formula to find P(X=5)?
What is the expected mean? |
|
Definition
1. -No fixed number of trials
-Each trial is independent
-Each trial has exactly 2 outcomes (success or failure)
-Probability of success (p) is the same for each trial
2. P(X=5) = (1-p)^4 (p)
3. 1/p |
|
|
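The geometric formulas above generalize to P(X = k) = (1-p)^(k-1) p, where X counts the trial on which the first success occurs. A minimal sketch, with hypothetical function names and p = 0.2 chosen only for illustration:

```python
def geometric_pmf(k, p):
    """P(first success occurs on trial k): k - 1 failures, then one success."""
    return (1 - p) ** (k - 1) * p


def geometric_expected_value(p):
    """Expected number of trials needed to get the first success."""
    return 1 / p


p = 0.2
print(geometric_pmf(5, p))
print(geometric_expected_value(p))
```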
Term
How to check if two events are Independent?
How to check if two events are Disjoint (mutually exclusive)? |
|
Definition
1. Two events A and B are independent if knowing that one occurs does not change the probability that the other occurs.
P(A and B) = P(A)P(B)
2. Two events A and B are disjoint (mutually exclusive) if they have no outcomes in common and so can never occur simultaneously.
P(A and B) = 0 |
|
|
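Both checks above can be tried on a small equally likely sample space. The sketch below uses one roll of a fair die and exact fractions so the comparison P(A and B) = P(A)P(B) is not affected by rounding; the function and event names are hypothetical.

```python
from fractions import Fraction


def prob(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def are_independent(a, b, sample_space):
    """Independent when P(A and B) = P(A) * P(B)."""
    return prob(a & b, sample_space) == prob(a, sample_space) * prob(b, sample_space)


def are_disjoint(a, b):
    """Disjoint when the events share no outcomes."""
    return len(a & b) == 0


die = {1, 2, 3, 4, 5, 6}
even, odd, at_most_two = {2, 4, 6}, {1, 3, 5}, {1, 2}

print(are_independent(even, at_most_two, die))
print(are_disjoint(even, odd), are_independent(even, odd, die))
```

The second line of output shows that disjoint events with nonzero probability are not independent: knowing one occurred tells you the other did not.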
Term
Define resistant measure.
List measures which are resistant.
List measures which are not resistant. |
|
Definition
1. A measure that is not strongly influenced by outliers.
2. Median, Interquartile Range (IQR)
3. Mean, Range, Standard Deviation |
|
|
Term
Categorical Data
1. Formula to find expected count.
2. How do you compare distributions using segmented or stacked bar graphs?
3. State Simpson's Paradox |
|
Definition
1. (Row total)(Column total)/(Overall total)
2. Compare graphs of the proportions (or of the counts) within each group that fall in each category (event A, B, etc.).
3. An association in sub-populations may be reversed in the population. It appears that two sets of data separately support a certain hypothesis, but, when considered together, they support the opposite hypothesis.
|
|
|
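The expected count formula (Row total)(Column total)/(Overall total) can be applied to every cell of a two-way table at once. A minimal sketch with a made-up 2x2 table; the function name is hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


# Made-up observed counts for illustration only.
observed = [[10, 20],
            [30, 40]]
print(expected_counts(observed))
```

The expected table keeps the same row and column totals as the observed table, which is a quick way to check the arithmetic by hand.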
Term
What is:
3. Matched Pairs
4. Cluster Sampling |
|
Definition
3. Participants in different conditions are matched according to certain characteristics, e.g. age or gender.
4. A random sampling plan in which the population is subdivided into groups called clusters, ideally so that each cluster resembles the population (large variability within clusters and small variability between clusters). A random sample of clusters is then chosen, and every member of each chosen cluster is included. |
|
|