Term
What are the steps in hypothesis testing? |
|
Definition
1) Formally state your null (H0) and research or alternative (H1) hypotheses
2) Select an appropriate test statistic and the sampling distribution of that test statistic
3) Select a level of significance (alpha level) and determine the critical value and rejection region of the test statistic based on the selected alpha level
4) Conduct the test: calculate the obtained value of the test statistic and compare it to the critical value
5) Make a decision about your null hypothesis and interpret this decision in a meaningful way based on the research question, sample, and population |
|
|
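A minimal Python sketch (not part of the original cards) walking through the five steps for a one-sample z test; all numbers are hypothetical and only illustrate the procedure.
```python
import math
from scipy.stats import norm

# Step 1: state the hypotheses. H0: mu = 100, H1: mu != 100 (nondirectional)
mu0 = 100

# Step 2: the test statistic is z; its sampling distribution is the standard normal
xbar, sigma, n = 104.5, 15, 64          # hypothetical sample mean, population sd, n

# Step 3: choose alpha and find the critical value / rejection region (two-tailed)
alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)        # about 1.96

# Step 4: conduct the test and compare the obtained value with the critical value
z_obt = (xbar - mu0) / (sigma / math.sqrt(n))

# Step 5: make and interpret the decision about H0
reject = abs(z_obt) > z_crit
print(f"z_obt = {z_obt:.2f}, critical = ±{z_crit:.2f}, reject H0: {reject}")
```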
Term
What is the standard error of the mean? |
|
Definition
The standard deviation for the distribution of sample means |
|
|
Term
How does one translate the sample mean into a z score when the population standard deviation is not known? |
|
Definition
z = (X-bar - µ) ÷ (s ÷ √n) |
|
|
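A quick numeric check of this formula in Python, using hypothetical values for the sample mean, sample standard deviation, sample size, and hypothesized population mean.
```python
import math

# Hypothetical values: sample mean 52.3, sample sd 8.1, n = 121, hypothesized mu = 50
xbar, s, n, mu = 52.3, 8.1, 121, 50

se = s / math.sqrt(n)          # estimated standard error of the mean
z = (xbar - mu) / se
print(round(z, 2))             # about 3.12
```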
Term
What is the critical region? |
|
Definition
The area of the sampling distribution that contains all unlikely or improbable sample outcomes and that would cause one to reject the null hypothesis |
|
|
Term
Directional hypothesis tests are referred to as "_____-tailed" statistical tests, and nondirectional hypothesis tests as "_____-tailed" |
|
Definition
One; two |
|
Term
What is the formula used to conduct a z test for proportions? |
|
Definition
z = (p-hat - p) ÷ (sigma sub p-hat)
Where: sigma sub p-hat = √(p × q ÷ n); p = the population proportion assumed under the null hypothesis; p-hat = the sample proportion; q = 1 - p |
|
|
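A short Python sketch of the proportion test, using hypothetical values for p, p-hat, and n.
```python
import math

# Hypothetical values: null proportion p = .40, sample proportion p-hat = .46, n = 400
p, p_hat, n = 0.40, 0.46, 400
q = 1 - p

sigma_p_hat = math.sqrt(p * q / n)   # standard error of the proportion
z = (p_hat - p) / sigma_p_hat
print(round(z, 2))                   # about 2.45
```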
Term
When is it appropriate to use a t test for hypothesis testing instead of a z test? |
|
Definition
The z test and z distribution may be used for one-sample hypothesis tests involving a population mean under either of two conditions: the population standard deviation is known, or the sample size is large enough (≥ 100) that the sample standard deviation (s) can be used as an unbiased estimate of the population standard deviation. When neither condition holds, that is, when the population standard deviation is unknown and the sample is small, the t test and t distribution should be used instead |
|
|
Term
We are interested in the average dollar amount lost by victims of burglary. The National Insurance Association has reported that the mean dollar amount lost by victims of burglary is $2,222. Assume that this is the population mean. We believe that the true population mean loss is different from this. Formally state the null and research hypotheses we would test to investigate this question. What if we believed the dollar amount to be higher? |
|
Definition
H0: µ = $2,222; H1: µ ≠ $2,222
If we believed the amount was higher, the hypotheses would be H0: µ = $2,222; H1: µ > $2,222 |
|
|
Term
What is a chi-square goodness of fit test? |
|
Definition
A one-variable test that indicates whether the frequencies observed across the categories of a single categorical variable differ from the frequencies expected under the null hypothesis |
|
|
Term
Can the chi-square test of independence indicate the strength of a relationship between two variables? |
|
Definition
No. It only tells us whether the two variables are related (whether they are independent); it says nothing about the strength of the relationship. A measure of association, such as phi or lambda, is needed for that |
|
Term
What is the formula for the chi-square goodness of fit test? |
|
Definition
chi-square = ∑ (ƒ-of-observed - ƒ-of-expected)^2 ÷ ƒ-of-expected, summed over all k categories
In words, subtract the expected frequency from the observed frequency, square that difference, and then divide by the expected frequency. Perform this for all of the categories and then sum those calculations. This will be the obtained value of the chi-square statistic |
|
|
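A minimal sketch of the goodness-of-fit calculation, with hypothetical observed counts across k = 4 categories; the hand calculation is cross-checked against scipy.stats.chisquare.
```python
from scipy.stats import chisquare

# Hypothetical observed counts across k = 4 categories, with equal expected counts
observed = [18, 22, 30, 10]
expected = [20, 20, 20, 20]

# Hand calculation: sum of (f_observed - f_expected)^2 / f_expected over the k categories
chi2_obt = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2_obt, 2))                                 # 10.4, with df = k - 1 = 3

# Cross-check with scipy
print(round(chisquare(observed, f_exp=expected).statistic, 2))
```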
Term
How does one find the degrees of freedom with the chi-square statistic? |
|
Definition
k - 1
The number of groups minus one |
|
|
Term
In the chi-square test of independence, what is the observed frequency? |
|
Definition
The number of instances actually measured as shown in the sample data |
|
|
Term
How does one find the expected frequencies needed for the chi-square test? |
|
Definition
By determining what we should see if the null hypothesis is true |
|
|
Term
The chi-square test is appropriate for what levels of data? |
|
Definition
Nominal- and ordinal-level (categorical) data |
|
Term
What is a joint frequency distribution? |
|
Definition
The distribution of the simultaneous occurrence of one event from the first variable and another event from the second variable (in other words, the frequency of the intersection of the two events). |
|
|
Term
What is a contingency table? |
|
Definition
A table that shows the joint distribution of two categorical variables, where one variable designates the columns and the other designates the rows |
|
|
Term
In describing the dimensions of a contingency table, a 3 x 2 table means that there are ___ columns and ___ rows |
|
Definition
2; 3
(Think of it as an R x C table) |
|
|
Term
Row marginals refer to what? What do column marginals refer to? |
|
Definition
The number of cases in each row of the table; the frequency in each column of the table |
|
|
Term
To what does relative risk in a contingency table refer? |
|
Definition
The chances (risk) of falling into a particular cell (outcome) of the table for one group compared with, that is divided by, the corresponding chances for another group |
|
|
Term
What is the difference in using the chi-square goodness of fit and test of independence? |
|
Definition
The goodness-of-fit test involves a single variable, whereas the test of independence examines the relationship between two variables and looks at the cell frequencies in a contingency table. In other words, for a test of independence you take the difference between the observed and expected cell frequency, square the difference, divide that by the expected cell frequency, and sum across all cells |
|
|
Term
How do you find the expected cell frequency for a chi-square test of independence? |
|
Definition
Multiply the row marginal frequency for the given row of interest times the column marginal for the column of interest divided by the number of cases
ƒexpected = (RM x CM) ÷ n |
|
|
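A short sketch computing the expected cell frequencies and the obtained chi-square for a hypothetical 2 x 2 table, following the (RM x CM) ÷ n rule.
```python
# Hypothetical 2 x 2 contingency table: rows = groups, columns = outcomes
table = [[30, 20],
         [10, 40]]

n = sum(sum(row) for row in table)
row_marginals = [sum(row) for row in table]
col_marginals = [sum(col) for col in zip(*table)]

chi2_obt = 0.0
for i, row in enumerate(table):
    for j, f_obs in enumerate(row):
        # expected cell frequency = (row marginal x column marginal) / n
        f_exp = row_marginals[i] * col_marginals[j] / n
        chi2_obt += (f_obs - f_exp) ** 2 / f_exp

df = (len(table) - 1) * (len(table[0]) - 1)
print(round(chi2_obt, 2), df)        # about 16.67 with df = 1
```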
Term
How does one determine the number of degrees of freedom for a chi-square test of independence? |
|
Definition
Degrees of freedom = (# of rows -1) x (# of columns -1) |
|
|
Term
What are measures of association? |
|
Definition
Statistics that inform us about the strength or magnitude as well as the direction of the relationship between two variables |
|
|
Term
Define the formula for the phi-coefficient and what level of data for which it is appropriate. What is the range of the phi-coefficient and what do those numbers indicate? |
|
Definition
phi = √(chi-square ÷ n)
Nominal level data
0 to 1; 0 means no relationship and 1 means perfect relationship |
|
|
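A one-line check of the phi formula, carrying over the hypothetical obtained chi-square of about 16.67 on n = 100 cases from the independence-test sketch above.
```python
import math

# Hypothetical values carried over from the independence-test sketch above
chi2_obt, n = 16.67, 100
phi = math.sqrt(chi2_obt / n)
print(round(phi, 2))   # about 0.41, a moderate relationship
```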
Term
Lambda is known as a proportionate reduction in error (PRE) measure of association. What does this mean? |
|
Definition
It allows one to tell exactly how much better one will be able to predict the dependent variable when the independent variable is known, compared with predicting the dependent variable without that knowledge |
|
|
Term
What is the computational formula for lambda? |
|
Definition
lambda = ((∑ƒi) - ƒd) ÷ (n - ƒd)
Where ƒi = largest cell frequency in EACH category of the independent variable; ƒd = largest marginal frequency of the dependent variable |
|
|
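A sketch of lambda for a hypothetical table in which the independent variable defines the columns.
```python
# Hypothetical table; each inner list is one category of the independent variable,
# holding the cell counts across the categories of the dependent variable
columns = [[30, 10],    # IV category 1
           [20, 40]]    # IV category 2

n = sum(sum(col) for col in columns)

# sum of f_i: the largest cell frequency within EACH category of the independent variable
sum_f_i = sum(max(col) for col in columns)

# f_d: the largest marginal frequency of the dependent variable
dv_marginals = [sum(cells) for cells in zip(*columns)]
f_d = max(dv_marginals)

lam = (sum_f_i - f_d) / (n - f_d)
print(lam)   # 0.4: knowing the IV reduces prediction errors on the DV by 40%
```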
Term
The phi and lambda coefficients are both only appropriate for nominal level data. What is appropriate for ordinal level? |
|
Definition
Goodman and Kruskal's Gamma |
|
|
Term
What is the general formula for gamma? |
|
Definition
gamma = (CP - DP) ÷ (CP + DP)
Where CP = number of concordant pairs of observations; DP = number of discordant pairs of observations |
|
|
Term
How does one determine if a pair is concordant? |
|
Definition
A pair of observations is concordant when the case that ranks higher on one variable also ranks higher on the other variable (or ranks lower on both); that is, the two cases are ordered consistently on both variables |
|
|
Term
To determine the number of discordant pairs in a table, you... |
|
Definition
Start in the lower leftmost cell, which is low on the column variable but high on the row variable. Multiply this cell frequency by the sum of the cell frequencies for all cells that are both above and to the right of that cell. Repeat this for every cell that has cells above and to the right of it, and then sum the products |
|
|
Term
How do you calculate the number of concordant pairs in a contingency table? |
|
Definition
Start in the top leftmost cell and multiply this cell frequency by the sum of all cell frequencies that are both below and to the right of this cell. Repeat this for every cell that has cells below and to the right of it, and then sum the products |
|
|
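A sketch that counts concordant and discordant pairs for a hypothetical 3 x 3 table whose rows and columns are both ordered from low to high, then computes gamma from them.
```python
# Hypothetical 3 x 3 table with both variables ordered from low to high
table = [[20, 10,  5],
         [10, 15, 10],
         [ 5, 10, 20]]
R, C = len(table), len(table[0])

CP = DP = 0
for i in range(R):
    for j in range(C):
        # concordant: this cell times the sum of cells below and to the right of it
        CP += table[i][j] * sum(table[a][b]
                                for a in range(i + 1, R) for b in range(j + 1, C))
        # discordant: this cell times the sum of cells above and to the right of it
        DP += table[i][j] * sum(table[a][b]
                                for a in range(i) for b in range(j + 1, C))

gamma = (CP - DP) / (CP + DP)
print(CP, DP, round(gamma, 2))   # 2000 575 0.55
```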
Term
What are the two explanations for a difference between sample means for two populations? |
|
Definition
1- There really is a difference between the groups 2- The difference is due to sampling error |
|
|
Term
What is the sampling distribution of sample mean differences? |
|
Definition
The theoretical distribution of the differences between pairs of sample means obtained from an infinite number of pairs of samples |
|
|
Term
What is the standard error of the difference between two means? |
|
Definition
The standard deviation of the sampling distribution of the difference between two means |
|
|
Term
What is the equation for the standard error of the difference between two means? |
|
Definition
sigma sub (x-bar1 - x-bar2) = √((sigma1^2 ÷ n1) + (sigma2^2 ÷ n2))
Where sigma1 = standard deviation of the first population; sigma2 = standard deviation of the second population |
|
|
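A quick numeric check of the standard error formula with hypothetical population standard deviations and sample sizes.
```python
import math

# Hypothetical population standard deviations and sample sizes
sigma1, n1 = 12.0, 100
sigma2, n2 = 9.0, 80

se_diff = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
print(round(se_diff, 2))   # about 1.57
```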
Term
What is an independent random sample? |
|
Definition
When samples are drawn whose elements are randomly and independently selected |
|
|
Term
What is a pooled variance estimate? |
|
Definition
An estimate of the common population variance obtained by combining (pooling) the two sample variances; it is used to estimate the standard error of the difference between two means when the population standard deviations are unknown but assumed to be equal |
|
|
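The card does not give the formula, so this sketch states the standard pooled-variance formula explicitly, s_p^2 = ((n1 - 1)s1^2 + (n2 - 1)s2^2) ÷ (n1 + n2 - 2), and uses it with hypothetical sample variances and sizes to estimate the standard error of the difference.
```python
import math

# Hypothetical sample variances and sizes for two independent samples
s1_sq, n1 = 25.0, 30
s2_sq, n2 = 36.0, 25

# Pooled variance: a weighted average of the two sample variances
pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Estimated standard error of the difference between the two means
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
print(round(pooled_var, 2), round(se_diff, 2))
```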
Term
What does the matched-groups t test test? |
|
Definition
The difference between the scores for each matched pair of observations |
|
|
Term
Explain the difference between independent and dependent variables. If you think that low self-control affects crime, which is the independent and which is the dependent variable? |
|
Definition
An independent variable is the variable whose effect or influence on the dependent variable is what you want to measure. In causal terms, the independent variable is the cause, and the dependent variable is the effect. Low self-control is taken to affect one's involvement in crime, so self-control is the independent variable and involvement in crime is the dependent variable. |
|
|
Term
When is it appropriate to use an independent-samples t test and when is it appropriate for a t test for dependent samples or matched groups? |
|
Definition
An independent-samples t test should be used whenever the two samples have been selected independently of one another. In an independent samples t test, the sample elements are not related to one another. In a dependent-samples or matched-groups t test, by contrast, the sample elements are not independent but are instead related to one another. An example of dependent samples occurs when the same sample elements or persons are measured at two different points in time, as in a "before and after" experiment. A second common type of dependent sample is a matched-groups design. |
|
|
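A short illustration with made-up scores, using scipy's ttest_ind for independently selected samples and ttest_rel for dependent (before/after) samples.
```python
from scipy import stats

# Hypothetical scores from two independently selected groups
group_a = [12, 15, 11, 14, 16, 13, 15]
group_b = [10, 9, 12, 11, 10, 13, 9]
t_ind, p_ind = stats.ttest_ind(group_a, group_b)      # independent-samples t test

# Hypothetical before/after scores for the same people (dependent samples)
before = [8, 6, 7, 9, 5, 7]
after = [6, 5, 6, 7, 5, 6]
t_rel, p_rel = stats.ttest_rel(before, after)          # matched-groups t test

print(round(t_ind, 2), round(p_ind, 3))
print(round(t_rel, 2), round(p_rel, 3))
```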
Term
What is an analysis of variance (aka ANOVA)? |
|
Definition
A statistical technique that tests the equality of three or more population means simultaneously while keeping the overall alpha level at its stated (true) value. |
|
|
Term
The sums of squares will follow a __________ distribution with k - 1 degrees of freedom. |
|
Definition
Chi-square |
|
Term
The expected frequency in the chi square test is a so-called "_________" factor: It turns the above frequency into a proportion. That way, the whole thing behaves like a __ score. |
|
Definition
|
|
Term
What is the general form of a t statistic? |
|
Definition
t = (statistic - mean of sampling distribution) / standard error |
|
|
Term
What is the formula for variance? |
|
Definition
S^2 = (∑(x - xbar)^2) ÷ (n - 1) |
|
|
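A small check of the variance formula with hypothetical scores.
```python
# Hypothetical scores
scores = [4, 8, 6, 5, 7]
n = len(scores)
xbar = sum(scores) / n

s_squared = sum((x - xbar) ** 2 for x in scores) / (n - 1)
print(s_squared)   # 2.5
```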
Term
The F-test is the _____ of the variance. What is its formula? |
|
Definition
Ratio; F = (sigma1^2) ÷ (sigma2^2) |
|
|
Term
When calculating the different kinds of variability, the following scores and means apply: -Total variability: the difference between an _______ score and the _____ mean -Within-group: the difference between an _______ score and the _____ mean -Between-group: the difference between the _____ mean and the ____ mean |
|
Definition
-Total: individual; grand -Within-group: individual; group -Between-group: group; grand |
|
|
Term
What is the formula for the total sum of squares? |
|
Definition
SS-tot: ∑i ∑k (x-indiv - x-bar-grand)^2 |
|
|
Term
What is the formula for the within group sum of squares? |
|
Definition
SSwithin = ∑i ∑k (x-indiv - x-bar-group)^2 |
|
|
Term
What is the formula for the between group sum of squares? |
|
Definition
SSbetween = ∑i ∑k (x-bar-group - x-bar-grand)^2 |
|
|
Term
When calculating the degrees of freedom in the sums of squares, what are the formulas for the three types? |
|
Definition
Total: n - 1 Within: n - k Between: k - 1 |
|
|
Term
To find variance with the sum of squares and degrees of freedom, we divide what by what? |
|
Definition
The sum of squares for whichever type (total, within, between) by the degrees of freedom for that type |
|
|
Term
What is the formula for the F test? |
|
Definition
F = (SS-between ÷ df-between) ÷ (SS-within ÷ df-within) |
|
|
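A sketch that ties the sum-of-squares, degrees-of-freedom, and F-ratio cards together for hypothetical scores in k = 3 groups, with a cross-check against scipy.stats.f_oneway.
```python
from scipy.stats import f_oneway

# Hypothetical scores for k = 3 groups
groups = [[4, 5, 6, 5],
          [7, 8, 6, 7],
          [10, 9, 11, 10]]

all_scores = [x for g in groups for x in g]
n, k = len(all_scores), len(groups)
grand_mean = sum(all_scores) / n
group_means = [sum(g) / len(g) for g in groups]

# Between-group SS: (group mean - grand mean)^2 summed over every individual
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# Within-group SS: (individual score - group mean)^2
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

df_between, df_within = k - 1, n - k
F = (ss_between / df_between) / (ss_within / df_within)
print(round(F, 2))                            # 38.0

# Cross-check against scipy's one-way ANOVA
print(round(f_oneway(*groups).statistic, 2))  # 38.0
```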
Term
Tukey's Honest Significant Difference test requires calculating a critical difference (CD) score. What is the formula to do this? |
|
Definition
CD = q√(within-group variance ÷ n-sub-k)
Where n-sub-k = number of cases in each of the k groups; q = studentized range statistic |
|
|
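A sketch of the critical difference calculation, carrying over the hypothetical ANOVA above (k = 3, n_k = 4, within-group variance = 6 ÷ 9); it assumes scipy 1.7 or later, which exposes the studentized range distribution as scipy.stats.studentized_range.
```python
import math
from scipy.stats import studentized_range   # requires scipy >= 1.7

# Values carried over from the hypothetical ANOVA sketch above
k, n_k, df_within = 3, 4, 9
within_group_variance = 6 / 9
alpha = 0.05

# q: studentized range statistic for this alpha, number of groups, and df within
q = studentized_range.ppf(1 - alpha, k, df_within)

cd = q * math.sqrt(within_group_variance / n_k)
print(round(q, 2), round(cd, 2))
```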
Term
For Tukey's HSD test, you need to find q. What three things do you need to do this? |
|
Definition
1) Alpha level 2) Degrees of freedom within groups 3) Number of groups |
|
|
Term
Tukey's HSD test doesn't look at one hypothesis, but tests ____ ____ of sample means. |
|
Definition
All pairs |
|
Term
What is the formula for eta squared (aka correlation ratio)? |
|
Definition
eta^2 = SS-between ÷ SS-total |
|
|
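Eta squared computed from the hypothetical sums of squares in the ANOVA sketch above.
```python
# Hypothetical sums of squares carried over from the ANOVA sketch above
ss_between, ss_within = 50.67, 6.0
ss_total = ss_between + ss_within

eta_squared = ss_between / ss_total
print(round(eta_squared, 2))   # about 0.89 of the total variability lies between groups
```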
Term
What is the formula for the q in Tukey's HSD? |
|
Definition
q = range ÷ standard deviation of the sample |
|
|
Term
The t test does not easily generalize to more than ___ groups |
|
Definition
Two |
|
Term
F distributions converge to a ____ ________ distribution as the denominator df go to positive infinity. |
|
Definition
Chi-square |
|
Term
When is it appropriate to perform an analysis of variance with our data? What type of variables do we need? |
|
Definition
An analysis of variance can be performed whenever we have a continuous (interval- or ratio-level) dependent variable and a categorical independent variable with three or more levels or categories, and we are interested in testing hypotheses about the equality of our population means |
|
|
Term
What statistical technique should we use if we have a continuous dependent variable and a categorical independent variable with only two categories? |
|
Definition
If we have a continuous dependent variable and a categorical independent variable with only two categories or levels, the correct statistical test is a two-sample t test, assuming that the hypothesis test involves the equality of two population means |
|
|
Term
Why do we call this statistical technique an analysis of variance when we are really interested in the difference among population means? |
|
Definition
It is called the analysis of variance because we make inferences about the differences among population means based on a comparison of the variance that exists within each sample, relative to the variance that exists between the samples. More specifically, we examine the ratio of variance between the samples to the variance within the samples. The greater this ratio, the more between-samples variance there is relative to within-sample variance. Therefore, as this ratio becomes greater than 1, we are more inclined to believe that the samples were drawn from different populations with different population means. |
|
|
Term
What two types of variance do we use to calculate the F ratio? |
|
Definition
Between-group variance divided by within-group variance |
|
|