Term
classification of variables into different categories. serve as an ID only. ex- "what is your name" |
|
Definition
|
|
Term
when the amounts of a variable are placed in order of magnitude. No true zero value. differences between the items can vary. |
|
Definition
|
|
Term
ordinal measurement but the differences between the scores are equal. with this there is equal differentiation between each variable. |
|
Definition
|
|
Term
measurement has a true zero point, absence of that variable. ex- "you do not have cancer" |
|
Definition
|
|
Term
measurable or countable data |
|
Definition
|
|
Term
ordinal and nominal data are... |
|
Definition
|
|
Term
ordered categories. grade of breast cancer. better, same, worse. disagree, neutral, agree |
|
Definition
|
|
Term
unordered categories. sex, alive or dead, blood group O,A,B,AB |
|
Definition
|
|
Term
the set of all possible values for the variable. |
|
Definition
|
|
Term
a subset of the population |
|
Definition
|
|
Term
a sample in which each member of the population has an equal probability of entering the sample |
|
Definition
|
|
Term
|
Definition
-necessary if inferential statistics are to be used -need to be representative -random sampling |
|
|
Term
exclusion of a subset of the population of interest prior to sampling |
|
Definition
|
|
Term
introduced when responses are not obtained from all sample members |
|
Definition
|
|
Term
inaccuracy in recorded data. can be due to survey design, interviewer impact. |
|
Definition
|
|
Term
transcription error, data corruption |
|
Definition
|
|
Term
a complete set of people, events, etc. that share a common characteristic. the scientific notation for this is "N". |
|
Definition
|
|
Term
a subset or subgroup that should be representative of the entire population. the scientific notation for this is "n". |
|
Definition
|
|
Term
the number that summarizes or describes a characteristic of a population |
|
Definition
|
|
Term
the variable that is manipulated in an experiment to determine its effect on the dependent variable. what you can control. |
|
Definition
|
|
Term
the variable that depends on the independent variable. this is often a measure of behavior or outcome. this is what is being measured. |
|
Definition
|
|
Term
the characteristic of an individual population unit |
|
Definition
|
|
Term
generalization about a population based on sample data. ex- the average age on admission is 21.9. (making an assumption about the population) |
|
Definition
|
|
Term
statement about the uncertainty associated with an inference |
|
Definition
|
|
Term
a statistic that compares a numerical description of two sets of data and the direction of the relationship. (how "a" is related to "b".) only two variables, not three. |
|
Definition
|
|
Term
a statistical method of predicting one set of values from another set of measured values. more than two variables. |
|
Definition
|
|
Term
data with absolute zero (zero means no value). ex- bank account when you are down to nothing. no negatives allowed. |
|
Definition
|
|
Term
data with relative zero (zero has value). the difference between 0 and 1 is the same difference as 1 and 2. negatives ARE allowed. |
|
Definition
|
|
Term
three main aspects of a distribution of data... |
|
Definition
shape (look at graphs) center (what number is in the middle) spread (how much variation) |
|
|
Term
number of data points in a class |
|
Definition
|
|
Term
class frequency / n. M&M example= number of green M&Ms / total number of M&Ms |
|
Definition
|
|
Term
class relative frequency x 100 |
|
Definition
|
|
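A minimal sketch of the three class measures above (class frequency, class relative frequency, class percentage), using the M&M example. The candy list and the function name `frequency_table` are made up for illustration.

```python
from collections import Counter

def frequency_table(observations):
    """Return {class: (frequency, relative frequency, percentage)}."""
    counts = Counter(observations)
    n = len(observations)  # total number of data points
    return {cls: (f, f / n, 100 * f / n) for cls, f in counts.items()}

# Hypothetical bag of 8 candies.
candies = ["green", "red", "green", "blue", "red", "green", "yellow", "green"]
table = frequency_table(candies)

for colour, (freq, rel, pct) in table.items():
    print(f"{colour}: frequency={freq}, relative={rel:.3f}, percent={pct:.1f}%")
```

Here green has class frequency 4, relative frequency 4/8 = 0.5, and class percentage 50%.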
Term
|
Definition
10-20. general rule of thumb= there should be no fewer than 6 intervals, and no more than 15. |
|
|
Term
this is found by dividing the difference between the largest and smallest score by the number of class intervals (w=R/k) (range of the scores / how many intervals we want) |
|
Definition
|
|
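A small worked example of w = R/k, assuming a made-up list of scores and k = 8 intervals. The name `class_interval_width` is hypothetical.

```python
def class_interval_width(scores, k):
    """Width of each class interval: range R divided by number of intervals k."""
    r = max(scores) - min(scores)  # R = largest score minus smallest score
    return r / k

# Hypothetical scores: largest is 92, smallest is 52, so R = 40.
scores = [52, 60, 67, 71, 75, 80, 88, 92]
width = class_interval_width(scores, 8)
print(width)  # 40 / 8 = 5.0
```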
Term
a way to show what your data looks like. a graphic display. your "stem" is the midpoint. |
|
Definition
|
|
Term
a distribution that when folded in half, produces two identical shapes. |
|
Definition
|
|
Term
a distribution in which scores are clustered at one end, and rarity of scores occur on the other end. |
|
Definition
|
|
Term
if the tail (rare scores) occurs for the high scores to the right. |
|
Definition
|
|
Term
if the tail (rare scores) occurs for the low scores to the left |
|
Definition
|
|
Term
tendency of data to center about certain numerical values. mean, median and mode. |
|
Definition
|
|
Term
the score that occurs the most often (can be unimodal, bimodal, or multimodal). data displayed in a histogram will have a modal class- the class with the largest frequency |
|
Definition
|
|
Term
the score with an equal number of scores above and below it (50th percentile). middle number when observations are arranged in order. not always an observed score: with an even number of observations it is the average of the two middle values |
|
Definition
|
|
Term
the sum of scores divided by the number of scores (average) |
|
Definition
|
|
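The three measures of central tendency above can be checked with Python's standard `statistics` module. The score list is invented for illustration; it has an even number of values, so the median falls between the two middle scores.

```python
import statistics

# Hypothetical scores, already in order.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)        # sum of scores / number of scores = 30 / 6
median = statistics.median(scores)    # average of the two middle scores, 3 and 5
modes = statistics.multimode(scores)  # every score tied for most frequent

print(mean, median, modes)
```

For this list the mean is 5, the median is 4 (a value that does not appear in the data), and the only mode is 3.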
Term
the spread of the data across possible values. |
|
Definition
|
|
Term
largest measurement minus the smallest measurement. this loses sensitivity when data sets are large. |
|
Definition
|
|
Term
may provide a better way of describing the variety that exists among the values in a data set. involves measuring dispersion relative to the scatter of values in a data set about their mean. |
|
Definition
|
|
Term
provides a measure of dispersion in original units by simply taking the square root of the variance. useful as a measure of variation within a given set of data |
|
Definition
|
|
Term
the square root of the sample variance |
|
Definition
sample standard deviation |
|
|
Term
the square root of the population variance |
|
Definition
population standard deviation |
|
|
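A sketch of range, variance, and standard deviation on a made-up data set. It assumes the usual convention that the sample variance divides by n - 1 and the population variance divides by N; the function names are hypothetical.

```python
import math

def value_range(data):
    """Largest measurement minus the smallest measurement."""
    return max(data) - min(data)

def variance(data, sample=True):
    """Average squared deviation from the mean (n - 1 for a sample, N for a population)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1 if sample else n)

def standard_deviation(data, sample=True):
    """Square root of the variance, back in the original units."""
    return math.sqrt(variance(data, sample))

# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(value_range(data))                       # 9 - 2 = 7
print(variance(data, sample=False))            # 32 / 8 = 4.0
print(standard_deviation(data, sample=False))  # sqrt(4.0) = 2.0
print(variance(data))                          # 32 / 7, about 4.57
```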
Term
descriptive measures of relationship of a measurement to the rest of the data |
|
Definition
common measures: percentile ranking or percentile score z-score |
|
|
Term
the distance between a measurement x and the mean, expressed in standard units. this allows comparison across data sets. |
|
Definition
|
|
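A minimal z-score example, showing how it allows comparison across data sets. The two exams and their means and standard deviations are invented for illustration.

```python
def z_score(x, mean, sd):
    """Distance between x and the mean, in standard deviation units."""
    return (x - mean) / sd

# Hypothetical: the same student on two exams with different scales.
z_exam_a = z_score(85, mean=70, sd=10)  # 15 points above the mean
z_exam_b = z_score(60, mean=50, sd=5)   # 10 points above the mean

print(z_exam_a, z_exam_b)
```

The raw score on exam A is higher, but the z-scores (1.5 versus 2.0) show the exam B result sits further above its own mean.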
Term
an observation that is unusually large or small relative to the data values being described |
|
Definition
|
|
Term
you will be able to tell the 25th, 50th, and 75th percentiles in this. AKA "box and whiskers plot" |
|
Definition
|
|
Term
|
Definition
take the number at the 75th percentile - the number at the 25th percentile = the range covered by the middle 50% of scores. (slide 52 on presentation 2) |
|
|
Term
the range of values for the middle 50% of the scores in a distribution |
|
Definition
|
|
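A sketch of the quartiles a box plot displays and the interquartile range computed from them. Textbooks and software use slightly different quartile conventions; this assumes the "inclusive" method of `statistics.quantiles`, and the data are made up.

```python
import statistics

# Hypothetical ordered scores.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 cuts the data into quarters: 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1  # range of the middle 50% of the scores
print(q1, q2, q3, iqr)
```

With these scores the quartiles are 3, 5, and 7, so the interquartile range is 7 - 3 = 4.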
Term
anything more than 3 standard deviations above or below the mean |
|
Definition
|
|
Term
correlation that moves up and to the right... correlation that moves down and to the right... (on scattergram) |
|
Definition
-positive correlation -negative correlation |
|
|
Term
look up pareto diagram picture |
|
Definition
|
|
Term
every time you see this value, someone has tested a hypothesis. |
|
Definition
|
|
Term
by chance the group's results were different... |
|
Definition
|
|
Term
the two treatment arms really had different results |
|
Definition
|
|
Term
the two groups were chosen poorly and are different for reasons that have nothing to do with the treatment |
|
Definition
|
|
Term
in research, you are trying to reject this, then you have a relationship. "there is no difference between the two study groups" statistical testing is always done as an exercise to disprove this hypothesis. |
|
Definition
|
|
Term
There is a difference in this hypothesis. |
|
Definition
|
|
Term
there is a difference but not saying which direction |
|
Definition
two tailed alternate hypothesis |
|
|
Term
if p-value is LESS than alpha.. look at slide 21 in hypothesis testing lecture! |
|
Definition
statistically significant. ex- if p=.10 and alpha=.05, the result is NOT SIGNIFICANT |
|
|
Term
|
Definition
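for a normal distribution, share of values within each number of standard deviations of the mean: +/-1= 68%, +/-2= 95%, +/-3= 99.7% |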
|
|
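The 68 / 95 / 99.7 figures can be reproduced from the standard normal curve. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"+/-{k} SD: {100 * share_within(k):.1f}%")
```

The exact values are about 68.3%, 95.4%, and 99.7%, which the rule of thumb rounds to 68, 95, and 99.7.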
Term
if p value is equal to alpha, is it statistically significant? |
|
Definition
yes, as long as it is the same or smaller than alpha, it is statistically significant |
|
|
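The decision rule from these cards, written out as a tiny function. It follows the convention stated above (p the same as or smaller than alpha counts as significant); the name `is_significant` and the default alpha of .05 are assumptions for illustration.

```python
def is_significant(p_value, alpha=0.05):
    """Reject the null hypothesis when p is less than or equal to alpha."""
    return p_value <= alpha

print(is_significant(0.10))  # False: p is larger than alpha, cannot reject the null
print(is_significant(0.05))  # True: p equal to alpha counts as significant here
print(is_significant(0.01))  # True
```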
Term
when we falsely reject the null hypothesis, we have committed a __error. |
|
Definition
|
|
Term
when we falsely fail to reject the null hypothesis, we have committed a __error. |
|
Definition
|
|
Term
significance tests for data from interval or ratio scales. they are more powerful than nonparametric tests. they are preferred if their assumptions are met. |
|
Definition
|
|
Term
used to test hypotheses with nominal and ordinal data |
|
Definition
|
|
Term
an advantage for nonparametric tests... |
|
Definition
appropriate for non-normal population distributions |
|
|
Term
in this you know the whole population. |
|
Definition
|
|
Term
in this you have to use probability. has more tail area than in a normal distribution. |
|
Definition
|
|
Term
this test is used when you are testing for differences between samples. it is the measure of the differences between actual and expected frequencies. it is a test of association between two categorical variables. used with either nominal or ordinal measurements. |
|
Definition
|
|
Term
estimates the standard deviation of the difference between the measured values and the true values. "how close is the sample mean to the population mean?" |
|
Definition
|
|
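A short sketch of the standard error of the mean, assuming the usual formula SE = s / sqrt(n), where s is the sample standard deviation. The sample values and the function name are made up.

```python
import math
import statistics

def standard_error(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    s = statistics.stdev(sample)  # sample SD (divides by n - 1)
    return s / math.sqrt(len(sample))

# Hypothetical sample of 8 measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error(sample)
print(round(se, 3))  # about 0.756
```

Because n is in the denominator, a larger sample gives a smaller standard error, meaning the sample mean is expected to land closer to the population mean.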
Term
the population or true mean |
|
Definition
|
|
Term
the average of your sample |
|
Definition
|
|
Term
n-1 (number of values used to calculate SD or SE) |
|
Definition
|
|
Term
the theorem that the distribution of sample means taken from a population approaches a normal (Gaussian) curve as the sample size increases. tells us how much error to expect in our sample estimates. |
|
Definition
|
|
Term
you are comparing the MEANS of two groups. use this because you can't use a Z. very common. used when the standard deviation of the population is NOT known |
|
Definition
|
|
Term
this is a measure of the difference between your data and what you expect to see, in units of standard error. |
|
Definition
|
|
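To make "difference in units of standard error" concrete, here is a sketch of a one-sample t statistic. This is a simplification: the cards above describe comparing two group means, while this example compares one sample mean against an assumed expected value of 4. All numbers and names are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, expected_mean):
    """Return (t statistic, degrees of freedom) for one sample against an expected mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - expected_mean) / standard_error
    return t, n - 1  # degrees of freedom = n - 1

# Hypothetical sample with mean 5; expected (null) mean assumed to be 4.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t_stat, df = one_sample_t(sample, expected_mean=4)
print(round(t_stat, 3), df)  # about 1.323 with 7 degrees of freedom
```

The t statistic would then be looked up against a t distribution with the given degrees of freedom to get a p-value.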
Term
Testing the means of 3 or more groups of continuous, normally distributed data to see if they are all equal to one another...for this we would use an entirely different test called.... |
|
Definition
analysis of variance, AKA ANOVA -You use F stat |
|
|
Term
if you have categorical data, you will use this... - has cells |
|
Definition
|
|
Term
standard error is a predictor of what? |
|
Definition
|
|
Term
do you want a high or low chi square statistic? |
|
Definition
you want high= more significant |
|
|
Term
calculation of chi-square degrees of freedom. ex- 5x3 contingency table would have__df. |
|
Definition
(number of rows - 1) multiplied by (number of columns - 1). answer= (5-1) x (3-1) = 8. for a given chi-square value, as df rises, so does the probability (p-value). |
|
|
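A sketch of the chi-square calculation for a contingency table: expected cell counts come from the row and column totals, and the statistic sums (observed - expected)^2 / expected over all cells. The 2x2 table and the function names are invented for illustration.

```python
def chi_square(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    return statistic

def degrees_of_freedom(rows, cols):
    """(number of rows - 1) x (number of columns - 1)."""
    return (rows - 1) * (cols - 1)

# Hypothetical 2x2 table; every expected count works out to 15.
observed = [[10, 20],
            [20, 10]]
chi2 = chi_square(observed)
print(round(chi2, 3), degrees_of_freedom(2, 2))  # about 6.667 with 1 df
print(degrees_of_freedom(5, 3))                  # 8, as in the card above
```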
Term
when you have 2 proportions that are NOT equal, you will have this.. |
|
Definition
two-sided test (two tailed) |
|
|
Term
if your p value is .09, which is greater than .05 alpha... |
|
Definition
it is NOT significant -you CANNOT reject the null |
|
|
Term
-used to analyze multi-level designs -method for testing hypotheses about differences between 3 or more sample means. |
|
Definition
analysis of variance, ANOVA |
|
|
Term
ANOVA overcomes this problem... |
|
Definition
enables the researcher to detect significant differences between the treatments as a WHOLE. All treatments have similar variance. |
|
|
Term
-factors are independent variables. (A is not related to B is not related to C) -Variables are categorical and must define separate groups |
|
Definition
|
|
Term
ANOVA test provides an___statistic which is used to calculate the p-value. |
|
Definition
|
|
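A sketch of how a one-way ANOVA F statistic is built: variation between the group means divided by variation within the groups. The three groups are made up and the function name `one_way_f` is hypothetical; turning F into a p-value needs an F distribution table or a statistics package, which is not shown.

```python
def one_way_f(groups):
    """F = mean square between groups / mean square within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)       # df between = number of groups - 1
    ms_within = ss_within / (n_total - k)   # df within = total n - number of groups
    return ms_between / ms_within

# Hypothetical treatment groups with means 2, 3, and 6.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_f(groups)
print(round(f_stat, 3))  # 13.0
```

Here the between-group mean square is 26 / 2 = 13 and the within-group mean square is 6 / 6 = 1, so F = 13.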
Term
ANOVA is a combination of __ tests. |
|
Definition
|
|
Term
When you have one outcome and you are trying to explain it. you plug all of these variables in, and it will give you predicted value. The value of one variable can be PREDICTED if the value of the other variable is known. starts at the top, and moves down variable by variable. |
|
Definition
|
|
Term
Your predictor variable/ explanatory variable |
|
Definition
independent (turkey is independent variable- turkey leads to fatigue) |
|
|
Term
|
Definition
dependent (fatigue= dependent when you eat turkey you will experience fatigue) |
|
|
Term
the magnitude of the slope of the line represents... |
|
Definition
the amount that the dependent variable changes for each unit changed in the independent variable |
|
|
Term
the magnitude of the slope is represented by this value |
|
Definition
|
|
Term
range for correlation coefficient... |
|
Definition
-1 to 1. values of 0 to 0.25 (in absolute value) indicate little or no relationship between two variables. values >0.75 reflect a strong relationship |
|
|
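A sketch of the slope of the regression line and the correlation coefficient, computed from their textbook sums of squares. The x and y values are invented and `slope_and_r` is a hypothetical name.

```python
import math

def slope_and_r(x, y):
    """Return (least-squares slope, Pearson correlation coefficient r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx               # change in y for each one-unit change in x
    r = sxy / math.sqrt(sxx * syy)  # always between -1 and 1
    return slope, r

x = [1, 2, 3, 4, 5]
slope_up, r_up = slope_and_r(x, [2, 4, 6, 8, 10])      # up and to the right
slope_down, r_down = slope_and_r(x, [10, 8, 6, 4, 2])  # down and to the right
print(slope_up, r_up)      # slope 2, r = 1 (perfect positive correlation)
print(slope_down, r_down)  # slope -2, r = -1 (perfect negative correlation)
```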
Term
refers to generalizability of the results obtained from the study |
|
Definition
|
|
Term
refers to METHODOLOGY utilized in the study. can either overestimate or underestimate the true association between exposure and outcome. |
|
Definition
|
|
Term
three general categories of threats to internal validity.. |
|
Definition
|
|
Term
"any systematic error in design, conduct or analysis of a study that results in a mistaken estimate of an exposure's effect on the risk of disease." |
|
Definition
|
|
Term
misclassifying exposure or outcome status. there is not a set scale (ex- vigorous exercise is interpreted differently among different people) SUBJECTIVE |
|
Definition
|
|
Term
Classifying some of the subjects as having had an MI when they did not... |
|
Definition
outcome misclassification |
|
|
Term
misclassification that occurs in the same proportion in each group being studied... |
|
Definition
random or nondifferential misclassification (DO NOT MATCH NON-RANDOM AND NONDIFFERENTIAL,....NO NON NON) |
|
|
Term
misclassification that occurs in different proportions in each group |
|
Definition
non-random or differential misclassification (DO NOT MATCH NON-RANDOM AND NONDIFFERENTIAL..NO NON, NON) |
|
|
Term
Always results in an underestimation of the true association; the estimate is pushed toward the null |
|
Definition
|
|
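A worked example of why misclassification that happens in the same proportion in both groups pulls the estimate toward the null. The case-control counts, the 20% miss rate, and the function names are all assumptions for illustration.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds ratio from a 2x2 case-control table."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)

def misclassify(exposed, unexposed, miss_percent):
    """Move miss_percent of truly exposed subjects into the unexposed column."""
    moved = exposed * miss_percent // 100
    return exposed - moved, unexposed + moved

# Hypothetical true table: cases 60 exposed / 40 unexposed, controls 40 / 60.
true_or = odds_ratio(60, 40, 40, 60)

# The same 20% of exposed subjects are misclassified in BOTH groups.
case_exp, case_unexp = misclassify(60, 40, 20)  # becomes 48 / 52
ctrl_exp, ctrl_unexp = misclassify(40, 60, 20)  # becomes 32 / 68
observed_or = odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp)

print(true_or, round(observed_or, 2))  # 2.25 versus about 1.96
```

The observed odds ratio is closer to 1 (no association) than the true odds ratio, so the association is underestimated.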
Term
misclassification can either overestimate or underestimate the true association, depending on the situation. |
|
Definition
non-random or differential misclassification |
|
|
Term
occurs when one group is followed more closely than the other group...non-random misclassification (disproportionately giving more attention to one group versus another) |
|
Definition
|
|
Term
a potentially relevant exposure may be remembered by a "case" and be forgotten by a "control". (ex- people who were actually diagnosed with melanoma have more emotional involvement) NON-RANDOM |
|
Definition
|
|
Term
-they lie/ do not remember -subjects may not be willing to report an exposure accurately. RANDOM (everybody lies) |
|
Definition
|
|
Term
data collection methods differ between groups. results in a non-random misclassification. in a case control study, one interviewer may ask more probing questions |
|
Definition
|
|
Term
the way in which cases and controls, or exposed and non-exposed individuals, are selected is such that an apparent association is observed even when, in reality, the exposure and the disease are not associated |
|
Definition
|
|
Term
this occurs when 2 conditions are met. -there must be an association between a variable (third factor) and the exposure status -the variable must also be an independent risk factor for the outcome |
|
Definition
|
|
Term
does NOT invalidate study results. -need to account for it -if not identified, may result in a missed opportunity to note differences in outcomes between subgroups in a study. answers the question "is the relationship between the exposure and the outcome the same or different across various strata or subgroups within the study?" |
|
Definition
|
|
Term
|
Definition
the aim is to control confounding and eliminate its effects -effect modification is to be described and reported |
|
|
Term
the magnitude or direction of an association varies according to levels of a third factor. |
|
Definition
effect measure modification AKA "effect modification" and "interaction" unlike confounding, effect measure modification should be described and reported, rather than controlled |
|
|