Term
_____ is the science of designing studies and analyzing the data that those studies produce. _____ is the science of learning from data. |
|
Definition
|
|
Term
_____is the entire set of subjects in which we are interested. |
|
Definition
|
|
Term
What is an example of a population? |
|
Definition
|
|
Term
A_____is a subset of the population for whom we have data. |
|
Definition
|
|
Term
What is an example of a sample? |
|
Definition
200 randomly selected voters |
|
|
Term
A_____is an entity that we measure in a study. |
|
Definition
|
|
Term
What is an example of a subject? |
|
Definition
|
|
Term
A_____is the numerical value summarizing the population data. |
|
Definition
|
|
Term
What is an example of a parameter? |
|
Definition
the proportion of voters voting for candidate A in the entire population |
|
|
Term
A_____is the numerical value summarizing the sample data. |
|
Definition
|
|
Term
What is an example of a statistic? |
|
Definition
the proportion of voters voting for candidate A in our sample (the 200 randomly selected voters) |
|
|
Term
A college dean is interested in learning about the average age of faculty at the college. The dean takes a random sample of 30 faculty members and averages their 30 ages. The average age of all faculty members at the college is our _____. |
|
Definition
|
|
Term
A college dean is interested in learning about the average age of faculty at the college. The dean takes a random sample of 30 faculty members and averages their 30 ages. The 30 randomly selected faculty members at the college is our _____. |
|
Definition
|
|
Term
A college dean is interested in learning about the average age of faculty at the college. The dean takes a random sample of 30 faculty members and averages their 30 ages. A single faculty member from the sample is a _____. |
|
Definition
|
|
Term
A college dean is interested in learning about the average age of faculty at the college. The dean takes a random sample of 30 faculty members and averages their 30 ages. All faculty members at the college is our _____ |
|
Definition
|
|
Term
A college dean is interested in learning about the average age of faculty at the college. The dean takes a random sample of 30 faculty members and averages their 30 ages. The average age of the 30 randomly selected faculty members at the college is our _____. |
|
Definition
|
|
Term
Whenever we are interested in an average for a full population, what symbol do we use? (population mean) |
|
Definition
the symbol for population mean; μ (mu) |
|
|
Term
Whenever we have an average calculated from a sample, this sample mean is denoted as ____. |
|
Definition
|
|
Term
"mu" represents an average calculated from _____and x bar represents an average calculated from _____. |
|
Definition
a full population; a sample |
|
|
Term
If we have the proportion for an entire population, like the proportion of voters voting for candidate A in the entire population, we use the letter ___to denote this population proportion. |
|
Definition
|
|
Term
If we have a proportion for just a sample, like the proportion of voters voting for candidate A in our sample (the 200 randomly selected voters), we use the symbol ___ to denote the sample proportion. |
|
Definition
|
|
Term
p represents a proportion calculated from _____. p hat represents a proportion calculated from ____. |
|
Definition
a full population; a sample |
|
|
Term
_____is the act of obtaining subjects from a population to participate in a certain study. |
|
Definition
|
|
Term
A_____is a sample in which every subject has some chance of being selected for the sample. |
|
Definition
|
|
Term
A _____ is a sample in which every subject has an equally likely chance of being selected for the sample. |
|
Definition
|
|
Term
_____ is when the population is divided into non-overlapping groups and a simple random sample is then obtained from each group. |
|
Definition
|
|
Term
_____is when the population is divided into non-overlapping groups and all individuals within a randomly selected group or groups are sampled. |
|
Definition
|
|
Term
_____is when you select every kth subject from the population. |
|
Definition
|
|
Term
_____is sampling where the individuals are easily obtained. |
|
Definition
|
|
Term
What is an example of convenience sampling? |
|
Definition
|
|
Term
What type of sampling is generally flawed? |
|
Definition
|
|
Term
What is the difference between stratified and cluster sampling? |
|
Definition
stratified sampling samples some individuals from all groups where cluster sampling samples all individuals from some groups |
|
|
Term
There are 300 passengers on a flight from Atlanta to Denver. We need to survey a random sample of these passengers. Name the sampling method. Pick every 10th passenger as people board the plane. |
|
Definition
|
|
Term
There are 300 passengers on a flight from Atlanta to Denver. We need to survey a random sample of these passengers. Name the sampling method. From the boarding list, randomly choose 5 people flying first class and 25 people flying coach. |
|
Definition
|
|
Term
There are 300 passengers on a flight from Atlanta to Denver. We need to survey a random sample of these passengers. Name the sampling method. Randomly generate 30 seat numbers and survey the passengers sitting in those seats. |
|
Definition
|
|
Term
There are 300 passengers on a flight from Atlanta to Denver. We need to survey a random sample of these passengers. Name the sampling method. Select the first 30 passengers that enter the plane. |
|
Definition
|
|
Term
There are 300 passengers on a flight from Atlanta to Denver. We need to survey a random sample of these passengers. Name the sampling method. Randomly select several rows and survey all of the passengers sitting on those rows. |
|
Definition
|
|
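The sampling methods in the flight cards above can be sketched in code. This is only an illustration: the seat layout (50 rows of six seats, with rows 1 to 5 standing in for first class) is an assumption made up for the example, not something stated in the cards.

```python
import random

# Hypothetical passenger list: rows 1-50, six seats per row (300 passengers).
passengers = [(row, seat) for row in range(1, 51) for seat in "ABCDEF"]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simple random sample: every passenger is equally likely to be chosen.
simple_random = rng.sample(passengers, 30)

# Systematic sample: every 10th passenger, starting from a random position.
start = rng.randrange(10)
systematic = passengers[start::10]

# Stratified sample: split into non-overlapping groups, then take a
# simple random sample from EACH group.
first_class = [p for p in passengers if p[0] <= 5]
coach = [p for p in passengers if p[0] > 5]
stratified = rng.sample(first_class, 5) + rng.sample(coach, 25)

# Cluster sample: randomly pick whole rows, then survey EVERYONE in them.
chosen_rows = rng.sample(range(1, 51), 5)
cluster = [p for p in passengers if p[0] in chosen_rows]
```

Note how the stratified sample takes some passengers from every group, while the cluster sample takes every passenger from some groups.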
Term
A _____ is a characteristic or property of an individual population unit (e.g., height, weight, score on a die). |
|
Definition
|
|
Term
All variables will be one of the following 2 types: ____or_____. |
|
Definition
categorical or quantitative |
|
|
Term
_____data classifies subjects based on some attribute or characteristic. Each observation belongs to a set of categories (car color, voting preference, etc.) |
|
Definition
|
|
Term
_____data takes on numeric values (height, weight, SAT score) |
|
Definition
|
|
Term
Any quantitative variable can be further categorized into one of the following types:_____or _____. |
|
Definition
|
|
Term
A_____variable is one where there is a countable number of distinct possible values that the variable can equal. These variables jump from one possible value to the next. |
|
Definition
discrete variable; its possible values can be counted or listed one by one, with gaps between them |
|
|
Term
A ____variable is one where, for any two values of that variable there are an infinite number of other possible values in between. |
|
Definition
|
|
Term
Identify the following as categorical or quantitative. If quantitative, identify further as discrete or continuous. The length of time in minutes until a pain reliever begins to work. |
|
Definition
|
|
Term
Identify the following as categorical or quantitative. If quantitative, identify further as discrete or continuous. The brand of a refrigerator found in a home |
|
Definition
|
|
Term
Identify the following as categorical or quantitative. If quantitative, identify further as discrete or continuous. The number of files on a hard drive |
|
Definition
|
|
Term
A ____ lists the number of occurrences for each category in the data. |
|
Definition
|
|
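As a small sketch of counting occurrences per category, the snippet below tallies a made-up list of car colors (the data are hypothetical) and also converts the counts to proportions.

```python
from collections import Counter

# Hypothetical categorical data: observed car colors.
colors = ["red", "blue", "red", "white", "blue", "red"]

# Count how many times each category occurs.
frequency = Counter(colors)

# Proportion of observations in each category.
relative_frequency = {color: count / len(colors) for color, count in frequency.items()}
```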
Term
Categorical data can be represented graphically using any of the following displays (3) |
|
Definition
bar graph, pareto chart, pie chart |
|
|
Term
A ____ is a graph constructed by putting the categories on the horizontal axis and the frequency or proportion on the vertical axis. The height of the rectangle for each category is equal to the category's frequency or proportion. |
|
Definition
|
|
Term
A ____is a bar graph whose bars are drawn in decreasing order of frequency or proportion. |
|
Definition
|
|
Term
A _____is a circle divided into sectors. Each sector represents a category of data with the size of each sector corresponding to the proportion of responses falling in that category. |
|
Definition
|
|
Term
Quantitative data can be represented graphically using what two displays? |
|
Definition
histogram or stem and leaf plot |
|
|
Term
A____is a display that looks similar to a bar graph, however it is used for quantitative data |
|
Definition
|
|
Term
What does a graph that is skewed left look like? |
|
Definition
left tail is stretched out longer than the right tail |
|
|
Term
What does the graph look like that is skewed right? |
|
Definition
right tail is stretched out longer than the left tail |
|
|
Term
Measures of the _____of a data set describe the tendency of the data to cluster, about certain numerical values. |
|
Definition
|
|
Term
Which is sensitive to extreme values in the dataset, either very large or very small numbers? the mean or median? |
|
Definition
the mean. The median is NOT sensitive to extreme values |
|
|
Term
The mean is ____ to extreme values. The median is _____ to extreme values. |
|
Definition
|
|
Term
If the mean is smaller than the median, then what does the graph look like? |
|
Definition
|
|
Term
If the mean is greater than the median, then what does the graph look like? |
|
Definition
|
|
Term
If the mean is equal to the median then what does the graph look like? |
|
Definition
|
|
Term
Measures of the _____are used to measure the spread, or volatility, contained in the data set. |
|
Definition
|
|
Term
What are three commonly used measures of variability? |
|
Definition
range, variance, standard deviation |
|
|
Term
What is the range? How do you find it? |
|
Definition
the range of the data set is the difference between the largest and the smallest values in the data. Range = largest value - smallest value |
|
|
Term
What does the term "deviation from the mean" mean? How do you find the "deviation from the mean"? |
|
Definition
the deviation of a value from the mean is its signed distance from the mean, found by subtracting the mean from the value: x - x̄ (x minus x bar) |
|
|
Term
The _____ of a data set is the average of the squared deviations from the mean, calculated using n-1 as the divisor. |
|
Definition
|
|
Term
How do you find the variance? |
|
Definition
|
|
Term
How do you find the standard deviation? |
|
Definition
standard deviation is the positive square root of the variance |
|
|
Term
What do variance and standard deviation measure? The higher the variance and standard deviation, the more______. |
|
Definition
variance and standard deviation measure how spread apart your data values are; the higher the variance and standard deviation, the more spread apart the data values will be |
|
|
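The measures of variability described above can be worked through on a small data set. The five values below are made up for illustration; the steps follow the cards (range, deviations from the mean, variance with n-1 as the divisor, and standard deviation as the positive square root of the variance).

```python
import math

data = [4, 8, 6, 5, 7]  # hypothetical sample values
n = len(data)

mean = sum(data) / n                      # x bar
data_range = max(data) - min(data)        # largest value - smallest value

# Deviation from the mean for each value: x - x bar.
deviations = [x - mean for x in data]

# Sample variance: sum of squared deviations divided by n - 1.
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Sample standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)
```

Here the mean is 6, the range is 4, the squared deviations sum to 10, and the variance is 10 / 4 = 2.5.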
Term
What is the symbol to represent population standard deviation? |
|
Definition
|
|
Term
What is the symbol for sample standard deviation? |
|
Definition
|
|
Term
What does the Empirical Rule say? |
|
Definition
for bell-shaped data, about 68% of the values lie within 1 standard deviation of the mean; about 95% lie within 2 standard deviations of the mean; nearly all (about 99.7%) lie within 3 standard deviations of the mean |
|
|
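The Empirical Rule percentages can be checked against a normal (bell-shaped) distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1 = share_within(1)  # roughly 68%
within_2 = share_within(2)  # roughly 95%
within_3 = share_within(3)  # roughly 99.7%
```

The rule is an approximation for bell-shaped data, so real data sets will only roughly match these shares.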
Term
If a value lies at the 30th percentile, then approximately _____percent are less than that value and approximately ____are higher than that value. |
|
Definition
|
|
Term
If John graduated at the 78th percentile in a class of 876, approximately how many students ranked below John? |
|
Definition
|
|
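The percentile arithmetic in the card above can be written out directly, treating "at the 78th percentile" as meaning that approximately 78% of the class ranked below.

```python
class_size = 876
percentile = 78

# Approximate number of students ranked below: 78% of the class.
ranked_below = round(percentile / 100 * class_size)
```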
Term
|
Definition
specific percentiles that split the data into quarters |
|
|
Term
Each set of data has how many quartiles? |
|
Definition
|
|
Term
The first quartile is a value such that ____percent of the data values are smaller than Q1 and ____percent are larger. This is also known as the _____. |
|
Definition
|
|
Term
The second quartile (Q2) is a value such that ____ percent of the data values are smaller than Q2 and ___ percent are larger. This is also known as the ____ and the _____. |
|
Definition
50; 50; median; 50th percentile |
|
|
Term
The third quartile (Q3) is a value such that ____percent of the data values are smaller than Q3 and ____percent are larger. This is also known as the _____. |
|
Definition
|
|
Term
What is the best way to find quartiles in a data set? |
|
Definition
arrange the data in order; first find the median for all of the values--this will be the second quartile (Q2); then find the median for the "lower half" of the values---this will be the first quartile (Q1); then find the median for the upper half of the values---this will be the third quartile (Q3) |
|
|
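The median-of-halves procedure above can be sketched as a short function. One assumption is labeled in the code: when the number of values is odd, the overall median is left out of both halves (textbooks differ on this, so check the convention your course uses). The function name `quartiles` is made up for this example.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)          # arrange the data in order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for an odd count, exclude the middle value from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
```

For these seven values the halves are 1, 3, 5 and 9, 11, 13, so Q1 = 3, Q2 = 7, and Q3 = 11.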
Term
____ are extreme observations in the data that often occur because of errors in the measurement of the variable, during data entry, or from errors in sampling. |
|
Definition
|
|
Term
How do you check for the presence of outliers in data? |
|
Definition
determine the first and third quartiles; compute the interquartile range (IQR), which is the difference between the third and first quartiles (IQR = Q3 - Q1); if a data value is less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR, it is considered an outlier |
|
|
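The outlier check above translates directly into code. In this sketch the data values are made up, and Q1 and Q3 are passed in as already-computed quartiles; `find_outliers` is a hypothetical helper name.

```python
def find_outliers(values, q1, q3):
    """Return the values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

data = [1, 3, 5, 7, 9, 11, 40]   # hypothetical data; Q1 = 3 and Q3 = 11
outliers = find_outliers(data, q1=3, q3=11)
```

Here IQR = 8, so the fences are 3 - 12 = -9 and 11 + 12 = 23, and only 40 falls outside them.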
Term
What does the five number summary represent? |
|
Definition
the five number summary represents the five values that split the data into quarters. It includes the minimum, Q1, Q2 (median), Q3, and the maximum number |
|
|
Term
A_____is a graphical representation of the five number summary. |
|
Definition
|
|
Term
What are the steps involved in drawing a boxplot? |
|
Definition
determine Q1, Q2, Q3; draw vertical lines at Q1, the median (Q2), and Q3. Enclose these vertical lines in a box. Draw a line from Q1 to the smallest data value that is not an outlier--the minimum. Draw a line from Q3 to the largest data value that is not an outlier--maximum. Any data values that are outliers are marked with an asterisk. |
|
|
Term
For quantitative data, you can use what two types of graphs? For categorical data, you can use what three types of graphs? |
|
Definition
QUANTITATIVE--histogram and stem and leaf plot; CATEGORICAL--bar graph, Pareto chart, pie chart |
|
|
Term
In a box plot, if the median is near the center of the box and each horizontal line is approximately equal length...what will the graph look like? |
|
Definition
|
|
Term
In a box plot, if the median is to the left of the center of the box and/or the right horizontal line is much longer than the left line, what will the graph look like? |
|
Definition
|
|
Term
In a box plot, if the median is to the right of the center of the box and/or the left line is much longer than the right line, what will the graph look like? |
|
Definition
|
|
Term
A ____measures the position a value has in the data set, relative to the mean. It is measured in _____. |
|
Definition
z-score; standard deviations |
|
|
Term
What is the formula for calculating z-score? |
|
Definition
z-score = (value - mean) / standard deviation |
|
|
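The z-score formula can be sketched as a pair of small functions, one going from a value to its z-score and one going back. The function names are made up for this example; the mean of 68.2 inches and standard deviation of 2.8 inches are the figures from the 15-year-old height card in this set.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

def value_from_z(z, mean, std_dev):
    """Inverse of z_score: the value that sits z standard deviations from the mean."""
    return mean + z * std_dev

z_at_mean = z_score(68.2, mean=68.2, std_dev=2.8)        # a value equal to the mean
height = value_from_z(-2.64, mean=68.2, std_dev=2.8)     # 2.64 standard deviations below
```

The parentheses around `value - mean` matter: the subtraction has to happen before the division.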
Term
When calculating the z-score, what would the z score be if the value is equal to the mean? |
|
Definition
|
|
Term
Interpret a z score of -1. |
|
Definition
1 standard deviation below the mean. |
|
|
Term
An outlier is more than ___ standard deviations from the mean (above or below). |
|
Definition
|
|
Term
If a value has a z-score that is less than -3 or a z score greater than +3, then it is a ______. |
|
Definition
|
|
Term
If measuring height, and the z score comes out negative...are the people tall or short? |
|
Definition
|
|
Term
The _____is a variable that can be explained by, or is determined by, another variable. |
|
Definition
|
|
Term
Which variable will be the "y variable" (the variable that goes on the vertical axis when graphing data)? response or explanatory |
|
Definition
|
|
Term
The ____variable explains or affects the response variable. |
|
Definition
|
|
Term
When the two variables are quantitative, which variable will be the x variable (the variable that goes on the horizontal axis when graphing data)? response variable or explanatory variable |
|
Definition
|
|
Term
The amount you eat affects how much weight you gain. What is the explanatory variable and what is the response variable? |
|
Definition
the amount you eat = explanatory variable; weight gain = response variable |
|
|
Term
A/an _____exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable. |
|
Definition
|
|
Term
If the amount we eat is small, then we probably won't see much gain in weight. However, if the amount we eat is large, then we probably will see some gain in weight. So there is a/an ______ between the amount eaten and weight gain. |
|
Definition
|
|
Term
A _____is a variable that is related to the response or explanatory variable (or both), but is not the variable being studied. |
|
Definition
|
|
Term
A _____would be the frequency of exercise. The amount of exercising can also affect weight gain, the response variable. |
|
Definition
|
|
Term
To explore the association between two categorical variables, we use _____. |
|
Definition
|
|
Term
What is another word for a contingency table? |
|
Definition
|
|
Term
A contingency table is a table that relates two ______. Each box inside the table is referred to as a ____. |
|
Definition
categorical variables; cell |
|
|
Term
In a contingency table, the ____will always be on the side and the ____will always be on the top. |
|
Definition
explanatory variable; response variable |
|
|
Term
A_____is the proportion for a value of a variable, given a specific value of the other variable. |
|
Definition
|
|
Term
How do you calculate the relative risk? |
|
Definition
relative risk = (conditional proportion for one group) / (conditional proportion for another group)
*When we calculate relative risk, the higher conditional proportion goes in the numerator. |
|
|
Term
What can relative risk be used for? |
|
Definition
calculating how many times more likely the outcome for one group is than the other group |
|
|
Term
What does it mean if the relative risk is close to one? |
|
Definition
it will be about the same likelihood for both groups |
|
|
Term
Before you calculate the relative risk, what must you do? |
|
Definition
make sure you have the conditional proportions for each group; you cannot just use the raw counts |
|
|
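The relative risk steps above can be sketched with a small contingency table. The counts and group labels below are hypothetical; the code first turns counts into conditional proportions and then, following the convention in these cards, puts the higher proportion in the numerator.

```python
# Hypothetical contingency table: explanatory variable on the side (rows),
# response variable on the top (columns). Each entry is a cell count.
table = {
    "group A": {"outcome": 40, "no outcome": 60},
    "group B": {"outcome": 20, "no outcome": 80},
}

def conditional_proportion(group, response):
    """Proportion of a group with the given response (cell count / row total)."""
    counts = table[group]
    return counts[response] / sum(counts.values())

p_a = conditional_proportion("group A", "outcome")  # 40 / 100
p_b = conditional_proportion("group B", "outcome")  # 20 / 100

# Higher conditional proportion goes in the numerator.
relative_risk = max(p_a, p_b) / min(p_a, p_b)
```

With these made-up counts the outcome is about twice as likely in group A as in group B. Dividing the raw counts would only give the right answer here because both row totals happen to be equal.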
Term
A _____is a graphical display for two quantitative variables. |
|
Definition
|
|
Term
On a scatter plot, what variable should be on the horizontal axis and vertical axis? |
|
Definition
horizontal axis-explanatory variable; vertical axis-response variable |
|
|
Term
Are the points on a scatter plot connected? |
|
Definition
|
|
Term
What are the three types of association? |
|
Definition
|
|
Term
A____exists between two variables if as x increases, y also increases. |
|
Definition
|
|
Term
A____exists between two variables if as x increases, y actually decreases. |
|
Definition
|
|
Term
We say there is _____between two variables if as x increases, there is no definite shift in the values of y. |
|
Definition
|
|
Term
Estimate the type of association for the following pairs of variables. Weight of a car and miles per gallon. |
|
Definition
|
|
Term
Estimate the type of association for the following pairs of variables. Speed of a car and distance required to come to a complete stop. |
|
Definition
|
|
Term
Estimate the type of association for the following pairs of variables. Weight on a bar and number of repetitions a weightlifter can achieve |
|
Definition
|
|
Term
Estimate the type of association for the following pairs of variables. The temperature outside and my grade on a test |
|
Definition
|
|
Term
If you want to figure out if there is a linear association between two variables, you calculate the _____. |
|
Definition
|
|
Term
A ____ exists when the data tend to follow a straight-line path. |
|
Definition
|
|
Term
If as x increases, y also increases it is a ____correlation; if as x increases, y decreases, it is a ____correlation |
|
Definition
|
|
Term
____means that as x increases there is no definite shift in the values of y. In other words, there is no linear relationship between x and y. |
|
Definition
|
|
Term
Correlation can be ____, ___, ____, ____, or _____. |
|
Definition
positive, negative, none, strong, weak |
|
|
Term
The closer the correlation is to 1 or -1, then the ____the link is between x and y. |
|
Definition
|
|
Term
The closer the correlation is to 0, then the ____ the link is between x and y. |
|
Definition
|
|
Term
What are the seven properties of the linear correlation coefficient, r? |
|
Definition
r must always be between -1 and 1; if r is greater than 0, then there is a positive linear relationship; if r=+1 then there is a perfect positive correlation; if r is less than 0, then there is a negative linear relationship; if r is equal to -1, there is a perfect negative correlation; if r is equal to 0 then there is no linear relation between the 2 variables; a value of r close to 1 or -1 indicates a strong linear relationship while a value of r close to zero represents a weak linear relationship |
|
|
Term
Which of the following is the strongest correlation? .8, .67, -.34, 0, -.92 |
|
Definition
|
|
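The properties of r can be checked numerically. The cards do not give a formula for r, so the sketch below assumes the standard Pearson correlation formula; the function name and the data are made up for this example. Strength is judged by how far r is from 0 in either direction, so comparing absolute values picks out the strongest correlation.

```python
import math

def correlation(xs, ys):
    """Pearson linear correlation coefficient r (standard formula, assumed here)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_up = correlation([1, 2, 3], [2, 4, 6])     # points on a rising line
r_down = correlation([1, 2, 3], [6, 4, 2])   # points on a falling line

# Strongest correlation = largest absolute value, regardless of sign.
candidates = [0.8, 0.67, -0.34, 0, -0.92]
strongest = max(candidates, key=abs)
```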
Term
How do you calculate r using StatCrunch? |
|
Definition
Stat > Summary Stats > Correlation |
|
|
Term
To predict the response variable using the explanatory variable, we create what is called a ____. |
|
Definition
|
|
Term
A ____predicts the value for the response variable (y) as a straight line function of the value of the explanatory variable (x) |
|
Definition
|
|
Term
The predicted value of y using the regression line is denoted as ____. |
|
Definition
|
|
Term
What is the equation for the regression line? |
|
Definition
|
|
Term
ŷ = a + bx ... in this formula, what is the y intercept and what is the slope? |
|
Definition
a is the y intercept and b is the slope |
|
|
Term
The ____for a value is the difference between the actual value and the predicted value of y. |
|
Definition
|
|
Term
How do you calculate the residual? |
|
Definition
residual=actual y-predicted y (y -ŷ) |
|
|
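The regression line ŷ = a + bx and its residuals can be sketched as follows. The cards rely on StatCrunch to find a and b, so the least-squares formulas used here (slope = Sxy / Sxx, intercept = ȳ - b·x̄) are an assumption about what the software computes, and the data are made up.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # y intercept
    return a, b

xs = [1, 2, 3, 4]   # hypothetical explanatory values
ys = [2, 4, 5, 8]   # hypothetical response values

a, b = fit_line(xs, ys)
predicted = [a + b * x for x in xs]                    # y-hat for each x
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # actual y - predicted y
```

For these four points the slope is 1.9 and the intercept is 0. A positive residual means the point lies above the line (the line underpredicted), and a negative residual means it lies below (the line overpredicted).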
Term
How do you find the regression equation using StatCrunch? |
|
Definition
Stat > Regression > Simple Linear |
|
|
Term
What does the y intercept represent? |
|
Definition
the predicted value of y when x=0 |
|
|
Term
What does it mean if you have a positive residual? negative residual? |
|
Definition
you underpredicted; overpredicted |
|
|
Term
When we use our regression line to predict the costs for other properties, this is called _____. |
|
Definition
|
|
Term
We need to be careful to use the regression line to predict only for observations that have _____. |
|
Definition
similar x values as our data |
|
|
Term
If our values for number of carats are 0.5,0.75, 1, and 2...would it be acceptable to use our regression line to predict the price of a 10 carat diamond ring? Or a 1.5 carat diamond ring? |
|
Definition
unacceptable for 10 carat; acceptable for 1.5 |
|
|
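The carat question above amounts to checking whether a new x value falls inside the range of x values used to build the regression line. A minimal sketch, using the carat values from the card and a made-up helper name:

```python
observed_carats = [0.5, 0.75, 1, 2]

def within_observed_range(x, observed):
    """True if x lies between the smallest and largest observed x values."""
    return min(observed) <= x <= max(observed)

ok_for_1_5 = within_observed_range(1.5, observed_carats)
ok_for_10 = within_observed_range(10, observed_carats)
```

A 1.5 carat ring falls between 0.5 and 2, so the prediction is acceptable; a 10 carat ring is far outside the data, so the line should not be trusted there.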
Term
If you have a negative residual where does the point fall in relation to the line? positive residual? |
|
Definition
point falls below the line with negative residual; point falls above the line with positive residual |
|
|
Term
When interpreting the slope of a regression line...how should we do that? |
|
Definition
for every 1 unit increase in x, we predict y will change by the slope |
|
|
Term
When dealing with the regression line, how would you interpret the y intercept? |
|
Definition
when the x value is zero, the predicted y value will be equal to the y intercept |
|
|
Term
If the height of a 15 year old male is 2.64 standard deviations below the mean, what is the corresponding z-score for that male? |
|
Definition
|
|
Term
The average 15 year old male is 68.2 inches tall, with a standard deviation of 2.8 inches. What height for a 15 year old male is 2.64 standard deviations below the mean? |
|
Definition
68.2-2.64(2.8)=60.808 inches |
|
|