Term
Arithmetic mean |
Definition
This is computed by adding all of the values of the variable in the data set and dividing by the number of observations. Also known as the mean or the average. |
|
|
Term
Population arithmetic mean (μ) |
|
Definition
This is computed using all of the individuals in a population. It is a parameter. The average of a population.
μ = (x1+x2+...+xN)/N = (Σxi)/N |
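A minimal Python sketch of the population-mean formula above. The list of exam scores is hypothetical and is assumed to hold every individual in the population.

```python
def population_mean(values):
    """Compute mu = (x1 + x2 + ... + xN) / N over an entire population."""
    if not values:
        raise ValueError("the population must contain at least one observation")
    return sum(values) / len(values)


# Hypothetical population: all N = 5 exam scores in a small class
scores = [72, 85, 90, 64, 89]
print(population_mean(scores))  # 80.0
```

The same arithmetic gives the sample mean when the list is a sample; only the interpretation (statistic rather than parameter) changes.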
|
|
Term
Sample arithmetic mean (x̄) |
Definition
This is computed using sample data. The sample mean is a statistic. Average of the sample. |
|
|
Term
Median |
Definition
The value that lies in the middle of the data when arranged in ascending order (when the number of observations is even, the mean of the two middle values); represented by M |
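A small Python sketch of finding the median M, assuming the usual convention that an even number of observations gives the mean of the two middle values. The function name `median` is hypothetical.

```python
def median(values):
    """Return the middle value of the sorted data (mean of the two middle values if n is even)."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


print(median([7, 1, 4]))      # 4
print(median([7, 1, 4, 10]))  # 5.5
```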
|
|
Term
Resistant |
Definition
A numerical summary is said to be ________ if extreme values (very large or very small) relative to the data do not affect its value substantially. |
|
|
Term
Mode |
Definition
The observation of a variable that occurs most frequently in a data set. A data set can have more than one mode. |
|
|
Term
Bimodal |
Definition
When the data set has two modes |
|
|
Term
Multimodal |
Definition
When the data set has 3 or more modes |
|
|
Term
No mode |
Definition
When no observation in a data set occurs more than once. |
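The mode cards above (mode, bimodal, multimodal, no mode) can be sketched in Python with `collections.Counter`. This assumes the definitions exactly as written: every most-frequent value is a mode, and "no mode" applies only when no value occurs more than once. The function names and the "one mode" label are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value; an empty list means the data have no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no observation occurs more than once
    return [value for value, count in counts.items() if count == top]


def classify_modes(values):
    """Label a data set by how many modes it has."""
    k = len(modes(values))
    if k == 0:
        return "no mode"
    if k == 1:
        return "one mode"
    if k == 2:
        return "bimodal"
    return "multimodal"  # 3 or more modes


print(modes([3, 1, 3, 2]))                 # [3]
print(classify_modes([1, 1, 2, 2, 3]))     # bimodal
print(classify_modes([1, 1, 2, 2, 3, 3]))  # multimodal
print(classify_modes([5, 6, 7]))           # no mode
```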
|
|
Term
Dispersion |
Definition
The degree to which the data are spread out. Measures of dispersion include the range, standard deviation, variance, and interquartile range. |
|
|
Term
Range |
Definition
The difference between the largest and the smallest data value. Represented by R.
R = largest data value - smallest data value |
|
|
Term
Deviation about the mean |
Definition
Population: for the ith observation, the deviation about the mean is xi - μ. Sample: for the ith observation, it is xi - x̄. |
|
|
Term
Population Standard Deviation (σ) |
|
Definition
The ___ of a variable is the square root of the sum of squared deviations about the population mean, divided by the number of observations in the population N. That is, the square root of the mean of the squared deviations about the population mean.
σ = √(Σ(xi - μ)^2 / N) |
|
|
Term
|
Definition
Using this formula: 1. Create a table with four columns: enter the population data in column 1 and the population mean in column 2. 2. Compute the deviation about the mean for each data value and enter the result in column 3. 3. In column 4, enter the squares of the values in column 3. 4. Sum the entries in column 4 and divide this result by the size of the population. 5. Determine the square root of the value found in step 4. |
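The five table steps above translate almost line for line into Python. This is a sketch; the function name and the population values are hypothetical, and each list stands in for one table column.

```python
import math


def population_sd_conceptual(values):
    """Population standard deviation via the four-column table method."""
    N = len(values)
    if N == 0:
        raise ValueError("need at least one observation")
    mu = sum(values) / N                      # column 2: the population mean
    deviations = [x - mu for x in values]     # column 3: xi - mu
    squares = [d ** 2 for d in deviations]    # column 4: (xi - mu)^2
    mean_square = sum(squares) / N            # step 4: sum column 4, divide by N
    return math.sqrt(mean_square)             # step 5: take the square root


population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population
print(population_sd_conceptual(population))  # 2.0
```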
|
|
Term
Computational formula for the population standard deviation |
Definition
A formula that is equal to the population standard deviation formula:
σ = √((Σxi^2 - (Σxi)^2 / N) / N)
Using this formula: Create a table with two columns: population data in column 1. Square each value in column 1 and enter the result in column 2. Sum the entries in column 1 and sum the entries in column 2. Substitute these values into the computational formula and simplify. |
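A Python sketch of the computational formula, using only the two column sums Σxi and Σxi^2. The function name and data are hypothetical; the data match a small population whose standard deviation is 2.

```python
import math


def population_sd_computational(values):
    """Population standard deviation via the computational (shortcut) formula."""
    N = len(values)
    if N == 0:
        raise ValueError("need at least one observation")
    sum_x = sum(values)                    # sum of column 1
    sum_x2 = sum(x * x for x in values)    # sum of column 2 (squared values)
    variance = (sum_x2 - sum_x ** 2 / N) / N
    return math.sqrt(max(variance, 0.0))   # guard against tiny negative values from rounding


population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population
print(population_sd_computational(population))  # 2.0
```

The two formulas are algebraically equal, but in floating point the shortcut subtracts two large, nearly equal sums, which can lose precision for large values; that is why the guard is there and why deviation-based forms are often preferred in software.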
|
|
Term
Sample standard deviation (s) |
|
Definition
The ___ of a variable is the square root of the sum of squared deviations about the sample mean divided by n - 1, where n is the sample size.
s = √(Σ(xi - x̄)^2 / (n - 1)) |
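A Python sketch of the sample standard deviation with the n - 1 divisor. The function name and sample values are hypothetical.

```python
import math


def sample_sd(values):
    """Sample standard deviation: sqrt(sum((xi - xbar)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n                          # sample mean
    ss = sum((x - xbar) ** 2 for x in values)       # sum of squared deviations
    return math.sqrt(ss / (n - 1))                  # divide by n - 1, not n


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(round(sample_sd(sample), 3))  # 2.138
```

With the same eight values treated as a population the result would be 2.0; dividing by n - 1 makes the sample value slightly larger.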
|
|
Term
Degrees of freedom |
Definition
The divisor is n - 1 because the first n - 1 observations have the freedom to be whatever value they wish, but the nth value has no freedom: it must be whatever value forces the sum of the deviations about the mean to equal zero.
In other words, we have n - 1 degrees of freedom in the computation of s because an unknown parameter, μ, is estimated with x̄. For each parameter estimated, we lose 1 degree of freedom. |
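A short Python illustration of why one degree of freedom is lost: deviations about the sample mean always sum to zero, so once the first n - 1 deviations are known, the last one is forced. The sample values and function name are hypothetical.

```python
def deviations_about_mean(sample):
    """Return the deviations xi - xbar for a sample."""
    xbar = sum(sample) / len(sample)
    return [x - xbar for x in sample]


sample = [3, 7, 8, 10]  # hypothetical sample with mean 7
devs = deviations_about_mean(sample)
print(devs)        # [-4.0, 0.0, 1.0, 3.0]
print(sum(devs))   # 0.0

# Knowing only the first n - 1 deviations, the last one is determined:
forced_last = -sum(devs[:-1])
print(forced_last)  # 3.0
```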
|
|
Term
The larger the standard deviation, the more dispersion that distribution has |
|
Definition
When comparing two populations, __________, provided that the populations use the same units of measure. You want to compare apples with apples. |
|
|
Term
Variance |
Definition
The ___ of a variable is the square of the standard deviation. |
|
|
Term
|
Definition
|
|
Term
|
Definition
|
|
Term
Biased |
Definition
This is used to describe a statistic when it consistently underestimates or overestimates a parameter. |
|
|
Term
Empirical Rule |
Definition
If the data have a distribution that is bell shaped, then this rule can be used to determine the percentage of data that will lie within k standard deviations of the mean.
If the distribution is roughly bell shaped, then:
- Approximately 68% of the data will lie within 1 standard deviation of the mean, that is, between μ - 1σ and μ + 1σ
- Approximately 95% of the data will lie within 2 standard deviations of the mean, between μ - 2σ and μ + 2σ
- Approximately 99.7% of the data will lie within 3 standard deviations of the mean, between μ - 3σ and μ + 3σ
For bell-shaped distributions, this rule gives more precise results than Chebyshev's Inequality. |
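Where the 68/95/99.7 figures come from can be checked in Python, assuming "bell shaped" is modeled by the normal distribution. For a normal variable, the probability of landing within k standard deviations of the mean is erf(k / √2). The function name is hypothetical.

```python
import math


def normal_within_k(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * normal_within_k(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The rule rounds these exact normal percentages to 68%, 95%, and 99.7%.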
|
|
Term
Chebyshev's Inequality |
Definition
An inequality that determines a minimum percentage of observations that lie within k standard deviations of the mean, where k > 1, regardless of the basic shape of the distribution (skewed left, skewed right, or symmetric).
- For any data set or distribution, at least (1 - 1/k^2) x 100% of the observations lie within k standard deviations of the mean, where k is any number greater than 1; that is, between μ - kσ and μ + kσ.
- Can also be applied to sample data |
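A Python sketch comparing Chebyshev's guaranteed minimum with the actual percentage inside μ ± kσ for a skewed data set. The data and function names are hypothetical; σ is computed with the population formula.

```python
def chebyshev_min_percent(k):
    """Minimum percentage within k standard deviations: (1 - 1/k^2) x 100, for k > 1."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return (1 - 1 / k ** 2) * 100


def percent_within(values, k):
    """Actual percentage of values between mu - k*sigma and mu + k*sigma."""
    N = len(values)
    mu = sum(values) / N
    sigma = (sum((x - mu) ** 2 for x in values) / N) ** 0.5
    inside = sum(1 for x in values if mu - k * sigma <= x <= mu + k * sigma)
    return 100 * inside / N


skewed = [1, 1, 1, 2, 2, 3, 4, 5, 9, 20]  # hypothetical right-skewed data
print(chebyshev_min_percent(2))    # 75.0
print(percent_within(skewed, 2))   # 90.0
```

The actual percentage may exceed the bound but, for any data set, never falls below it.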
|
|
Term
Grouped data |
Definition
Data that have been summarized in frequency distributions. |
|
|
Term
Weighted mean |
Definition
This is found by multiplying each value of the variable by its corresponding weight, adding these products, and dividing this sum by the sum of the weights. It can be expressed using the formula:
x̄w = Σ(wi xi) / Σwi |
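A Python sketch of the weighted mean. The example is hypothetical: grade points weighted by course credit hours.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


grade_points = [4, 3, 2]   # hypothetical grades: A, B, C
credit_hours = [4, 3, 3]   # hypothetical weights
print(weighted_mean(grade_points, credit_hours))  # 3.1
```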
|
|
Term
Approximate Standard Deviation of a Variable from a Frequency Distribution |
|
Definition
Population standard deviation: σ = √((Σ(xi - μ)^2 fi) / (Σfi)). Sample standard deviation: s = √((Σ(xi - x̄)^2 fi) / (Σfi - 1))
Where xi is the midpoint or value of the ith class and fi is the frequency of the ith class |
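A Python sketch of the approximate standard deviation from a frequency distribution. It assumes the center is the grouped mean Σxifi / Σfi; the class midpoints and frequencies are hypothetical.

```python
import math


def grouped_sd(midpoints, freqs, sample=False):
    """Approximate standard deviation from class midpoints xi and frequencies fi."""
    total = sum(freqs)                                               # sum of fi
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / total      # grouped mean
    ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))  # sum of (xi - mean)^2 fi
    denom = total - 1 if sample else total                           # sum(fi) - 1 for a sample
    return math.sqrt(ss / denom)


midpoints = [5, 15, 25]  # hypothetical class midpoints
freqs = [2, 3, 5]        # hypothetical class frequencies
print(round(grouped_sd(midpoints, freqs), 3))               # 7.81
print(round(grouped_sd(midpoints, freqs, sample=True), 3))  # 8.233
```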
|
|
Term
z-score |
Definition
Represents the distance that a data value is from the mean in terms of the number of standard deviations. We find it by subtracting the mean from the data value and dividing this result by the standard deviation. There is both a population ___ and a sample ___:
Population: z = (x - μ) / σ. Sample: z = (x - x̄) / s |
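A Python sketch of using z-scores to compare scores from two different distributions. The exam means and standard deviations are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Hypothetical exams: 80 on exam A (mean 70, sd 5) vs 90 on exam B (mean 85, sd 10)
print(z_score(80, 70, 5))   # 2.0
print(z_score(90, 85, 10))  # 0.5
```

Although 90 is the higher raw score, the 80 is relatively better because it lies 2 standard deviations above its mean.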
|
|
Term
kth percentile |
Definition
Denoted Pk; a value such that k percent of the observations in a set of data are less than or equal to it.
- Percentiles divide a set of data written in ascending order into 100 parts, so 99 percentiles can be determined.
- Used to give the relative standing of an observation |
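Textbooks and software compute percentiles in several slightly different ways. The Python sketch below uses one simple convention, the nearest-rank method: Pk is the smallest data value such that at least k percent of the observations are less than or equal to it. The function name is hypothetical, and other methods (with interpolation) can give different values.

```python
def percentile_nearest_rank(values, k):
    """Pk by the nearest-rank method, for an integer k from 1 to 100."""
    if not values:
        raise ValueError("need at least one observation")
    if not (isinstance(k, int) and 1 <= k <= 100):
        raise ValueError("k must be an integer from 1 to 100")
    ordered = sorted(values)              # data in ascending order
    rank = -(-k * len(ordered) // 100)    # ceiling of k * n / 100 using integer math
    return ordered[rank - 1]


data = list(range(1, 11))  # hypothetical data: 1, 2, ..., 10
print(percentile_nearest_rank(data, 25))  # 3
print(percentile_nearest_rank(data, 90))  # 9
```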
|
|
Term
Quartiles |
Definition
Divide data sets into fourths, or four equal parts, using three quartiles (Q1, Q2, Q3):
Q1 - the first quartile, divides the bottom 25% from the top 75%; this is equivalent to the 25th percentile
Q2 - the second quartile, divides the bottom 50% of the data from the top 50%; equivalent to the 50th percentile or the median
Q3 - the third quartile, divides the bottom 75% of the data from the top 25%; equivalent to the 75th percentile |
|
|
Term
Interquartile Range (IQR) |
|
Definition
The range of the middle 50% of the observations in a data set. The IQR is the difference between the third and first quartiles and is found using the formula: IQR = Q3 - Q1 |
|
|
Term
Describe the distribution |
|
Definition
___ means to describe a distribution's shape (skewed left, skewed right, or symmetric), its center (mean or median), and its spread (standard deviation or interquartile range). |
|
|
Term
Outliers |
Definition
Extreme observations in the data set; they can occur by chance or because of errors.
Checking for ____: 1. Determine the first and third quartiles of the data. 2. Compute the interquartile range. 3. Determine the fences. 4. If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier. |
|
|
Term
Fences |
Definition
Serve as cutoff points for determining outliers. Lower ___ = Q1 - 1.5(IQR); Upper ___ = Q3 + 1.5(IQR) |
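The quartile, IQR, fence, and outlier cards can be tied together in one Python sketch. It assumes one common textbook convention for quartiles: Q1 and Q3 are the medians of the bottom and top halves, with the median left out of both halves when n is odd. Software such as `statistics.quantiles` interpolates differently and may give slightly different quartiles. The function names and data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Q1, Q2 (median), Q3 using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 4:
        raise ValueError("need at least four observations")
    half = n // 2
    lower = ordered[:half]             # bottom half
    upper = ordered[half + n % 2:]     # top half (skip the median when n is odd)
    return median(lower), median(ordered), median(upper)


def fences(values):
    """Lower fence = Q1 - 1.5(IQR), upper fence = Q3 + 1.5(IQR)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1                      # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Values below the lower fence or above the upper fence."""
    lower, upper = fences(values)
    return [x for x in values if x < lower or x > upper]


data = [2, 5, 6, 7, 8, 9, 10, 11, 12, 30]  # hypothetical data
print(quartiles(data))  # (6, 8.5, 11)
print(fences(data))     # (-1.5, 18.5)
print(outliers(data))   # [30]
```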
|
|
Term
Exploratory Data Analysis |
|
Definition
Exploring the data to see if they contain interesting information that may be useful in our research; the goal is to collect and present evidence, not to draw conclusions. |
|
|
Term
Five-number summary |
Definition
This consists of the smallest data value of a set, Q1, the median, Q3, and the largest data value of the set. Organized as: MINIMUM Q1 M Q3 MAXIMUM |
|
|
Term
Boxplot |
Definition
A graph that is made using the five-number summary.
1. Determine the lower and upper fences 2. Draw a number line long enough to include the maximum and minimum values. Insert vertical lines at Q1, M, and Q3. Enclose these vertical lines in a box. 3. Label the lower and upper fences. 4. Draw a line from Q1 to the smallest data value that is larger than the lower fence. Draw a line from Q3 to the largest data value that is smaller than the upper fence. These lines are called whiskers. 5. Any data values less than the lower fence or greater than the upper fence are outliers and are marked with an asterisk. |
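The boxplot steps above can be sketched in Python: compute the five-number summary, then the fences, the whisker endpoints, and the outliers. It assumes the same median-of-halves quartile convention as many textbooks and treats a value exactly on a fence as a non-outlier. The function names and data are hypothetical, and no drawing is done.

```python
from statistics import median


def five_number_summary(values):
    """(minimum, Q1, M, Q3, maximum) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 4:
        raise ValueError("need at least four observations")
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + n % 2:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]


def boxplot_parts(values):
    """Box edges, whisker endpoints, and outliers for a boxplot."""
    _, q1, m, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    return {
        "box": (q1, m, q3),
        "whiskers": (min(inside), max(inside)),  # most extreme non-outlying values
        "outliers": sorted(x for x in values if x < lower_fence or x > upper_fence),
    }


data = [2, 5, 6, 7, 8, 9, 10, 11, 12, 30]  # hypothetical data
print(five_number_summary(data))  # (2, 6, 8.5, 11, 30)
print(boxplot_parts(data))        # {'box': (6, 8.5, 11), 'whiskers': (2, 12), 'outliers': [30]}
```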
|
|
Term
Whiskers |
Definition
Lines extending outside the box of a boxplot, from Q1 and Q3 to the smallest and largest data values that are not outliers. |
|
|