Term
Biostatistics |
|
Definition
application of statistical methods to medical and biological problems |
|
|
Term
Objective of Biostatistics |
|
Definition
to make an inference about a population, based on information contained in a sample
***subject to error and/or bias*** |
|
|
Term
Population |
|
Definition
totality of subjects under study
***sometimes referred to as the parent distribution*** |
|
|
Term
Sample |
|
Definition
subset of the population actually studied
***sometimes referred to as the sample distribution*** |
|
|
Term
Bar graphs, pie charts, histograms, stemplots |
|
Definition
Displaying distributions with graphs |
|
|
Term
Mean, median, and boxplots |
|
Definition
Describing distributions with numbers |
|
|
Term
|
Definition
Exploratory data – relationships |
|
|
Term
Case Subjects/Individuals |
|
Definition
*The objects we collect information/data from
*Can be people, animals, plants, or any object of interest |
|
|
Term
Variable |
|
Definition
*is any characteristic of an individual. A variable varies among individuals
***Example: age, height, blood pressure, ethnicity, leaf length, first language*** |
|
|
Term
Distribution (of a variable) |
|
Definition
tells us what values the variable takes and how often it takes these values |
|
|
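A distribution can be tabulated directly from raw observations. The sketch below is only an illustration: the `distribution` helper and the blood-type sample are made up for this example, and it reports each value with its count and proportion.

```python
from collections import Counter

def distribution(values):
    """Return {value: (count, proportion)} for a list of observations."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, count / n) for value, count in counts.items()}

# Hypothetical sample of eight individuals (not real data)
blood_types = ["A", "O", "O", "B", "A", "O", "AB", "O"]
dist = distribution(blood_types)

for value, (count, proportion) in sorted(dist.items()):
    print(f"{value}: count={count}, proportion={proportion:.3f}")
```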
Term
Quantitative Variable |
|
Definition
Something that takes numerical values
***Example: How tall you are, your age, your blood pressure, the number of credit cards you own*** |
|
|
Term
Categorical Variable |
|
Definition
Something that falls into one of several categories. What can be measured is the count or proportion of individuals in each category
***Example: Your blood type (A, B, AB, O), your hair color, your ethnicity, whether you paid income tax last tax year or not*** |
|
|
Term
Ways to Chart Categorical Data |
|
Definition
Because the variable is categorical, the data in the graph can be ordered any way we want (alphabetical, by increasing value, by year, by personal preference, etc.)
***Example: Bar graphs and Pie charts***
|
|
|
Term
Bar Graphs |
|
Definition
*Each category is represented by a bar
*The bar’s height shows the count (or sometimes the percentage) for that particular category |
|
|
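As a rough illustration of how a bar's height encodes a category count, here is a text-only sketch. The `text_bar_graph` helper and the hair-color sample are hypothetical, and alphabetical ordering is just one of the orderings a categorical variable allows.

```python
from collections import Counter

def text_bar_graph(categories):
    """Return one text line per category; the bar length is the category count."""
    counts = Counter(categories)
    lines = []
    for category in sorted(counts):  # alphabetical order, chosen arbitrarily
        bar = "#" * counts[category]
        lines.append(f"{category:<6}{bar} ({counts[category]})")
    return lines

# Hypothetical sample (not real data)
hair = ["brown", "black", "brown", "blond", "black", "brown"]
bars = text_bar_graph(hair)
print("\n".join(bars))
```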
Term
Pie Charts |
|
Definition
*The slices must represent the parts of one whole
*The size of a slice depends on what percent of the whole this category represents
|
|
|
Term
Ways to Chart Quantitative Data |
|
Definition
*Histograms and stemplots
*Line graphs: time plots |
|
|
Term
Histograms and Stemplots |
|
Definition
*These are summary graphs for a single variable
*They are very useful for understanding the pattern of variability in the data |
|
|
Term
Line Graphs: Time Plots |
|
Definition
*Use when there is a meaningful sequence, like time
*The line connecting the points helps emphasize any change over time |
|
|
Term
Histograms |
|
Definition
*Range of values that a variable can take is divided into equal size intervals
*Shows the number of individual data points that fall in each interval
|
|
|
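The binning step behind a histogram can be sketched as follows. The function name, the half-open interval convention `[left, right)`, and the sample ages are assumptions made for this illustration.

```python
def histogram_counts(data, low, width, n_bins):
    """Count observations in n_bins equal-width intervals starting at `low`.

    Interval i covers [low + i*width, low + (i+1)*width); values outside
    the covered range are ignored.
    """
    counts = [0] * n_bins
    for x in data:
        i = int((x - low) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical ages (not real data), binned into 20-29, 30-39, 40-49, 50-59
ages = [21, 25, 29, 30, 34, 35, 41, 48, 49, 52]
counts = histogram_counts(ages, low=20, width=10, n_bins=4)
print(counts)
```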
Term
Stemplots |
|
Definition
How to make a stemplot:
1) Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, which is that remaining final digit
2) Write the stems in a vertical column with the smallest value at the top, and draw a vertical line
3) Write each leaf in the row to the right of its stem, in increasing order out from the stem |
|
|
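The three steps above can be sketched in code. This is a minimal version that assumes non-negative integers, and the `stemplot` helper and sample scores are hypothetical. It also lists stems that have no leaves, so that gaps in the data remain visible.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):            # sorting puts the leaves in increasing order
        stem, leaf = divmod(v, 10)      # stem = all but the final digit, leaf = final digit
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):  # smallest stem at the top
        row = f"{stem} | " + " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows

# Hypothetical scores (not real data)
scores = [23, 25, 31, 47, 40, 23, 58]
rows = stemplot(scores)
print("\n".join(rows))
```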
Term
|
Definition
*Stemplots are quick and dirty histograms that can easily be done by hand
*However, they are rarely found in scientific or lay publications |
|
|
Term
Most Common Distribution Shapes |
|
Definition
*Symmetric
*Skewed Right
*Skewed Left |
|
|
Term
Outliers |
|
Definition
*An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution
*Always look for outliers and try to explain them |
|
|
Term
Symmetric Distribution |
|
Definition
If the right and left sides of the histogram are approximately mirror images of each other |
|
|
Term
Skewed to the Right Distribution |
|
Definition
If the right side of the histogram (side with larger values) extends much farther out than the left side |
|
|
Term
Skewed to the Left Distribution |
|
Definition
If the left side of the histogram extends much farther out than the right side |
|
|
Term
Time Series |
|
Definition
*In a time plot, time always goes on the horizontal, x axis
*We describe time series by looking for an overall pattern and for striking deviations from that pattern
*Two common features of a time series are trends and seasonal variation |
|
|
Term
Trend |
|
Definition
A rise or fall that persists over time, despite small irregularities |
|
|
Term
Seasonal Variation |
|
Definition
A pattern that repeats itself at regular intervals of time |
|
|
Term
|
Definition
|
|
Term
Median |
|
Definition
Midpoint of a distribution |
|
|
Term
First Quartile (Q1) |
|
Definition
The value in the sample that has 25% of the data at or below it |
|
|
Term
Third Quartile (Q3) |
|
Definition
The value in the sample that has 75% of the data at or below it |
|
|
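Several conventions exist for computing quartiles by hand and in software, and they can give slightly different answers. The sketch below assumes one common textbook convention: Q1 is the median of the observations below the overall median's position, and Q3 is the median of those above it. The `quartiles` helper and the data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    For an odd number of observations, the middle value is left out of
    both halves. Requires at least two observations.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]        # observations below the median position
    upper = xs[-half:]       # observations above the median position
    return median(lower), median(xs), median(upper)

# Hypothetical data (not real measurements)
q1, m, q3 = quartiles([1, 3, 4, 6, 7, 8, 10, 12, 15])
print(q1, m, q3)
```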
Term
1.5 × IQR Rule for Outliers |
|
Definition
*Outliers are troublesome data points, and it is important to be able to identify them
*One way to raise the flag for a suspected outlier is to compare the distance from the suspicious data point to the nearest quartile (Q1 or Q3)
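*Then compare this distance to the interquartile range (IQR), the distance between Q1 and Q3
*This is the “1.5 × IQR” rule for outliers: flag a point as a suspected outlier if it lies more than 1.5 × IQR below Q1 or above Q3 |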
|
|
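A minimal sketch of the 1.5 × IQR rule follows. It assumes the median-of-halves convention for Q1 and Q3 (other quartile conventions may flag slightly different points), and the `iqr_outliers` helper and the data are hypothetical.

```python
from statistics import median

def iqr_outliers(data):
    """Return the observations more than 1.5 * IQR below Q1 or above Q3."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])     # median of the lower half
    q3 = median(xs[-half:])    # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in xs if x < low_fence or x > high_fence]

# Hypothetical data with one suspiciously large value
suspects = iqr_outliers([1, 3, 4, 6, 7, 8, 10, 12, 40])
print(suspects)
```

A flagged point is only a suspect: the rule raises the flag, and the explanation still has to come from looking at the data.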
Term
|
Definition
|
|
Term
Standard Deviation (s) |
|
Definition
*"s" is used to describe the variation around the mean
*Like the mean, it is not resistant to skew or outliers
|
|
|
Term
Properties of Standard Deviation |
|
Definition
*s measures spread about the mean and should be used only when the mean is the measure of center.
*s = 0 only when all observations have the same value and there is no spread. Otherwise, s > 0.
*s is not resistant to outliers.
*s has the same units of measurement as the original observations. |
|
|
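Two of these properties can be checked numerically. The sketch assumes the usual sample formula, which these cards do not state: the square root of the sum of squared deviations from the mean, divided by n − 1. The `sample_sd` helper and the data are hypothetical.

```python
import math
from statistics import mean

def sample_sd(data):
    """Sample standard deviation, dividing by n - 1 (assumed formula)."""
    m = mean(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / (len(data) - 1))

identical = [5, 5, 5, 5]                 # no spread at all
varied = [2, 4, 4, 4, 5, 5, 7, 9]        # some spread
with_outlier = varied + [50]             # same data plus one extreme value

s_identical = sample_sd(identical)
s_varied = sample_sd(varied)
s_outlier = sample_sd(with_outlier)

print(s_identical, round(s_varied, 3), round(s_outlier, 3))
```

Here `s_identical` is zero because every observation equals the mean, and the single extreme value makes `s_outlier` far larger than `s_varied`, which is what "not resistant" means in practice.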
Term
Linear Transformations (Changing Units of Measurement) |
|
Definition
*Variables can be recorded in different units of measurement
*Linear transformations do not change the basic shape of a distribution (skew, symmetry, multimodality)
*But they do change the measures of center and spread |
|
|
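As an illustration, converting temperatures from Celsius to Fahrenheit is a linear transformation of the form new = a + b × old, with a = 32 and b = 1.8. In the sketch below the temperature values are made up. The mean is transformed the same way as each observation, while the standard deviation is only multiplied by b.

```python
from statistics import mean, stdev

# Hypothetical body temperatures in degrees Celsius (not real data)
celsius = [36.5, 37.0, 37.2, 38.1, 39.0]

# Linear transformation: F = 32 + 1.8 * C
fahrenheit = [32 + 1.8 * c for c in celsius]

print("mean:", mean(celsius), "->", mean(fahrenheit))
print("sd:  ", stdev(celsius), "->", stdev(fahrenheit))
```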
Term
Scatterplot |
|
Definition
one axis is used to represent each of the variables, and the data are plotted as points on the graph |
|
|
Term
Response (Dependent) Variable |
|
Definition
*measures or records an outcome of a study
*is plotted on the y-axis |
|
|
Term
Explanatory (Independent) Variable |
|
Definition
*explains changes in the response variable
*is plotted on the x-axis |
|
|
Term
Positive Association |
|
Definition
High values of one variable tend to occur together with high values of the other variable |
|
|
Term
Negative Association |
|
Definition
High values of one variable tend to occur together with low values of the other variable |
|
|
Term
No Relationship |
|
Definition
X and Y vary independently. Knowing X tells you nothing about Y |
|
|
Term
Interpreting Scatterplots |
|
Definition
After plotting two variables on a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. |
|
|
Term
Strength of the Association |
|
Definition
The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form. |
|
|
Term
Outlier |
|
Definition
a data value that has a very low probability of occurrence (i.e., it is unusual or unexpected) |
|
|
Term
Scatterplot Smoothers |
|
Definition
*Smoothers average the y values separately for each x value
*When a data set does not have many y values for a given x, software smoothers form an overall pattern by looking at the y values for points in the neighborhood of each x value
*Smoothers are resistant to outliers |
|
|
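The simplest case, averaging the y values separately for each x value, can be sketched as below. The `mean_by_x` helper and the points are hypothetical. Software smoothers that pool neighboring x values are more involved and are not shown here.

```python
from collections import defaultdict
from statistics import mean

def mean_by_x(points):
    """Average the y values separately for each distinct x value."""
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}

# Hypothetical (x, y) pairs with repeated x values (not real data)
points = [(1, 2.0), (1, 4.0), (2, 5.0), (2, 7.0), (2, 9.0), (3, 10.0)]
smooth = mean_by_x(points)
print(smooth)
```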