Term
|
Definition
THE ROWS OF A DATA TABLE CORRESPOND TO THE INDIVIDUAL CASES ABOUT WHOM WE RECORD SOME CHARACTERISTICS. |
|
|
Term
|
Definition
A VARIABLE RECORDS A CHARACTERISTIC ABOUT EACH INDIVIDUAL. IT IS USUALLY SHOWN AS A COLUMN OF THE DATA TABLE. |
|
|
Term
|
Definition
A VARIABLE IS CATEGORICAL WHEN IT NAMES CATEGORIES AND ANSWERS QUESTIONS ABOUT HOW CASES FALL INTO THOSE CATEGORIES. |
|
|
Term
|
Definition
A VARIABLE IS QUANTITATIVE WHEN ITS MEASURED VALUES, WITH UNITS, ANSWER QUESTIONS ABOUT THE QUANTITY OF WHAT IS MEASURED. |
|
|
Term
|
Definition
TELLS WHO WAS MEASURED, WHAT WAS MEASURED, HOW THE DATA WERE COLLECTED, WHERE THE DATA WERE COLLECTED, AND WHEN AND WHY THE STUDY WAS PERFORMED. |
|
|
Term
|
Definition
ALL THE CASES WE WISH TO KNOW ABOUT. |
|
|
Term
|
Definition
LISTS THE CATEGORIES IN A CATEGORICAL VARIABLE AND GIVES THE COUNT OR PERCENTAGE OF OBSERVATIONS FOR EACH CATEGORY. |
|
|
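The frequency table described above can be sketched in a few lines of Python. The function name `frequency_table` and the ticket-class data are hypothetical, invented only for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, 100.0 * count / n) for category, count in counts.items()}

# Hypothetical data: ticket class for eight passengers
ticket_class = ["first", "third", "third", "second",
                "third", "first", "third", "second"]

table = frequency_table(ticket_class)
for category, (count, percent) in sorted(table.items()):
    print(f"{category:>6}: {count} ({percent:.1f}%)")
```

Dividing each count by the total gives the relative frequencies, so the same table also describes the distribution of the variable.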
Term
|
Definition
GIVES THE POSSIBLE VALUES OF THE VARIABLE AND THE RELATIVE FREQUENCY OF EACH VALUE |
|
|
Term
|
Definition
IN A STATISTICAL DISPLAY, EACH DATA VALUE SHOULD BE REPRESENTED BY THE SAME AMOUNT OF AREA. |
|
|
Term
|
Definition
SHOWS A BAR WHOSE AREA REPRESENTS THE COUNT (OR %) OF OBSERVATIONS FOR EACH CATEGORY OF A CATEGORICAL VARIABLE. |
|
|
Term
|
Definition
SHOWS HOW A "WHOLE" DIVIDES INTO CATEGORIES BY SHOWING A WEDGE OF A CIRCLE WHOSE AREA CORRESPONDS TO THE PROPORTION IN EACH CATEGORY. |
|
|
Term
|
Definition
DISPLAYS COUNTS AND, SOMETIMES, PERCENTAGES OF INDIVIDUALS FALLING INTO NAMED CATEGORIES ON TWO OR MORE VARIABLES. THE TABLE CATEGORIZES THE INDIVIDUALS ON ALL VARIABLES AT ONCE, TO REVEAL POSSIBLE PATTERNS IN ONE VARIABLE THAT MAY BE CONTINGENT ON THE CATEGORY OF THE OTHER. |
|
|
Term
|
Definition
VARIABLES ARE SAID TO BE INDEPENDENT IF THE CONDITIONAL DISTRIBUTION OF ONE VARIABLE IS THE SAME FOR EACH CATEGORY OF THE OTHER. WE WILL SHOW HOW TO CHECK FOR INDEPENDENCE IN A LATER CHAPTER. |
|
|
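As a rough illustration of the two preceding cards, the sketch below builds a small contingency table and compares conditional distributions. The survey data and the helper `conditional_distribution` are hypothetical, and the counts were chosen so that the two conditional distributions match exactly; real data rarely do, which is why a formal check is needed.

```python
from collections import Counter

# Hypothetical (group, answer) pairs, invented for illustration
observations = (
    [("men", "yes")] * 3 + [("men", "no")] * 1
    + [("women", "yes")] * 6 + [("women", "no")] * 2
)

# Contingency table: one count per combination of categories
table = Counter(observations)

def conditional_distribution(pairs, group):
    """Relative frequencies of the answer within one category of the group variable."""
    answers = [answer for g, answer in pairs if g == group]
    counts = Counter(answers)
    return {answer: count / len(answers) for answer, count in counts.items()}

men = conditional_distribution(observations, "men")
women = conditional_distribution(observations, "women")
print(table[("men", "yes")], men, women)
```

Because the distribution of answers is the same within each group, these two made-up variables would be described as independent.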
Term
|
Definition
DISPLAYS THE CONDITIONAL DISTRIBUTION OF A CATEGORICAL VARIABLE WITHIN EACH CATEGORY OF ANOTHER VARIABLE. |
|
|
Term
|
Definition
WHEN THE AVERAGES ARE TAKEN ACROSS DIFFERENT GROUPS, THEY CAN APPEAR TO CONTRADICT THE OVERALL AVERAGES. THIS IS KNOWN AS "SIMPSON'S PARADOX". |
|
|
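A small numerical sketch can make the paradox concrete. The counts below are hypothetical, invented so that option A has the higher success rate within each group while option B has the higher rate overall.

```python
# Hypothetical (successes, attempts) counts, invented for illustration
results = {
    "A": {"easy": (9, 10), "hard": (30, 100)},
    "B": {"easy": (80, 100), "hard": (2, 10)},
}

def rate(successes, attempts):
    """Success rate within a single group."""
    return successes / attempts

def overall_rate(groups):
    """Success rate after pooling all groups together."""
    total_successes = sum(s for s, _ in groups.values())
    total_attempts = sum(a for _, a in groups.values())
    return total_successes / total_attempts

for name, groups in results.items():
    by_group = {g: round(rate(*counts), 2) for g, counts in groups.items()}
    print(name, by_group, round(overall_rate(groups), 2))
```

The reversal comes from the unequal group sizes: most of A's attempts are in the hard group, while most of B's are in the easy group.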
Term
|
Definition
THE DISTRIBUTION OF A QUANTITATIVE VARIABLE SLICES UP ALL THE POSSIBLE VALUES OF THE VARIABLE INTO EQUAL-WIDTH BINS AND GIVES THE NUMBER OF VALUES (OR COUNTS) FALLING INTO EACH BIN. |
|
|
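The binning step can be sketched as follows. The function `bin_counts` and its parameters are hypothetical, and the sketch assumes every value lies between `low` and `low + width * n_bins`, with a value equal to the upper limit placed in the last bin.

```python
def bin_counts(values, low, width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - low) // width)
        # A value sitting exactly on the upper limit goes into the last bin
        index = min(index, n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical data, two bins: [0, 5) and [5, 10]
values = [1, 2, 2, 3, 5, 7, 8, 9, 10]
counts = bin_counts(values, low=0, width=5, n_bins=2)
print(counts)
```

These per-bin counts are the bar heights a histogram would display.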
Term
|
Definition
TO DESCRIBE THE SHAPE OF A DISTRIBUTION, LOOK FOR: SINGLE VS. MULTIPLE MODES, SYMMETRY VS. SKEWNESS, OUTLIERS AND GAPS. |
|
|
Term
|
Definition
THE PLACE IN THE DISTRIBUTION OF A VARIABLE THAT YOU'D POINT TO IF YOU WANTED TO ATTEMPT THE IMPOSSIBLE BY SUMMARIZING THE ENTIRE DISTRIBUTION WITH A SINGLE NUMBER. MEASURES OF CENTER INCLUDE THE MEAN AND MEDIAN. |
|
|
Term
|
Definition
A NUMERICAL SUMMARY OF HOW TIGHTLY THE VALUES ARE CLUSTERED AROUND THE CENTER. MEASURES OF SPREAD INCLUDE THE IQR AND STANDARD DEVIATION. |
|
|
Term
|
Definition
A HUMP OR LOCAL HIGH POINT IN THE SHAPE OF THE DISTRIBUTION |
|
|
Term
|
Definition
HAVING ONE MODE. DESCRIBES THE SHAPE OF A HISTOGRAM WHEN IT IS GENERALLY MOUND-SHAPED. DISTRIBUTIONS WITH TWO MODES ARE CALLED BIMODAL. THOSE WITH MORE THAN TWO ARE MULTIMODAL. |
|
|
Term
|
Definition
A DISTRIBUTION THAT IS ROUGHLY FLAT IS SAID TO BE UNIFORM. |
|
|
Term
|
Definition
A DISTRIBUTION IS SYMMETRIC IF THE TWO HALVES OF THE DISTRIBUTION LOOK APPROXIMATELY LIKE MIRROR IMAGES OF EACH OTHER. |
|
|
Term
|
Definition
THE PARTS OF A DISTRIBUTION THAT TYPICALLY TRAIL OFF ON EITHER SIDE. DISTRIBUTIONS CAN BE CHARACTERIZED AS HAVING LONG TAILS (IF THEY STRAGGLE OFF FOR SOME DISTANCE) OR SHORT TAILS (IF THEY DON'T). |
|
|
Term
|
Definition
A DISTRIBUTION IS SKEWED IF IT IS NOT SYMMETRIC AND ONE TAIL STRETCHES OUT FARTHER THAN THE OTHER. DISTRIBUTIONS ARE SAID TO BE SKEWED LEFT WHEN THE LONGER TAIL STRETCHES TO THE LEFT, AND SKEWED RIGHT WHEN IT GOES TO THE RIGHT. |
|
|
Term
|
Definition
EXTREME VALUES THAT DO NOT APPEAR TO BELONG WITH THE REST OF THE DATA. THEY MAY BE UNUSUAL VALUES THAT DESERVE FURTHER INVESTIGATION, OR THEY MAY BE JUST MISTAKES; THERE IS NO OBVIOUS WAY TO KNOW. |
|
|
Term
|
Definition
THE MIDDLE VALUE, WITH HALF OF THE DATA ABOVE AND HALF BELOW IT. IF n IS EVEN, IT IS THE AVERAGE OF THE TWO MIDDLE VALUES. IT IS USUALLY PAIRED WITH IQR. |
|
|
Term
|
Definition
THE DIFFERENCE BETWEEN THE LOWEST AND HIGHEST VALUES IN A DATA SET. RANGE=MAX-MIN. |
|
|
Term
|
Definition
THE LOWER QUARTILE (Q1) IS THE VALUE WITH A QUARTER OF THE DATA BELOW IT. THE UPPER QUARTILE (Q3) HAS THREE QUARTERS OF THE DATA BELOW IT. THE MEDIAN AND QUARTILES DIVIDE DATA INTO FOUR PARTS WITH EQUAL NUMBERS OF DATA VALUES. |
|
|
Term
INTERQUARTILE RANGE (IQR) |
|
Definition
THE IQR IS THE DIFFERENCE BETWEEN THE FIRST AND THIRD QUARTILES. IQR = Q3 - Q1. IT IS USUALLY REPORTED ALONG WITH THE MEDIAN. |
|
|
Term
|
Definition
THE iTH PERCENTILE IS THE NUMBER THAT FALLS ABOVE i% OF THE DATA. |
|
|
Term
|
Definition
THE FIVE-NUMBER SUMMARY OF A DISTRIBUTION REPORTS THE MINIMUM VALUE, Q1, THE MEDIAN, Q3, AND THE MAXIMUM VALUE. |
|
|
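The median, quartiles, IQR, and range from the preceding cards can be computed together. This is a minimal sketch assuming one common convention for quartiles (the medians of the lower and upper halves of the sorted data, leaving out the middle value when n is odd); software packages often use slightly different rules. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # When n is odd, the middle value belongs to neither half
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical data, invented for illustration
values = [15, 1, 9, 3, 13, 5, 11, 7]
minimum, q1, med, q3, maximum = five_number_summary(values)
iqr = q3 - q1                      # IQR = Q3 - Q1
data_range = maximum - minimum     # RANGE = MAX - MIN
print(minimum, q1, med, q3, maximum, iqr, data_range)
```

With n = 8, the median is the average of the two middle values, 7 and 9.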
Term
|
Definition
THE MEAN IS FOUND BY SUMMING ALL THE DATA VALUES AND DIVIDING BY THE COUNT: $\bar{y} = \frac{\sum y}{n}$.
IT IS USUALLY PAIRED WITH THE STANDARD DEVIATION. |
|
|
Term
|
Definition
A CALCULATED SUMMARY IS SAID TO BE RESISTANT IF IT IS AFFECTED ONLY A LIMITED AMOUNT BY OUTLIERS. |
|
|
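A quick comparison shows what resistance means in practice. The two data sets below are hypothetical; they differ only in one extreme value.

```python
from statistics import mean, median

# Hypothetical data: identical except for one extreme value
clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]

print("mean:  ", mean(clean), "->", mean(with_outlier))
print("median:", median(clean), "->", median(with_outlier))
```

The median does not move at all, while the mean is pulled far toward the outlier, so the median is the resistant summary of the two.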
Term
|
Definition
THE SUM OF SQUARED DEVIATIONS FROM THE MEAN, DIVIDED BY THE COUNT MINUS 1: $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$.
IT IS USEFUL IN CALCULATIONS LATER IN THE BOOK. |
|
|
Term
|
Definition
THE SQUARE ROOT OF THE VARIANCE: $s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$.
IT IS USUALLY REPORTED ALONG WITH THE MEAN. |
|
|
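The mean, variance, and standard deviation described in words above can be written out directly. The function names and data are hypothetical; the last line compares the hand-written version against Python's `statistics.stdev`, which also divides by the count minus 1.

```python
from math import sqrt
import statistics

def sample_mean(values):
    """Sum of the data values divided by the count."""
    return sum(values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by the count minus 1."""
    ybar = sample_mean(values)
    return sum((y - ybar) ** 2 for y in values) / (len(values) - 1)

def sample_sd(values):
    """Square root of the variance."""
    return sqrt(sample_variance(values))

# Hypothetical data, invented for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_mean(data), sample_variance(data), sample_sd(data))
print(statistics.stdev(data))
```

For these values the squared deviations sum to 32, so the variance is 32/7.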
Term
|
Definition
IF A POINT IS MORE THAN 3.0 IQR FROM EITHER END OF THE BOX IN A BOXPLOT, IT IS NOMINATED AS A FAR OUTLIER. |
|
|
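The far-outlier rule above translates into a simple check. In this sketch the quartiles are passed in rather than computed, since conventions for quartiles vary; the function name and data are hypothetical.

```python
def far_outliers(values, q1, q3):
    """Return values lying more than 3.0 IQRs beyond either end of the box."""
    iqr = q3 - q1
    low_fence = q1 - 3.0 * iqr
    high_fence = q3 + 3.0 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical data with Q1 = 4 and Q3 = 12, so the fences are -20 and 36
flagged = far_outliers([1, 8, 15, 30, 40, -25], q1=4, q3=12)
print(flagged)
```

The value 30 lies beyond the box but inside the fences, so it is not flagged as a far outlier.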
Term
|
Definition
WHEN COMPARING DISTRIBUTIONS OF SEVERAL GROUPS CONSIDER THEIR: SHAPE, CENTER AND SPREAD |
|
|
Term
|
Definition
1. SHAPES: ARE THE DISTRIBUTIONS SYMMETRIC OR SKEWED? ARE THERE DIFFERENCES BETWEEN THE GROUPS? 2. MEDIANS: WHICH GROUP HAS A HIGHER CENTER? ARE THERE ANY PATTERNS? 3. IQRS: WHICH GROUP IS MORE SPREAD OUT? ARE THERE ANY PATTERNS IN HOW THE SPREADS CHANGE? |
|
|
Term
|
Definition
DISPLAYS DATA THAT CHANGE OVER TIME. OFTEN, SUCCESSIVE VALUES ARE CONNECTED WITH LINES TO SHOW TRENDS MORE CLEARLY. SOMETIMES A SMOOTH CURVE IS ADDED TO THE PLOT TO HELP SHOW LONG-TERM PATTERNS AND TRENDS. |
|
|
Term
|
Definition
STANDARDIZING IS USED TO ELIMINATE UNITS. STANDARDIZED VALUES CAN BE COMPARED AND COMBINED EVEN IF THE ORIGINAL VARIABLES HAD DIFFERENT UNITS AND MAGNITUDES. |
|
|
Term
|
Definition
A VALUE FOUND BY SUBTRACTING THE MEAN AND DIVIDING BY THE STANDARD DEVIATION |
|
|
Term
|
Definition
ADDING A CONSTANT TO EACH DATA VALUE ADDS THE SAME CONSTANT TO THE MEAN, THE MEDIAN, AND THE QUARTILES, BUT DOES NOT CHANGE THE STANDARD DEVIATION OR IQR |
|
|
Term
|
Definition
MULTIPLYING EACH DATA VALUE BY A CONSTANT MULTIPLIES BOTH THE MEASURES OF POSITION (MEAN, MEDIAN, AND QUARTILES) AND THE MEASURES OF SPREAD (STANDARD DEVIATION AND IQR) BY THE CONSTANT. |
|
|
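The effects of adding a constant and of multiplying by a constant can be checked numerically. The data, the shift of 5, and the factor of 2 below are hypothetical choices for illustration.

```python
from statistics import mean, median, stdev

# Hypothetical data, invented for illustration
data = [10, 20, 30, 40]
shifted = [y + 5 for y in data]     # add a constant to each value
rescaled = [2 * y for y in data]    # multiply each value by a constant

print("original:", mean(data), median(data), stdev(data))
print("shifted: ", mean(shifted), median(shifted), stdev(shifted))
print("rescaled:", mean(rescaled), median(rescaled), stdev(rescaled))
```

Shifting moves the mean and median by 5 and leaves the standard deviation alone, while rescaling doubles all three.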
Term
|
Definition
A USEFUL FAMILY OF MODELS FOR UNIMODAL, SYMMETRIC DISTRIBUTIONS. |
|
|
Term
|
Definition
A NUMERICALLY VALUED ATTRIBUTE OF A MODEL. FOR EXAMPLE, THE VALUES OF $\mu$ AND $\sigma$ IN AN $N(\mu, \sigma)$ MODEL ARE PARAMETERS. |
|
|
Term
|
Definition
TELLS HOW MANY STANDARD DEVIATIONS A VALUE IS FROM THE MEAN; Z-SCORES HAVE A MEAN OF 0 AND A STANDARD DEVIATION OF 1. WHEN WORKING WITH DATA, USE THE STATISTICS $\bar{y}$ AND $s$: $z = \frac{y - \bar{y}}{s}$.
WHEN WORKING WITH MODELS, USE THE PARAMETERS $\mu$ AND $\sigma$: $z = \frac{y - \mu}{\sigma}$. |
|
|
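Standardizing a data set is a one-line calculation once the mean and standard deviation are known. The sketch below uses hypothetical data and a hypothetical helper `z_scores`, and then confirms that the resulting z-scores have mean 0 and standard deviation 1, up to rounding error.

```python
from statistics import mean, stdev

def z_scores(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    ybar = mean(values)
    s = stdev(values)
    return [(y - ybar) / s for y in values]

# Hypothetical data, invented for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(data)
print([round(value, 2) for value in z])
print(round(mean(z), 10), round(stdev(z), 10))
```

Values below the mean get negative z-scores and values above it get positive ones, regardless of the original units.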
Term
|
Definition
IN A NORMAL MODEL, ABOUT 68% OF VALUES FALL WITHIN 1 STANDARD DEVIATION OF THE MEAN, ABOUT 95% FALL WITHIN 2 STANDARD DEVIATIONS OF THE MEAN, AND ABOUT 99.7% FALL WITHIN 3 STANDARD DEVIATIONS OF THE MEAN. |
|
|
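The three percentages in the rule above can be reproduced from the standard Normal model. This sketch uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a Normal model lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
```

The rule's 68%, 95%, and 99.7% are rounded versions of these proportions.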
Term
|
Definition
A NORMAL MODEL, $N(\mu, \sigma)$, WITH MEAN $\mu = 0$ AND STANDARD DEVIATION $\sigma = 1$. ALSO CALLED THE STANDARD NORMAL DISTRIBUTION. |
|
|
Term
|
Definition
A DISTRIBUTION IS NEARLY NORMAL IF IT IS UNIMODAL AND SYMMETRIC. WE CAN CHECK BY LOOKING AT A HISTOGRAM OR A NORMAL PROBABILITY PLOT. |
|
|
Term
|
Definition
THE NORMAL PERCENTILE CORRESPONDING TO A Z-SCORE GIVES THE PERCENTAGE OF VALUES IN A STANDARD NORMAL DISTRIBUTION FOUND AT THAT Z-SCORE OR BELOW. |
|
|
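A Normal percentile can be looked up in a table or computed. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the function name `normal_percentile` is hypothetical.

```python
from statistics import NormalDist

def normal_percentile(z):
    """Percentage of a standard Normal distribution at or below the given z-score."""
    return 100 * NormalDist().cdf(z)

for z in (-1, 0, 1.96):
    print(z, round(normal_percentile(z), 1))
```

A z-score of 0 sits at the 50th percentile because the standard Normal model is symmetric about its mean.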
Term
|
Definition
A DISPLAY TO HELP ASSESS WHETHER A DISTRIBUTION OF DATA IS APPROXIMATELY NORMAL. IF THE PLOT IS NEARLY STRAIGHT, THE DATA SATISFY THE NEARLY NORMAL CONDITION |
|
|