Term
What is Inductive reasoning? |
|
Definition
The derivation of general ideas from specific observations - not used by scientists
Scientists will use hypothesising and experimentation |
|
|
Term
What is Hypothetico-deductive reasoning? |
|
Definition
Observations lead to plausible hypotheses, which we then attempt to falsify; if we cannot prove them false, they are good hypotheses, but not necessarily right |
|
|
Term
What is a Theory? |
|
Definition
A general set of ideas or rules used to explain a group of observations |
|
|
Term
What is a Paradigm shift? |
|
Definition
A change in the way we think about a subject |
|
|
Term
What is a Null Hypothesis? |
|
Definition
H0: the form of a hypothesis that we formally test; it predicts that nothing will happen |
|
|
Term
What is an Alternative hypothesis? |
|
Definition
H1, A specific prediction about an experiment |
|
|
Term
What is Nominal data? |
|
Definition
Data in categories with names |
|
|
Term
What is Count data? |
|
Definition
Data that always rises in integers
Is treated as non-parametric |
|
|
Term
What is Ordinal data? |
|
Definition
Non-quantitative ranked data, normally used in questionnaires
Is treated as non-parametric but can be transformed |
|
|
Term
What is Continuous data? |
|
Definition
Quantitative measurements on a continuous scale
Treated as parametric |
|
|
Term
What are Descriptive statistics? |
|
Definition
Measures calculated from a data set which summarise some characteristics of the data |
|
|
Term
Measures of central tendency |
|
Definition
The mean, median and mode |
|
|
Term
What is a Histogram? |
|
Definition
A graph showing the total number of quantitative observations in each of a series of numerically ordered categories |
|
|
Term
What is the Sum of squares (SS)? |
|
Definition
The total of all the squared deviates in a data set. Squaring removes the minus signs, so SS shows the magnitude of the variability but not its direction |
|
|
Term
What is Variance? |
|
Definition
s² - the average size of the squared deviates in a sample - an estimate of the population variance |
|
|
Term
What is Standard deviation? |
|
Definition
s - the average size of deviates in a data set. |
|
|
Term
What is a Population? |
|
Definition
All individuals in a group |
|
|
Term
What is a Sample? |
|
Definition
A sub-set of a population, meant to represent it |
|
|
Term
What is the Normal distribution? |
|
Definition
Bell-shaped (Gaussian); about 68% of all data points fall within one SD of the mean |
|
|
Term
Standard error of the mean |
|
Definition
A measure of the confidence we have in our sample mean as an estimate of the real mean |
|
|
Term
What is Skew? |
|
Definition
If skewed to the right, there is a long tail to the right, and vice versa for left |
|
|
Term
What are Parametric tests? |
|
Definition
Tests which make many assumptions |
|
|
Term
What are Non-parametric tests? |
|
Definition
Tests which make fewer assumptions |
|
|
Term
What is a Poisson distribution? |
|
Definition
A distribution where a maximum possible count is far above the mean, resulting in a skew |
|
|
Term
What is a Binomial distribution? |
|
Definition
A distribution where the maximum count is close to the mean |
|
|
Term
|
Definition
Used for visualising differences |
|
|
Term
|
Definition
Used for visualising trends |
|
|
Term
What is Precision? |
|
Definition
A measurement is not precise if there is an unbiased (random) measurement error |
|
|
Term
What is Accuracy? |
|
Definition
A measurement is accurate if it is free from bias; bias occurs when there is a systematic error in your measurements |
|
|
Term
What is a Confounding effect? |
|
Definition
A confounding effect is something that influences your results in a way that can be confused with the effect you are studying |
|
|
Term
What is a Threshold effect? |
|
Definition
Effects of a variable are only visible once above a certain point |
|
|
Term
What is a Ceiling effect? |
|
Definition
Effects of a variable are only visible below a certain point |
|
|
Term
Independent samples t-test |
|
Definition
A statistical test designed to test for a difference between the means of two samples of continuous data |
|
|
Term
What is a Type I error? |
|
Definition
The rejection of the null hypothesis when it is true |
|
|
Term
What is a Type II error? |
|
Definition
The failure to reject the null hypothesis when it is false |
|
|
Term
What is Pseudoreplication? |
|
Definition
The use of non-independent data points as if they were independent |
|
|
Term
What is a Paired t-test? |
|
Definition
A test designed for samples that are not independent of each other, normally used to examine change |
|
|
Term
What is Homogeneity of variance? |
|
Definition
If the variance is homogeneous, it is the same in each sample |
|
|
Term
What is a Chi-squared test? |
|
Definition
A test which is used to examine differences between observed and expected counts |
|
|
Term
Pearson's correlation coefficient |
|
Definition
The statistic used to test the significance of correlations between two variables. Can only be used with linear relationships and normal distributions |
|
|
Term
Spearman's rank correlation coefficient |
|
Definition
Non-parametric correlation test |
|
|
Term
What is ANOVA? |
|
Definition
Tests the null hypothesis that the sample means are not different |
|
|
Term
What is the Kruskal-Wallis test? |
|
Definition
Non-parametric one way ANOVA |
|
|
Term
What is ANCOVA? |
|
Definition
Combines ANOVA and regression |
|
|
Term
What makes a good hypothesis? |
|
Definition
Clear, Precise, Plausible, Able to produce testable predictions |
|
|
Term
Sum of squares formula |
|
Definition
SS = Σ(xᵢ - x̄)², summed from i = 1 to n |
|
|
Term
Variance formula |
|
Definition
s² = Σ(xᵢ - x̄)² / (n - 1), summed from i = 1 to n |
|
|
Term
95% of samples are within |
|
Definition
±1.96 standard deviations of the mean |
|
|
Term
Standard error of mean formula |
|
Definition
SEM = s / √n (the standard deviation divided by the square root of the sample size) |
|
|
Term
Which tests have more statistical power? |
|
Definition
Parametric tests |
|
|
Term
Parametric test standard assumptions |
|
Definition
Independence, homogeneity of variance |
|
|
Term
Alternative t test if the variances are not the same |
|
Definition
Welch's t-test |
|
|
Term
To test if the variances are the same? |
|
Definition
An F-test comparing the two variances |
|
|
Term
If data is not normal, you can transform it by... |
|
Definition
Square-rooting or arcsine-transforming all the points, eliminating a right skew; squaring all the points, eliminating a left skew |
|
|
Term
Alternative T-test if the data is not normal |
|
Definition
Mann-Whitney U test (Wilcoxon two-sample test) |
|
|
Term
What is a Bonferroni correction used for? |
|
Definition
To adjust for the chance of a Type I error when running multiple tests |
|
|
Term
In regression, we analyse |
|
Definition
the effect of one variable on another variable |
|
|
Term
How do you work out the Probability of two independent events occurring? |
|
Definition
Multiply the probabilities of the two events by each other: P(A and B) = P(A) × P(B) |
|
|
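The multiplication rule above can be sketched in a few lines of Python; the coin and die probabilities here are made-up illustrative values, not from the cards.

```python
# Hypothetical probabilities of two independent events.
p_heads = 0.5   # a fair coin lands heads
p_six = 1 / 6   # a fair die rolls a six

# Independent events: multiply the individual probabilities.
p_both = p_heads * p_six  # P(heads AND six)
```

The same multiplication with P(B|A) in place of P(B) gives the non-independent case on the next card.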
Term
How do you work out the probability of two non-independent events occurring? |
|
Definition
Multiply the probability of one event occurring by the probability of the other event occurring GIVEN that the first event occurs.
P(A and B) = P(A) × P(B|A) |
|
|
Term
What is the difference between sampling with or without replacement? |
|
Definition
Whether each item drawn from a group is replaced back into the group before the next draw. With replacement the probabilities stay constant; without replacement they change after every draw.
This makes it very important for conditional probability |
|
|
Term
What is a Biased sample? |
|
Definition
This is when the sample you take does not represent the true population, perhaps because of biological differences within the species, meaning you will get biased, skewed data |
|
|
Term
What is a Bonferroni correction? |
|
Definition
This is used when you are doing multiple parallel tests, which will often produce a Type I error simply because of the number of tests, and therefore you perform a Bonferroni correction.
You will divide the significance threshold (e.g. 0.05) by the number of tests that were independently performed.
This will lead to a lower significance value and much lower chance of getting a type I error |
|
|
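The division described above is a one-liner; a minimal Python sketch, assuming a hypothetical family of 10 parallel tests:

```python
alpha = 0.05   # overall significance threshold
n_tests = 10   # hypothetical number of independently performed tests

# Bonferroni correction: each individual test must meet
# the stricter per-test threshold alpha / n_tests.
corrected_alpha = alpha / n_tests
```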
Term
How can you correct a type II error? |
|
Definition
You can perform more repetitions or take more samples, as it is just to do with statistical power. With more statistical power you are able to reject or retain the null hypothesis with statistical confidence |
|
|
Term
What is Overfitting? |
|
Definition
This is when you have too many variables and assumptions in a statistical test, which means that random statistical noise that would usually be insignificant is assumed to be significant |
|
|
Term
What is the binomial probability distribution of certain number of events (i) in a certain (n) number of trials where (p) is the probability of a certain event outcome in a singular trial? |
|
Definition
[image]
This is the probability of seeing a certain number (i) of one outcome in a certain total number of trials (n), where (p) is the probability of that outcome in a single trial: P(i) = nCi × pⁱ × (1 - p)ⁿ⁻ⁱ |
|
|
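The binomial formula can be sketched directly from the definition; this Python version uses the standard choose function, and the coin-toss numbers are hypothetical.

```python
from math import comb

def binomial_pmf(i, n, p):
    """Probability of seeing the outcome of interest i times in n trials,
    where p is its probability in a single trial."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

# e.g. probability of exactly 2 heads in 4 fair coin tosses
prob = binomial_pmf(2, 4, 0.5)
```

Summing the function over i = 0..n gives 1, which is a quick sanity check that it is a proper probability distribution.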
Term
What does the poisson distribution equation look like? |
|
Definition
[image]
This is where (i) is the number of events we are working out the probability of seeing and (m) is the mean number of times the event occurs: P(i) = e⁻ᵐ × mⁱ / i! |
|
|
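The Poisson formula is equally short in Python; the mean of 2 events used here is a made-up example value.

```python
from math import exp, factorial

def poisson_pmf(i, m):
    """Probability of seeing the event i times when it occurs m times on average."""
    return exp(-m) * m**i / factorial(i)

# e.g. chance of seeing zero events when the mean count is 2
prob = poisson_pmf(0, 2.0)
```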
Term
When do you use the binomial distribution probability test and when do you use poisson? |
|
Definition
The binomial should be used when there is a fixed number of trials in the experiment. Poisson should be used if it is open ended |
|
|
Term
When do you do a one tailed test or a two-tailed test? |
|
Definition
One-tailed is when you expect the data to trend in a certain direction away from the average.
Two-tailed is when the data may go either way and you are not sure which; the significance threshold is then split in half between the two tails |
|
|
Term
How do you perform a chi squared test for independence? |
|
Definition
Calculate the expected values first [image], then take them away from the observed values and square the differences.
Then divide each squared deviation by its expected value and sum them; the resulting χ² statistic gives a probability which may or may not be below 0.05.
If it is below 0.05 the difference is significant and the null hypothesis that the variables are independent is rejected |
|
|
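The steps above can be sketched by hand in Python on a hypothetical 2×2 contingency table (the counts are invented, and this computes only the χ² statistic, not the p value):

```python
# Hypothetical 2x2 table of observed counts.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
chi_sq = 0.0
for r, row in enumerate(observed):
    for c, obs in enumerate(row):
        expected = row_totals[r] * col_totals[c] / grand_total
        chi_sq += (obs - expected) ** 2 / expected  # squared deviation / expected

# Degrees of freedom = (rows - 1) * (columns - 1), as on the next card.
df = (len(observed) - 1) * (len(observed[0]) - 1)
```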
Term
How to work out the degrees of freedom when there are multiple rows and columns? |
|
Definition
(R - 1) × (C - 1), where R is the number of rows and C the number of columns |
|
|
Term
When testing data's significance using a chi squared non-independence test, what is the result? |
|
Definition
When the final p value is below 0.05 the result is significant, but it only tells you that the two variables are not independent; it says nothing about the size or direction of the effect |
|
|
Term
What test can be used to test for significant difference between two different sets of data that are not normal? |
|
Definition
Wilcoxon two-sample test. The null hypothesis is that the data are not statistically different and the differences are random |
|
|
Term
When is a binomial distribution normal? |
|
Definition
When p=0.5 for the events occurring and the curve is symmetrical |
|
|
Term
How is the bell of a normally distributed curve defined? |
|
Definition
The peak sits at the mean and the standard deviation defines the width of the curve |
|
|
Term
How do you test for Normality of data? |
|
Definition
Shapiro-Wilk test for the normality of data. Test statistic W and a p value. If p is above 0.05, the data can be treated as normal |
|
|
Term
What is the difference between parameters and statistics? |
|
Definition
Parameters are ASSUMPTIONS made about a population and statistics are the KNOWN results from the sample you have taken from the population.
For a normal distribution, a population has a mean of μ and standard deviation σ, while a sample has a mean of x̄ and a standard deviation of s. |
|
|
Term
What is the equation for Standard deviation? |
|
Definition
s = √[Σ(xᵢ - x̄)² / (n - 1)] |
|
|
Term
How do you work out how different your statistical data is from the assumed population reality? |
|
Definition
With the mean, you are able to work this out by calculating the Standard error of the mean.
This is the (standard deviation)÷(square root of the sample size) |
|
|
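The standard error calculation above can be sketched in Python; the six measurements are made-up values.

```python
from math import sqrt
from statistics import stdev

sample = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]  # hypothetical measurements

# SEM = sample standard deviation / square root of the sample size.
sem = stdev(sample) / sqrt(len(sample))
```

Note `statistics.stdev` uses the n − 1 denominator, matching the variance formula card.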
Term
What is the central limit theorem? |
|
Definition
The idea that the means of repeated samples are themselves normally distributed; as a result, 95% of sample means fall within ±1.96 standard errors of the true mean |
|
|
Term
What is meant by the 95% confidence interval of a mean? |
|
Definition
The range, mean ±1.96 standard errors, within which we are 95% confident the true population mean lies |
|
|
Term
What is the sample size limit for similarities in population and sample sd to be assumed? |
|
Definition
If a sample size is above 30, the sample standard deviation can be assumed close enough to the population's, so the 95% confidence interval of the mean is the mean ±1.96 standard errors.
If the sample is below 30, instead of 1.96 we use a t value, derived by taking the degrees of freedom (n - 1) and looking down the p = 0.975 column of a t-distribution table. |
|
|
Term
What is the basic normal distribution test for difference between two means? |
|
Definition
[image]
The final p value for the Z test statistic often has to be doubled, as it is usually a two-tailed test. |
|
|
Term
In the anova test, what is meant by treatment effect and residual effect? |
|
Definition
The residual effect is how much an individual sample differs from its group mean; the treatment effect is how much the group mean differs from the grand mean |
|
|
Term
What are the two sets of DF in the anova test? |
|
Definition
One is the df of the groups used (groups - 1). The other df is the total number of samples minus the number of groups |
|
|
Term
What is the significance of the F value in the the ANOVA test? |
|
Definition
F threshold is based upon the two values of the degrees of freedom. The specific F value is the treatment mean squared deviate ÷ the residual mean squared deviate
If the specific value is higher than the threshold then the difference is significant |
|
|
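The treatment/residual arithmetic behind F can be sketched by hand in Python on three hypothetical groups (a check against software output, not the course's own code):

```python
# Hypothetical data: three groups of three measurements.
groups = [[1.0, 2.0, 3.0],
          [2.0, 3.0, 4.0],
          [5.0, 6.0, 7.0]]

all_vals = [x for g in groups for x in g]
grand_mean = sum(all_vals) / len(all_vals)

# Treatment SS: how far each group mean sits from the grand mean.
ss_treat = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Residual SS: how far each value sits from its own group mean.
ss_resid = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_treat = len(groups) - 1              # groups - 1
df_resid = len(all_vals) - len(groups)  # total samples - groups

# F = treatment mean square / residual mean square.
f_value = (ss_treat / df_treat) / (ss_resid / df_resid)
```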
Term
What is the non-parametric equivalent of the ANOVA test? |
|
Definition
Kruskal-Wallis test: the one-way analysis of variance of sets of independent data with equal or different sample sizes. Used if the data are not normal, the variances are unequal, or both. The test statistic is χ², based on the among-group and within-group squared deviates of the ranks, with a p value to tell you if the result is significant. |
|
|
Term
What is a two-way ANOVA used for? |
|
Definition
You are checking if there is a significant relationship between two or more factors on a certain test variable
[image] |
|
|
Term
What is the most common form of transformation and why would you do it? |
|
Definition
Take the Log10 values of the non normal data as this may then give distributions of normal data. Do this to be able to perform parametric tests as they have much more statistical power |
|
|
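The log10 transform is a one-liner; a Python sketch on made-up right-skewed values:

```python
from math import log10

# Hypothetical right-skewed measurements spanning several orders of magnitude.
raw = [1.0, 2.0, 5.0, 10.0, 100.0, 1000.0]

# Log10-transform every point; large values are pulled in the most.
transformed = [log10(x) for x in raw]
```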
Term
What is the difference between a t test and a paired t test? |
|
Definition
Paired t test has more statistical power. Normal t test will just compare the difference of means of the two groups. Paired t test will compare the difference between the mean difference in group values and 0 |
|
|
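The paired comparison described above can be sketched by hand in Python; the before/after values are hypothetical, and only the t statistic (not its p value) is computed.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical before/after measurements on the same five individuals.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 13.0, 11.0, 12.0, 16.0]

# A paired t test works on the per-individual differences,
# comparing their mean against zero.
diffs = [a - b for a, b in zip(after, before)]
t_stat = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```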
Term
What is the non-parametric version of the paired t test? |
|
Definition
Wilcoxon signed-rank test. Test statistic = V |
|
|
Term
What is the equation for working out correlation, r? |
|
Definition
[image]
Gives the correlation coefficient
This is Pearson's correlation coefficient; Spearman's rank uses a different equation |
|
|
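The correlation coefficient can be sketched from deviates in Python (hypothetical paired observations; this also shows the coefficient of determination from the next card):

```python
from math import sqrt

# Hypothetical paired observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Pearson's r = sum of cross-deviates / sqrt(SSx * SSy).
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ssx = sum((x - mean_x) ** 2 for x in xs)
ssy = sum((y - mean_y) ** 2 for y in ys)
r = sxy / sqrt(ssx * ssy)

# Coefficient of determination: proportion of variance explained.
r_squared = r ** 2
```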
Term
What is the coefficient of determination? |
|
Definition
This is the correlation value squared. It represents the proportion of the variance in one variable that is explained by the variance in the other variable |
|
|
Term
How do you test for significance of the correlation coefficient? |
|
Definition
Work out the standard error of the correlation: [image]
Then divide the correlation coefficient by the standard error of the correlation. If this value is larger than the t value matching your df in the p = 0.975 column (two-tailed, as correlation significance testing is two-tailed), the correlation is significant.
In R this will be given as a p value; the null hypothesis is that the correlation is not significant |
|
|
Term
How do you work out the slope for regression? |
|
Definition
The slope is just: [image]
The intercept is the value of y where the line crosses x = 0 |
|
|
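The least-squares slope and intercept can be sketched in Python from deviates, assuming made-up points that happen to fall on a straight line:

```python
# Hypothetical data points (they lie exactly on y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# slope b = sum of cross-deviates / sum of squared x deviates
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
# intercept a = value of y where the line crosses x = 0
a = mean_y - b * mean_x
```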
Term
How do you test for significant regression? |
|
Definition
Base it upon the results of correlation. If the correlation is significant, so is the regression, and vice versa |
|
|
Term
What does epsilon, ε, show? |
|
Definition
The error term in a linear model; every linear model will incur some error. The error is the same no matter what the other values are |
|
|
Term
What is linear model of regression? |
|
Definition
y = a + bx + ε |
|
|
Term
What is the linear model of the t test? |
|
Definition
|
|
Term
What is the linear model of the ANOVA test? |
|
Definition
|
|
Term
What is the line model for a two way ANOVA? |
|
Definition
|
|
Term
How do you assess the fit of the model? |
|
Definition
The mean of the squared deviations of the actual values of y from the predictions of the model.
The further the data values are from the model values, the worse the fit of the model |
|
|
Term
How do you reduce overfitting? |
|
Definition
You produce a minimum adequate model: the linear model with the fewest variables in it. Only include the variables that really make a difference, and ignore variables with minimal effect |
|
|
Term
What does the + mean in a linear model? |
|
Definition
It just means that in the model, that variable is included. Does not mean mathematical addition |
|
|
Term
How does adding more variables to a linear model effect the value of the sum of squared deviates? |
|
Definition
Adding more variables will ALWAYS decrease the sum of squared deviates.
Therefore if the decrease is not significant, the minimum adequate model should be picked over the model with more variables |
|
|
Term
What is the logistic function equation and graph for models that will have an upper and lower maximum? |
|
Definition
y = e^η / (1 + e^η), an S-shaped curve bounded between 0 and 1 |
|
|
Term
How is a Link function used in generalised models? |
|
Definition
This is when you take an η value and predict a y value by putting the η value into a link function. This is used in generalised models and looks like this: |
|
|
Term
What type of data do parametric tests require? |
|
Definition
Continuous normal data.
If it is count data, it is not parametric and therefore you cannot do a parametric test |
|
|
Term
What form should linear regression lines be in? |
|
Definition
A straight line, y = a + bx |
|
|
Term
How can you quickly tell the difference between binomially and poisson distributed data? |
|
Definition
The Poisson-distributed data will have expectations of massively high values compared to the real data. Binomial expected values will be pretty close |
|
|
Term
What is the difference between the Mann-Whitney and Wilcoxon? |
|
Definition
Both are non-parametric versions of the t test.
The Mann-Whitney test is used for independent samples.
The Wilcoxon signed-rank test is used for paired samples |
|
|
Term
How do you find outliers in R? |
|
Definition
Plot the data on a Cleveland plot |
|
|
Term
How do you find homogeneity of variance errors in R? |
|
Definition
Plot the data on conditional box plot |
|
|
Term
How do you find errors of normality in your data on R? |
|
Definition
Plot the data in a histogram |
|
|
Term
How do you find errors of too many zeros in your data in R? |
|
Definition
Plot data into a Frequency histogram |
|
|
Term
How do you find errors in interactions of data in R? |
|
Definition
Plot data into a conditional plot |
|
|
Term
What is a t test for normally distributed data but have unequal variances? |
|
Definition
Welch's t-test |
|
|
Term
How do you work out the F value in an ANOVA test? |
|
Definition
Divide the treatment mean square by the residual mean square |
|
|
Term
What are the three important R commands you may need? |
|
Definition
str = structure of the data columns; head = first few data lines; dim = size of the data matrix |
|
|
Term
How should a visual basic excel file be saved? |
|
Definition
.xlsm (macro-enabled workbook) |
|
|
Term
How should most excel files be saved? |
|
Definition
.xlsx, .csv or .txt
If it is a visual basic file then .xlsm |
|
|
Term
How do you fix a cell in an excel formula when dragging copying the cell formula? |
|
Definition
Use a $ sign in front of the cell you are fixing in the formula |
|
|
Term
What is a Link function? |
|
Definition
This is the relationship between the value of y in a linear model and a value η, which represents some or all of the variables in the linear model.
The link function is just the relationship between the two and helps predict values for y with increasing or decreasing values of x in the model.
It creates the S-shaped curve with asymptotes that never reach y = 0 or y = 1 |
|
|