# Statistical Tests in Social Research

Parametric Vs Nonparametric Statistics

When statistics are calculated using continuous variables, under the assumption that the data follow some common distribution such as the normal distribution, we call these parametric statistics. Thus, when the data are normal, we can use a host of really cool and powerful parametric statistical tests to analyze our data: t-tests, analysis of variance, linear regression, and others.

However, some distributions are clearly not normal, i.e. they do not reveal a bell curve. They may have two humps, or they may have most of their data at one end of the distribution with some of the data trailing off into a long tail. The research may also use discrete variables, such as nominal or ordinal data, that do not allow us to compare means. In either case, non-parametric tests are called for. These tests are also known as distribution-free tests since they do not assume that the data follow a particular distribution. Fortunately, for almost every parametric test in the statistical toolbox there is a corresponding non-parametric test. For example, for the independent group t-test there is the Mann-Whitney test. The Wilcoxon signed-rank test can be used in place of the paired t-test, and the Kruskal-Wallis test can be used in place of a one-way independent group analysis of variance (ANOVA).

Why not just always use non-parametric tests? Since non-parametric tests do not make an assumption about the distribution of the data, they have less information to use to determine significance. Thus, when the parametric assumptions do hold, they are less powerful than the parametric tests. That is, they have a more difficult time finding statistical significance when a real difference exists. So why bother with them if we can avoid it?

Parametric Tests

T Tests - Comparing Two Groups

A common form of scientific experimentation is the comparison of two groups. This comparison could be of two different treatments, the comparison of a treatment to a control (between subjects), or a before and after comparison (within subjects). The preliminary results are summarized into a mean for each group. Once accomplished, you do a statistical analysis to decide if the observed differences between the two groups are real or just a chance difference.

The two most widely used statistical techniques for comparing two groups, where the measurements of the groups are normally distributed, are the Independent Group t-test and the Paired or dependent t-test. What is the difference between these two tests and when should each be used?

The Independent Group t-test is designed to compare means between two groups where there are different subjects in each group. Ideally, these subjects are randomly selected from a larger population of subjects and assigned to one of two treatments. Another way to assign subjects to two groups is to randomly assign them to one of two treatments at the time they enter a study. This randomization is often performed in a double-blind, or even triple-blind, fashion (where even the statistician is in the dark concerning who is in the control group and who is in the experimental group).

Besides the normality assumption, another requirement of the Independent Group t-test is that the variances of the two groups be equal. That is, if you were to plot the observed data from each of the two groups, the resulting bell-shaped histograms would have approximately the same shape. Before actually performing the Independent Group t-test, a statistical pre-test is often performed to verify the hypothesis that the variances are equal. Options for the unequal variance case are discussed later.

Once the data are collected and the assumptions for performing the t-test are satisfied, the means of the two groups are compared. The evidence for a statistically significant difference between the two means is reported as a p-value, which is compared to a significance level (alpha) chosen in advance. Typically, if the p-value is below that level (usually 0.05), the conclusion is that there is a difference between the two group means. The lower the p-value, the greater the evidence that the two group means are different. The p-value is reported in journal articles to support a researcher's hypothesis concerning the observed outcomes for the two groups.

The other commonly used type of t-test is the Paired or dependent t-test. In this case the subjects for the two groups are the same or matched. That is, the same subjects are observed twice, often with some intervention taking place between measures. One advantage of using the same subjects is that experimental variability is less than for the independent group case. For this test the mean difference between the two repeated observations is calculated and compared to zero. If the difference is sufficiently great then there is evidence that the treatment caused some change in the observed variable. A paired t-test is performed and the observed difference between the groups is summarized in a p-value.

When not to use a T test

The benefit of performing a t-test is that it is easy to understand and generally easy to perform. However, if the variances are not equal then a variance stabilizing transformation or a modification of the t-test should be performed, usually Welch's t-test (a t-test for unequal variances). This version of the Independent Group t-test takes into account the differences in variances and adjusts the p-value accordingly. If the data for either test are not normally distributed then a different kind of comparison test might need to be employed: a nonparametric test. In the case of independent groups, the nonparametric test usually performed is the Mann-Whitney test. For paired data that are not normally distributed, the Wilcoxon signed-rank test is usually performed.

Furthermore, researchers sometimes make the mistake of performing multiple t-tests when there are more than two groups in their research. Each additional test adds another chance of a false positive, so the overall error rate climbs well above the nominal 0.05. This destroys the meaning of the p-value and results in erroneous conclusions about the data. Instead of multiple t-tests, there are other statistical approaches to multiple group analysis, namely the analysis of variance approach.
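To make the multiple t-test problem concrete, here is a minimal sketch of how the chance of at least one false positive grows with the number of pairwise comparisons. It assumes the tests are independent, which pairwise t-tests on shared groups are not, so treat the numbers as a rough illustration only. The function names are hypothetical.

```python
def pairwise_comparisons(num_groups: int) -> int:
    """Number of distinct pairs of groups, i.e. t-tests needed to compare all pairs."""
    return num_groups * (num_groups - 1) // 2


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across num_tests tests.

    Assumption: the tests are independent, so this is only an approximation
    for pairwise t-tests that share groups.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (3, 4, 5):
    m = pairwise_comparisons(k)
    print(k, "groups:", m, "tests, error rate about", round(familywise_error_rate(m), 3))
# 3 groups: 3 tests, error rate about 0.143
# 4 groups: 6 tests, error rate about 0.265
# 5 groups: 10 tests, error rate about 0.401
```

With five groups, the chance of at least one spurious "significant" result is roughly 40% rather than 5%.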

What follows is a simple explanation of some parametric and nonparametric tests. This is hardly exhaustive of the 100s of tests out there....

Other Parametrics

Simple Linear Regression

Definition: Used to develop an equation (a linear regression line) for predicting a value of the dependent variable given a value of the independent variable. A regression line is the line described by the equation and the regression equation is the formula for the line. The regression equation is given by:

Y = a + bX

where X is the independent variable, Y is the dependent variable, a is the intercept and b is the slope of the line. You should remember all this from grade school.

Assumptions: For a fixed value of X (the independent variable), the population of Y (the dependent variable) is normally distributed with equal variances across Xs.

Related statistics: The correlation coefficient, r, measures the strength of the association between X and Y, using a range from 1.0 to -1.0.

Graphs: Graphs produced with the simple linear regression procedure are:

1. Scatterplot with fitted regression line.

2. Residuals by the independent variable.

3. Residuals by run order.

Examination of the graphs is useful to visually verify that the relationship is linear.
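As an illustration of the regression equation Y = a + bX, the following sketch fits the intercept and slope by ordinary least squares and computes the residuals that the residual plots would display. The data and the function name `fit_line` are made up for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
# Residual = observed Y minus the Y predicted by the fitted line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print("intercept:", round(a, 3), "slope:", round(b, 3))
```

Plotting `residuals` against `xs` gives the "residuals by the independent variable" graph. A curved pattern there suggests the relationship is not linear.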

Pearson's Correlation Coefficient

Definition: Measures the strength of the linear relationship between two variables.

Assumptions: Both variables (often called X and Y) are interval/ratio and approximately normally distributed, and their joint distribution is bivariate normal.

Characteristics: Pearson's Correlation Coefficient is usually signified by r for a sample (rho for the population), and can take on values from -1.0 to 1.0, where -1.0 is a perfect negative (inverse) correlation, 0.0 is no linear correlation, and 1.0 is a perfect positive correlation.

Related statistics: R² (called the coefficient of determination or r squared) can be interpreted as the proportion of variance in Y that is explained by X.

Tests: The statistical significance of r is tested using a t-test. The hypotheses for this test are:

Ho: rho = 0

Ha: rho <> 0

A low p-value for this test (less than 0.05 for example) means that there is evidence to reject the null hypothesis in favor of the alternative hypothesis, or that there is a statistically significant relationship between the two variables.

Note: This test is equivalent to the test of no slope in the simple linear regression procedure.
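A small sketch of the calculation: r is computed from sums of squares, and the t statistic for testing rho = 0 uses the standard formula t = r * sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. The data and function names are hypothetical, and the p-value itself would still come from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Sample correlation coefficient r between xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def t_statistic_for_r(r, n):
    """t statistic for Ho: rho = 0, with n - 2 degrees of freedom.

    Undefined when r is exactly 1.0 or -1.0 (division by zero).
    """
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
t = t_statistic_for_r(r, len(xs))
print("r:", round(r, 4), "r squared:", round(r * r, 4), "t:", round(t, 4), "df:", len(xs) - 2)
```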

Single Sample t-test

Definition: Used to compare the mean of a sample to a known number (often 0).

Assumptions: Subjects are randomly drawn from a population and the distribution of the mean being tested is normal.

Test: The hypotheses for a single sample t-test are:

Ho: u = u0

Ha: u <> u0

(where u0 denotes the hypothesized value to which you are comparing a population mean)

Test statistic: The test statistic, t, has N-1 degrees of freedom, where N is the number of observations.

Results of the t-test: If the p-value associated with the t-test is small (usually set at p < 0.05), there is evidence to reject the null hypothesis in favor of the alternative. In other words, there is evidence that the mean is significantly different from the hypothesized value. If the p-value associated with the t-test is not small (p > 0.05), there is not enough evidence to reject the null hypothesis. Note that this is not the same as evidence that the mean equals the hypothesized value; you can only conclude that no difference was detected.
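The single sample test statistic described above can be sketched as follows, using the usual formula t = (sample mean - u0) / (s / sqrt(N)). The sample values and the function name `one_sample_t` are invented for illustration; converting t to a p-value requires a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for Ho: u = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # stdev uses the N-1 denominator
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1


# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t, df = one_sample_t(sample, mu0=4)
print("t:", round(t, 4), "df:", df)
```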

Independent Group t-Test

Definition: Used to compare the means of two independent groups.

Assumptions: Subjects are randomly assigned to one of two groups. The distributions of the means being compared are normal with equal variances.

Test: The hypotheses for the comparison of two independent groups are:

Ho: u1 = u2 (means of the two groups are equal)

Ha: u1 <> u2 (means of the two groups are not equal)

The test statistic is t, with N1 + N2 - 2 degrees of freedom, where N1 and N2 are the sample sizes for groups 1 and 2. A low p-value for this test (less than 0.05 for example) means that there is evidence to reject the null hypothesis in favor of the alternative hypothesis. Or, there is evidence that the difference in the two means is statistically significant.

Note: One sided t-tests are not as common. In this case, the alternative hypothesis is directional. For example:

Ha: u1 < u2 (the mean of group 1 is less than the mean of group 2)

When a one-sided hypothesis is used, the p-value must be adjusted accordingly.

Pre-test: Test for variance assumption: A test of the equality of variance is used to test the assumption of equal variances. The test statistic is F with N1-1 and N2-1 degrees of freedom.

1. If the p-value for this test is not small (>0.05), use the standard t-test.

2. If the p-value for this test is small, the t-test for unequal variances (Welch's test) should be used instead of the standard t-test.

Results of the t-test: If the p-value associated with the t-test is small (< 0.05), there is evidence to reject the null hypothesis in favor of the alternative. In other words, there is evidence that the means are significantly different. If the p-value associated with the t-test is not small (> 0.05), there is not enough evidence to reject the null hypothesis. This does not show that the means are equal, only that no difference was detected.
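The following sketch puts the pieces of this section side by side: the variance-ratio F statistic used as a pre-test, the standard (pooled variance) t statistic with N1 + N2 - 2 degrees of freedom, and Welch's t statistic. The Welch degrees of freedom use the Welch-Satterthwaite approximation, which the text above does not spell out, so that part is an added assumption. Data and function names are hypothetical, and p-values would still need a table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def variance_ratio_f(g1, g2):
    """F statistic for equality of variances, with N1-1 and N2-1 degrees of freedom."""
    return variance(g1) / variance(g2), len(g1) - 1, len(g2) - 1


def pooled_t(g1, g2):
    """Standard independent group t statistic (assumes equal variances)."""
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    t = (mean(g1) - mean(g2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(g1, g2):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(g1), len(g2)
    v1, v2 = variance(g1) / n1, variance(g2) / n2   # squared standard errors
    t = (mean(g1) - mean(g2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical data, for illustration only.
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]

f_stat, f_df1, f_df2 = variance_ratio_f(group1, group2)
t_pooled, df_pooled = pooled_t(group1, group2)
t_welch, df_welch = welch_t(group1, group2)
print("F:", f_stat, "pooled t:", round(t_pooled, 4), "df:", df_pooled)
print("Welch t:", round(t_welch, 4), "df:", round(df_welch, 3))
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom differ. With unequal sample sizes and unequal variances, the statistics themselves diverge as well.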

Paired t-test

Definition: Used to compare means on the same or related subject over time or in differing circumstances.

Assumptions: The observed data are from the same subject or from a matched subject, and the differences between the paired observations are drawn from a population with a normal distribution.

Characteristics: Subjects are often tested in a before-after situation (across time, with some intervention occurring such as a diet), or subjects are paired such as with twins, or with subjects as alike as possible. An extension of this test is the repeated measures ANOVA.

Test: The paired t-test is actually a test that the mean of the differences between the two observations is 0. So, if D represents the mean difference between observations, the hypotheses are:

Ho: D = 0 (the mean difference between the two observations is 0)

Ha: D <> 0 (the mean difference is not 0)

The test statistic is t with n-1 degrees of freedom, where n is the number of pairs. If the p-value associated with t is low (< 0.05), there is evidence to reject the null hypothesis. Thus, you would have evidence that there is a difference in means across the paired observations.

Graphical Comparison: The graphical comparison allows you to see the difference in means across paired observations. However, this is not a perfect representation of the test, since the test looks at the difference between pairs of subjects.

See also: Repeated Measures Analysis of Variance. Also, if the differences have already been calculated, a single sample test of u = 0 would be equivalent to the paired t-test. The non-parametric counterpart to the paired t-test is the Wilcoxon signed-rank test.
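A minimal sketch of the paired calculation, written to show the equivalence just mentioned: the test is a single sample t-test of u = 0 applied to the per-subject differences. The before and after scores are invented, and `paired_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, degrees of freedom) for Ho: mean difference = 0."""
    # One difference per subject (second observation minus first).
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical before/after scores for five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 14, 14]

t, df = paired_t(before, after)
print("t:", round(t, 4), "df:", df)
```

Note that the degrees of freedom come from the number of pairs (5 - 1 = 4), not from the total number of observations.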

Independent Group ANOVA (One Way ANOVA)

Definition: An extension of the independent group t-test where you have more than two groups. Used to compare the means of more than two independent groups. This is also called a One Way Analysis of Variance.

Assumptions: Subjects are randomly assigned to one of k groups. The distributions of the means by group are normal with equal variances. Sample sizes between groups do not have to be equal, but large differences in sample sizes by group may affect the outcome of the multiple comparisons tests.

Test: The hypotheses for the comparison of independent groups are: (k is the number of groups)

Ho: u1 = u2 = ... = uk (means of all the groups are equal)

Ha: ui <> uj for at least one pair i, j (the means of two or more groups are not equal)

The test is performed in an Analysis of Variance (ANOVA) table. The test statistic is an F test with k-1 and N-k degrees of freedom, where N is the total number of subjects. A low p-value for this test indicates evidence to reject the null hypothesis in favor of the alternative. In other words, there is evidence that at least one pair of means is not equal.
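The F statistic in the ANOVA table is the ratio of the between-group mean square to the within-group mean square. A short sketch under the usual one-way layout, with made-up data and a hypothetical function name:

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical data for three groups, for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, df_between, df_within = one_way_anova(groups)
print("F:", round(f, 4), "df:", (df_between, df_within))
```

The F value is then compared with the F distribution on k-1 and N-k degrees of freedom, using a table or a statistics package, to obtain the p-value.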

Multiple Comparisons: Since the rejection of the null hypothesis does not specifically tell you which means are different, a multiple comparison test is often performed following a significant finding in the One-Way ANOVA. There are a number of multiple comparison procedures in the literature. Three that are available in WINKS and KWIKSTAT are Newman-Keuls, Tukey, and Scheffé. A specialized multiple comparison, Dunnett's test, is also available. Dunnett's test is used when the comparisons are performed only with the control group versus all other groups.

Multiple comparison tests are performed at a fixed significance level. Typically that level is set at 0.05.

Graphical comparison: The graphical comparison allows you to visually see the distribution of the groups. If the p-value is low, chances are there will be little overlap between the two or more groups. If the p-value is not low, there will be a fair amount of overlap between all of the groups. There are a number of options available in the comparison graph to allow you to examine the groups. These include box plots, means, medians, and error bars.

See Also: When data are not normally distributed, the Kruskal-Wallis test, a non-parametric test between groups, can be used.

This is a graphical representation of the Newman-Keuls multiple comparisons. Groups underscored by the same line are not significantly different.

In this output, the test statistic, F, is reported in the analysis of variance table. The p-value for the F(3,11) = 39.82 is < 0.001. This means that there is evidence that there are differences in the means across groups.

To determine which specific means are different, read the results of the multiple comparison table. In this case (using the Newman-Keuls procedure), differences were found between groups 3 and 1, 3 and 2, 4 and 1, 4 and 2, and 2 and 1. There was no significant difference found between groups 4 and 3. Thus, feeds 3 and 4 both produce better results than feeds 1 and 2, but they are not significantly different from one another.

The comparison graph allows you to visualize the difference between the groups.

Nonparametrics

Pearson's Chi Square

Pearson's chi-square is by far the most common of the chi-square significance tests.

Definition: This statistic is used to test the hypothesis of no association of columns and rows in tabular data. It can be used with nominal data. Note that chi square is more likely to find significance to the extent that (1) the relationship is strong, (2) the sample size is large, and/or (3) the number of values of the two associated variables is large. A chi-square probability of .05 or less is commonly interpreted by social scientists as justification for rejecting the null hypothesis that the row variable is unrelated (that is, only randomly related) to the column variable.
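For a table of counts, the statistic compares each observed cell with the count expected if rows and columns were unrelated (row total times column total, divided by the grand total). The sketch below uses a made-up 2 x 2 table and a hypothetical function name. It applies no continuity correction, which some packages add for 2 x 2 tables.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    table is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under no association between rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical 2 x 2 table of counts, for illustration only.
table = [[20, 30],
         [30, 20]]
stat, df = chi_square(table)
print("chi-square:", stat, "df:", df)
```

Here every expected count is 25, so the statistic is 4.0 on 1 degree of freedom, which exceeds the 0.05 critical value of 3.841.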

Mann-Whitney Test (Non-parametric independent two-group comparisons)

Definition: A non-parametric test (distribution-free) used to compare two independent groups of sampled data.

Assumptions: Unlike the parametric t-test, this non-parametric test makes no assumptions about the distribution of the data (e.g., normality).

Characteristics: This test is an alternative to the independent group t-test, when the assumption of normality or equality of variance is not met. This, like many non-parametric tests, uses the ranks of the data rather than their raw values to calculate the statistic. Since this test does not make a distribution assumption, it is not as powerful as the t-test.

Test: The hypotheses for the comparison of two independent groups are:

Ho: The two samples come from identical populations

Ha: The two samples come from different populations

Notice that the hypothesis makes no assumptions about the distribution of the populations. These hypotheses are also sometimes written as testing the equality of the central tendency of the populations.

The test statistic for the Mann-Whitney test is U. This value is compared to a table of critical values for U based on the sample size of each group. If U exceeds the critical value for U at some significance level (usually 0.05) it means that there is evidence to reject the null hypothesis in favor of the alternative hypothesis. (See the Zar reference for details.)

Note: Actually, there are two versions of the U statistic calculated, U and U', where U' = n1n2 - U and n1 and n2 are the sample sizes of the two groups. The larger of U and U' is compared to the critical value for the purpose of the test.

Note: For sample sizes greater than 8, a z-value can be used to approximate the significance level for the test. In this case, the calculated z is compared to the standard normal significance levels.

Note: The U test is usually performed as a two-tailed test; however, some texts have tabled one-tailed significance levels for this purpose. If the sample size is large, the z-test can be used for a one-sided test.
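The rank-based calculation behind U, U' and the z approximation can be sketched as follows. Tied values receive the average of the ranks they span. The z formula shown omits both the tie correction and the continuity correction, so it is a simplification, and the data and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # average of positions i..j, 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(g1, g2):
    """Return (U, U') where U is computed from the rank sum of g1."""
    n1, n2 = len(g1), len(g2)
    ranks = average_ranks(list(g1) + list(g2))
    rank_sum_1 = sum(ranks[:n1])
    u = rank_sum_1 - n1 * (n1 + 1) / 2
    return u, n1 * n2 - u


def z_approximation(u_larger, n1, n2):
    """Normal approximation for U (no tie or continuity correction)."""
    return (u_larger - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)


# Hypothetical data; far too small for the z approximation to be trusted,
# used here only to show the mechanics.
group1 = [1, 2, 3]
group2 = [4, 5, 6, 7]

u, u_prime = mann_whitney_u(group1, group2)
z = z_approximation(max(u, u_prime), len(group1), len(group2))
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print("U:", u, "U':", u_prime, "z:", round(z, 4), "p:", round(p_two_sided, 4))
```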

Graphical comparison: The graphical comparison allows you to visually see the distribution of the two groups. If the p-value is low, chances are there will be little overlap between the two distributions. If the p-value is not low, there will be a fair amount of overlap between the two groups. There are a number of options available in the comparison graph to allow you to examine the two groups. These include box plots, means, medians, and error bars.

When there are more than two groups in this comparison, the test becomes a Kruskal-Wallis test.

Example for Mann-Whitney U test

Professor Testum wondered if students tended to make better scores on his test depending on whether the test was taken in the morning or the afternoon. From a group of 19 similarly talented students, he randomly selected some to take a test in the morning and some to take it in the afternoon.

Kruskal-Wallis Test (Non-parametric independent group comparisons)

Definition: A non-parametric test (distribution-free) used to compare three or more independent groups of sampled data.

Assumptions: Unlike the parametric independent group ANOVA (one way ANOVA), this non-parametric test makes no assumptions about the distribution of the data (e.g., normality).

Characteristics: This test is an alternative to the independent group ANOVA, when the assumption of normality or equality of variance is not met. This, like many non-parametric tests, uses the ranks of the data rather than their raw values to calculate the statistic. Since this test does not make a distributional assumption, it is not as powerful as the ANOVA.

Test: The hypotheses for the comparison of three or more independent groups are:

Ho: The samples come from identical populations

Ha: The samples come from different populations

Notice that the hypothesis makes no assumptions about the distribution of the populations. These hypotheses are also sometimes written as testing the equality of the central tendency of the populations.

The test statistic for the Kruskal-Wallis test is H. This value is compared to a table of critical values for H based on the sample size of each group. If H exceeds the critical value for H at some significance level (usually 0.05) it means that there is evidence to reject the null hypothesis in favor of the alternative hypothesis. (See the Zar reference for details.)

Note: When sample sizes are small in each group (< 5) and the number of groups is less than 4 a tabled value for the Kruskal-Wallis should be compared to the H statistic to determine the significance level. Otherwise, a Chi-square with k-1 (the number of groups-1) degrees of freedom can be used to approximate the significance level for the test.
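A sketch of how H is obtained from the ranks of the pooled data, using the standard formula H = 12 / (N(N+1)) * sum(Ri² / ni) - 3(N+1), where Ri is the rank sum and ni the size of group i. This version applies no correction for ties, and the data and function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # average of positions i..j, 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) without a tie correction."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])   # Ri for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)

    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, len(groups) - 1


# Hypothetical data for three groups, for illustration only.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h, df = kruskal_wallis_h(groups)
print("H:", round(h, 4), "df:", df)
```

With only three observations in each of three groups, this example falls under the small-sample case in the note above, so H would be checked against a tabled critical value rather than the chi-square approximation.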

Graphical comparison: The graphical comparison allows you to visually see the distribution of the groups. If the p-value is low, chances are there will be little overlap between the distributions. If the p-value is not low, there will be a fair amount of overlap between the groups. There are a number of options available in the comparison graph to allow you to examine the groups. These include box plots, means, medians, and error bars.