# Correlational Research


While psychologists seek to use experimental methods where possible, correlational research represents the true heart of psychology. Whereas the strict scientific experimenter is interested in controlling for individual differences in his search for general laws, psychologists are mainly interested in studying those very individual differences - which places much of psychology outside the realm of causal experimental research. So be it.

A particular interest is in the Person X Environment factorial design illustrated in the previous section. This represents the current "Bio-psycho-social" outlook of psychology.

## Correlation and Regression Predictions

When examining a correlation, we use a coefficient of correlation known as the Pearson r to measure and express the strength of the correlation between two phenomena. This measure can only be used, however, if the variables are measured on an interval or ratio scale. Spearman's rho is used to measure correlations with ordinal data. A simple scatterplot can also be used to illustrate a correlation.
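Both coefficients are easy to compute by hand. Here is a minimal sketch in plain Python; the study-time numbers are invented for illustration, and the simple ranking used for rho does not handle tied scores:

```python
import math

def pearson_r(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson r computed on the ranks of the data (no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r
    return pearson_r(ranks(x), ranks(y))

hours_studied = [1, 2, 3, 4, 5, 6]       # hypothetical interval data
exam_score    = [52, 60, 57, 68, 74, 80]

print(round(pearson_r(hours_studied, exam_score), 3))   # 0.964
print(round(spearman_rho(hours_studied, exam_score), 3))  # 0.943
```

Note that rho only cares about the ordering of the scores, which is why it is the right tool for ordinal data.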

**Predictions for special groups**

An important point to remember when doing a correlational study is not to unduly restrict the range of respondents. Doing so, by removing extreme respondents, weakens the correlation. An example would be an Ivy League school that discards all SAT scores below 1250 when making predictions of academic success. Data that once told a linear story (i.e., people with scores over 1250 probably did a heck of a lot better than students with 900s) now tells a murkier tale. Procedures exist for correcting correlations to account for restriction of range, but one must be aware that restricting the range directly weakens the ability to make predictions.
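A quick simulation makes the point. The SAT and GPA numbers below are invented, and the population correlation of roughly +.6 is an assumption built into the data:

```python
import math
import random

def pearson_r(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(1)
n = 2000
sat = [random.gauss(1000, 200) for _ in range(n)]
# GPA rises with SAT plus noise; built so the population correlation is about +.6
gpa = [3.0 + 0.001 * (s - 1000) + random.gauss(0, 0.27) for s in sat]

r_full = pearson_r(sat, gpa)

# Now discard everyone below 1250, as the Ivy League school does
kept = [(s, g) for s, g in zip(sat, gpa) if s >= 1250]
r_restricted = pearson_r([s for s, _ in kept], [g for _, g in kept])

print(round(r_full, 2), round(r_restricted, 2))  # restricted r is much smaller
```

The same underlying relationship is there in both samples; throwing away the low scorers simply leaves less variability for the correlation to latch onto.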

**Correlations and the Assumption of "Linearity"**

A major problem with these correlation measures is that they assume linearity - i.e., that the data will align in a neat line, either going diagonally up in a positive correlation or diagonally down in a negative one. But not all correlated phenomena are linear. Some phenomena have a curvilinear nature. The old boring example you see everywhere is the correlation between fear (or "arousal" level) and test taking. Too little fear will not motivate a student to study. Too much fear may affect his ability to study. A modicum of fear motivates just right.

A cuter example would be a prediction of Goldilocks's predilection to eat porridge. Her consumption level would drop dramatically as porridge became either colder or hotter. However, as it came closer to "just right," her consumption level would go through the roof. Tell your friends that in order to predict Goldilocks's predilection-consumption rates for porridge, you would need to plot the data on a scatterplot and then run a curvilinear regression analysis.

Oh, and you could use the same methods to predict what bed she will be in.
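Goldilocks makes the point concrete. With hypothetical temperature-consumption data that peak at "just right," Pearson's r comes out at exactly zero even though the relationship is perfect - a sketch in plain Python:

```python
import math

def pearson_r(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented data: consumption peaks at 50 degrees ("just right")
# and falls off symmetrically as the porridge gets colder or hotter.
temperature = list(range(0, 101, 10))
consumption = [100 - (t - 50) ** 2 / 25 for t in temperature]

r = pearson_r(temperature, consumption)
print(r)  # 0.0 - the linear measure completely misses the inverted U
```

A scatterplot of these same numbers would make the inverted U obvious at a glance, which is exactly why you plot the data before trusting any single coefficient.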

**Regression Analysis**

If you know that two variables are correlated, then knowing the score on one of the variables allows you to predict, with some probability better than chance, what the other variable will be. Making predictions on the basis of correlational research is referred to as regression analysis. Regression analysis is most frequently used when psychological tests are employed to make predictions. Regression analysis is behind such statements as "Smoking increases your chances of cancer" and "wherever you go, there you are."

Scatterplots use regression lines to summarize the points on the plot. The formula for this line is the equation of a straight line: Y = a + bX, where a is the intercept and b is the slope. Y is referred to as the criterion variable, and X is the predictor variable.
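Fitting that line by least squares is straightforward: the slope b is the covariance of X and Y divided by the variance of X, and the intercept a then falls out from the two means. A minimal sketch with invented data:

```python
def fit_line(x, y):
    """Least-squares regression line Y = a + bX (a = intercept, b = slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

study_hours = [1, 2, 3, 4, 5]   # predictor X (hypothetical)
quiz_score  = [2, 4, 5, 4, 5]   # criterion Y (hypothetical)

a, b = fit_line(study_hours, quiz_score)
print(a, b)               # intercept is 2.2, slope is 0.6
predicted = a + b * 6     # predicted quiz score for six hours of study: 5.8
```

Once a and b are in hand, prediction is just plugging a new X into the equation, which is all regression analysis does at heart.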

The strength of predictions made with a regression analysis depends on the strength of the correlation. A Pearson r runs from -1.0 through 0 to +1.0. The farther the number is from 0, the stronger the correlation; the closer to 0, the weaker. As you get closer to +1.0, you get a stronger positive correlation - the more one thing shows up, the more likely the second thing is there. Think of this as a correlation between intelligence and anti-religious viewpoints. As you get closer to -1.0, you get a stronger negative correlation - the more one thing shows up, the less the other does. Think of this as a correlation between George Bush opening his mouth and the confidence level of the American people in their president. The closer you get to zero, the closer you get to a random relationship between the variables, which is to say no relationship. Think of this as the current state of your marriage.

**Interpreting Correlations**
To make it brief: correlation does not imply causality! Since correlational methodology cannot hold all other variables constant while an independent variable is introduced, we cannot make causal statements, no matter how strong the correlation. This is the reasoning behind cigarette companies claiming that "there is no proof that smoking causes cancer." On a purely scientific analysis, this statement is still correct.

But to put this statement in the proper light, saying that there is no proof that smoking causes cancer is akin to saying that there is no proof that tomorrow will come, or, as stated earlier in this page, that there is no way to be certain that a rock tossed into the air will be pulled back towards the earth. These statements are true, but in all of these cases it would be pretty ridiculous to claim that we can therefore assume the opposite - which is exactly what cigarette companies purposely imply when they continue to sell cigarettes.

**Running into trouble with Making causal statements about correlational data**

When you try to make causal statements about correlational research, you run into the problems of directionality and the third variable. The directionality confound refers to the fact that while the correlation seems to say that A causes B (A -> B), it could be that B causes A (B -> A). A third variable may also be behind both events: C -> A + B.

In some situations, we may have reason to suspect that a third variable is operating. If so, and if it is possible to measure this third variable, its effects can be evaluated using a procedure called partial correlation, which attempts to control for third variables statistically. In effect, it is a post facto attempt to create semi-equivalent groups. For example, suppose you know that the correlation between reading speed and reading comprehension is high: +.55. This seems to indicate that the faster you read, the more you will learn. So, you decide to read your homework as quickly as possible from now on.

But suppose further that you believe things can't be that nice and easy. You start to think that maybe IQ is a third variable behind both reading speed and reading comprehension. To uncover the truth, you would do a partial correlation. To complete a partial correlation, you would correlate a) IQ and reading speed and b) IQ and reading comprehension. Let's suppose they come out to be +.70 and +.72 respectively, which seems to confirm your suspicions. Calculating a partial correlation involves incorporating all three of these correlations. What results is a partial correlation that measures the relationship between reading speed and reading comprehension with IQ partialed out, or controlled. When this occurs, we are left with a Pearson r of +.10, which, considering the sample size of our hypothetical groups, is not significant. Performing this test helps us avoid making a mistake concerning the influence of reading speed on reading comprehension, and we end up trying out some other way of avoiding our homework...
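For three variables, the partial correlation can be computed directly from the three pairwise correlations. Plugging in the figures from the example gives roughly +.09, in line with the +.10 quoted above:

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the third variable Z partialed out."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Figures from the example: speed-comprehension r = +.55,
# IQ-speed r = +.70, IQ-comprehension r = +.72
print(round(partial_r(0.55, 0.70, 0.72), 2))  # 0.09
```

The intuition: the numerator subtracts out the part of the speed-comprehension correlation that IQ can account for, and the denominator rescales what is left.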

**Cross-Lagged Panel Correlation**

Jesus Christ - does psychological terminology get full of itself, or what? A "cross-lagged panel correlation" helps us increase our confidence regarding the directionality issues of correlational research. This procedure investigates correlations between variables at several points in time. Hence, it is also a type of longitudinal design.

This design helps correct the directionality error because it picks up one aspect of a causal relationship that simple correlations do not. Remember that causal relationships occur when we find that phenomena A and B always occur together, that A precedes B in time, that stating that A causes B is both parsimonious and consistent with some theory, and that other explanations can be ruled out. The cross-lagged method allows us to infer something about A preceding B, i.e., that A and B are repeatedly seen paired consistently in contiguity.

**Structural Modeling**

This procedure tries to reduce the confounds of both third variables and directionality. Linear structural equations for each of the possibilities are tested - i.e., A -> B, B -> A, and the most likely third variable, C -> A + B. As you might imagine, this makes the study more complex and expensive.

As another reminder, recall that single-factor nonequivalent group designs, factorial nonequivalent designs, and P x E designs, all of which use subject variables, suffer from the same interpretation concerns.

**The Need for Correlations**

Since correlational studies cannot lead to causal explanations, they are often seen as inferior to true experimental research. However, correlational designs are required for both practical and ethical reasons.

We already know about the practical reasons - we cannot randomly assign people to subject variables such as gender. Ethical reasons can also forbid us from using experimental techniques. For example, in order to do a true experiment on rehabilitative techniques for traumatic brain injury, we would need to randomly assign healthy people to have their brains damaged.

**Multivariate Correlations**

A bivariate approach investigates the relationship between two variables. A multivariate analysis considers the relationships among more than two variables. Just think how annoying this would be to remember if it were the other way around.

**Multiple Regression**

Multiple regression occurs whenever a group of men get together for a sporting event. It is also a predictive measure where there is one criterion and at least two predictor variables. As in bivariate regression, a multiple regression can be defined as follows: Y = A + B1X1 + B2X2 + ... + BnXn, where each X is a predictor variable, Y is the criterion, and each B is a beta weight giving the relative weighting of its predictor. As might be expected, multiple regression is a superior predictor to bivariate regression, but it is more complex to perform.
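For two predictors, the weights can be found by solving the normal equations directly. A sketch in plain Python, with data invented so that Y = 1 + 2*X1 + 3*X2 holds exactly (so the fit should recover those weights):

```python
def solve3(A, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    b = [0.0] * 3
    for r in range(2, -1, -1):
        b[r] = (M[r][3] - sum(M[r][c] * b[c] for c in range(r + 1, 3))) / M[r][r]
    return b

def multiple_regression(x1, x2, y):
    """Fit Y = B0 + B1*X1 + B2*X2 by least squares via the normal equations."""
    n = len(y)
    sp = lambda u, v: sum(a * b for a, b in zip(u, v))
    A = [[n,       sum(x1),    sum(x2)],
         [sum(x1), sp(x1, x1), sp(x1, x2)],
         [sum(x2), sp(x1, x2), sp(x2, x2)]]
    v = [sum(y), sp(x1, y), sp(x2, y)]
    return solve3(A, v)

# Invented data satisfying Y = 1 + 2*X1 + 3*X2 exactly
x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
y  = [9, 8, 19, 18]

b0, b1, b2 = multiple_regression(x1, x2, y)
print(b0, b1, b2)  # recovers roughly 1, 2, and 3
```

With real, noisy data the recovered beta weights would not match any generating equation exactly; the point of the sketch is only the mechanics of fitting one criterion to two predictors.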

**Factor Analysis**
A second and widely used multivariate technique is called factor analysis. In this procedure, a large number of variables are measured and correlated with each other. A correlation matrix may then be used to illustrate which groups of variables cluster together to form factors. A good example is the clusters of verbal fluency and spatial skills subtests that form factors on the Wechsler Adult Intelligence Scale.

This analysis also uncovers factor loadings, which are the correlations between each of the tests and each of the identified factors.
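A small simulation shows how a correlation matrix exposes such clusters. The subtest names and all the numbers below are invented: two hidden factors each drive two tests, and the matrix reveals exactly that grouping:

```python
import math
import random

def pearson_r(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(7)
n = 500
verbal  = [random.gauss(0, 1) for _ in range(n)]   # hidden verbal factor
spatial = [random.gauss(0, 1) for _ in range(n)]   # hidden spatial factor

# Four made-up subtest scores, each driven by one hidden factor plus noise
tests = {
    "vocabulary":    [v + random.gauss(0, 0.5) for v in verbal],
    "comprehension": [v + random.gauss(0, 0.5) for v in verbal],
    "block_design":  [s + random.gauss(0, 0.5) for s in spatial],
    "mazes":         [s + random.gauss(0, 0.5) for s in spatial],
}

# Print the correlation matrix: the two verbal tests correlate strongly with
# each other, the two spatial tests with each other, and little across clusters
names = list(tests)
for a in names:
    print(a, [round(pearson_r(tests[a], tests[b]), 2) for b in names])
```

A real factor analysis goes further and extracts the factors and loadings mathematically, but the clustering in the matrix is the raw material it works from.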