Scientific research often involves correlation, which in general refers to any association, link or correspondence between variables. In statistics, correlation is an approach used to evaluate the linear association between two continuous variables. While several statistical tests can be used to assess the relationship between two quantitative variables, Pearson’s correlation coefficient is the most widely used measure of linear association between continuous variables.
Pearson’s correlation coefficient is a statistic that measures the strength and direction of the linear association between two continuous variables. It is denoted by ‘r’ when computed from a sample and by ‘rho’ (ρ) for the population, and it is based on the method of covariance.
The correlation coefficient or ‘rho’ is the slope of the regression line between two variables once they have been standardised by subtracting their means and dividing by their standard deviations.
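To make the ‘standardised slope’ idea concrete, here is a minimal Python sketch. The sample values and the helper names (mean, sample_std, standardise, pearson_r, ols_slope) are made up for illustration; the sketch assumes the usual covariance-based formula for the coefficient and an ordinary least-squares regression line.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_std(values):
    # Sample standard deviation (divides by n - 1).
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def standardise(values):
    # Subtract the mean and divide by the standard deviation.
    m, s = mean(values), sample_std(values)
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    # Covariance-based formula: sum of cross-deviations over the
    # square root of the product of the sums of squared deviations.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ols_slope(x, y):
    # Slope of the least-squares regression line of y on x.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Illustrative values only, not taken from any study.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
slope = ols_slope(standardise(x), standardise(y))
print(math.isclose(r, slope))  # the two quantities agree
```

Standardising both variables removes their units and scales, which is why the slope of the fitted line collapses to the correlation coefficient itself.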
Like any other statistical test, Pearson’s correlation coefficient test has a few assumptions including:
- There is a linear relationship between the variables in the sample, which can be assessed by plotting the values of the variables on a scatter diagram.
- The variables must be normally distributed.
- The variables are either interval or ratio measurements.
- Outliers must be eliminated completely or kept to a minimum.
- The data are homoscedastic.
Properties of Pearson’s test
Limit – Coefficient values range from -1 to +1, where -1 indicates a perfect negative linear relationship, +1 indicates a perfect positive linear relationship and zero indicates the absence of a linear relationship.
Symmetric – The correlation coefficient between two variables is symmetric, which simply means that the value of the coefficient between X and Y is the same as that between Y and X.
Pure number – The coefficient is a pure number, independent of the units of measurement. For instance, if one variable is measured in cm and the other in inches, the value of Pearson’s correlation coefficient is the same as it would be with any other units.
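These three properties are easy to check numerically. The sketch below uses invented height and weight values and a hypothetical helper pearson_r (written here from the standard formula, not taken from a library) to confirm the limits, the symmetry and the independence from units.

```python
import math

def pearson_r(x, y):
    # Standard sample formula for Pearson's correlation coefficient.
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative values only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [52.0, 58.0, 69.0, 75.0, 90.0]
heights_in = [h / 2.54 for h in heights_cm]  # same heights in inches

r_xy = pearson_r(heights_cm, weights_kg)
r_yx = pearson_r(weights_kg, heights_cm)
r_inches = pearson_r(heights_in, weights_kg)

print(-1.0 <= r_xy <= 1.0)           # Limit
print(math.isclose(r_xy, r_yx))      # Symmetric
print(math.isclose(r_xy, r_inches))  # Pure number
```

Converting a variable to another unit is a linear rescaling, so the deviations in the numerator and denominator scale by the same factor and cancel.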
When ‘rho’ is used as a descriptive statistic, it doesn’t require any distributional assumptions. When hypotheses are tested, it is assumed that the observations are independent and that the variables follow a bivariate normal distribution.
Testing the hypothesis
Null hypothesis – H0: ρ = 0
Here the population correlation coefficient doesn’t differ significantly from zero. This simply means that there is no significant linear relationship between X and Y.
Alternate hypothesis – Ha: ρ ≠ 0
Here the population correlation coefficient significantly differs from zero. In other words, there is a significant linear relationship between X and Y.
Method 1 – using the P-value to make a decision
- If the p-value is less than the significance level (alpha = 0.05), reject the null hypothesis. There is then sufficient evidence to conclude that a linear relationship exists between X and Y, as the correlation coefficient is significantly different from zero.
- If the p-value is equal to or greater than the significance level, the null hypothesis shouldn’t be rejected. Here there is insufficient evidence to conclude that there is a significant linear relationship between X and Y.
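The text does not say how the p-value is obtained. A common approach, assuming the bivariate normal model mentioned earlier, converts the sample coefficient r from n observations into a t statistic with n - 2 degrees of freedom:

```latex
t = \frac{r\,\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n - 2
```

The p-value is then the two-sided tail probability of this t value under the t distribution with n - 2 degrees of freedom.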
Method 2 – using a table of critical values
- The table of 95% critical values of the sample correlation coefficient indicates whether the computed value of ‘r’ is significant. The critical value is looked up using the degrees of freedom (df = n - 2).
- If the value of ‘r’ doesn’t lie between the negative and positive critical values, the correlation coefficient is considered significant.
- If ‘r’ is significant, the regression line can be used for prediction. However, if ‘r’ is not significant, the line shouldn’t be used for prediction.
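The critical-value decision can be sketched in a few lines of Python. The function names critical_r and is_significant are hypothetical, and the sketch assumes the critical value of ‘r’ is derived from a two-tailed critical t value, which has to be supplied from a t table (about 4.303 for df = 2 at alpha = 0.05).

```python
import math

def critical_r(t_critical, df):
    # Convert a two-tailed critical t value into the matching critical r.
    return t_critical / math.sqrt(t_critical ** 2 + df)

def is_significant(r, n, t_critical):
    # r is significant when it falls outside the interval
    # between the negative and positive critical values.
    df = n - 2
    return abs(r) > critical_r(t_critical, df)

# Assumed table value: t is about 4.303 for df = 2, two-tailed alpha = 0.05.
r_crit = critical_r(4.303, 2)
print(round(r_crit, 3))                 # 0.95
print(is_significant(0.99, 4, 4.303))   # True
print(is_significant(-0.50, 4, 4.303))  # False
```

With very small samples the critical value is close to 1, so only a near-perfect correlation would count as significant.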
Power is the probability of rejecting a false null hypothesis, and alpha is the probability of a type-I error, i.e., rejecting a true null hypothesis. In Pearson’s correlation coefficient test, as with any probability, the values of power and alpha must lie between zero and one.
Degree of correlation
The major aspect in Pearson’s correlation coefficient test is the value of the correlation. This value always lies between -1 and 1.
Let’s have a detailed look at various types of correlations depending on their value.
- Perfect positive correlation – Here the correlation value is +1 and the two variables are perfectly positively related. That is, the points in a scatter plot lie on a straight ascending line.
- Perfect negative correlation – In this type of correlation the correlation value is -1 and the data points lie on a straight descending line, i.e., the variables are perfectly negatively linearly related.
- Zero correlation – The correlation is zero and the variables do not have any linear relationship. However, a non-linear relationship may still exist.
Besides perfect positive, perfect negative and zero correlation, depending on the value, we have high, moderate and low degrees of correlation.
High degree – Here, the correlation coefficient value lies between ± 0.50 and ± 1.
Moderate degree – The value lies between ± 0.30 and ± 0.49.
Low degree – Here, the value lies below ± 0.29.
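These bands translate directly into a small helper. The function name degree_of_correlation is hypothetical, and the sketch makes one assumption the text leaves open: values falling in the gaps between the stated bands (for example 0.295 or 0.495) are assigned to the lower band.

```python
def degree_of_correlation(r):
    # Classify a coefficient by its absolute size; the sign only
    # gives the direction of the relationship.
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(r)
    if size >= 0.50:
        return "high"
    if size >= 0.30:
        return "moderate"
    return "low"

print(degree_of_correlation(0.72))   # high
print(degree_of_correlation(-0.35))  # moderate
print(degree_of_correlation(0.10))   # low
```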
An example of Pearson’s correlation coefficient test
Ten students held their breath after breathing normally for 60 seconds and again after hyperventilating for 60 seconds. Let’s check whether there is any association between the two variables (normal and hyperventilating breathing), using the values for four of the students.
Student | A | B | C | D |
Normal breathing | 55 | 55 | 64 | 64 |
Hyperventilating breathing | 86 | 90 | 84 | 90 |
According to the graph, there is a linear relationship between the variables.
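As a rough check, the coefficient can be computed by hand for the four students listed in the table. The sketch below covers only those four rows, not the full group of ten, and the variable names are chosen here for illustration.

```python
import math

normal = [55, 55, 64, 64]            # students A to D
hyperventilating = [86, 90, 84, 90]  # students A to D

n = len(normal)
mean_x = sum(normal) / n             # 59.5
mean_y = sum(hyperventilating) / n   # 87.5

# Deviations of each value from its mean.
dx = [x - mean_x for x in normal]
dy = [y - mean_y for y in hyperventilating]

sxy = sum(a * b for a, b in zip(dx, dy))  # -9.0
sxx = sum(a * a for a in dx)              # 81.0
syy = sum(b * b for b in dy)              # 27.0

r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))  # -0.192
```

For these four rows alone the coefficient is weakly negative, and with only two degrees of freedom it is far from any usual critical value, so the four values by themselves do not establish a significant linear relationship.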
The correlation coefficient is a powerful analysis technique, but it depends entirely on the collected data. It is therefore essential to collect accurate data so that the test yields meaningful results.