CORRELATION COEFFICIENT

The quantity *r*, called the linear correlation coefficient, measures the strength and the direction of a linear relationship between two variables (*x* and *y*).

The linear correlation coefficient is sometimes referred to as the Pearson product-moment correlation coefficient, in honor of its developer, Karl Pearson.

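The mathematical formula for computing *r* is:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^{2} - \left(\sum x\right)^{2}}\;\sqrt{n\sum y^{2} - \left(\sum y\right)^{2}}}$$

where n = number of pairs of data.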
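As a rough illustration, the raw-sums form of *r* (n times the sum of xy minus the product of the sums, divided by the square root of the matching x and y terms) can be sketched in plain Python. The function name `pearson_r` and the sample values are assumptions made for this sketch; they are not data from the text.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient computed from raw sums.

    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)),
    where n is the number of (x, y) pairs.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two pairs of data are needed")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # Happens when every x (or every y) is identical.
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator

# Illustrative values only (hypothetical, not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746
```

For these values the sums are n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so the numerator is 30 and the denominator is √1500.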

The value of *r* is such that -1 ≤ *r* ≤ +1. The + and – signs are used for positive linear correlations and negative linear correlations, respectively.

**Positive Correlation**: If *x* and *y* have a strong positive linear correlation, *r* is close to +1. An *r* value of exactly +1 indicates a perfect positive fit. Positive values indicate a relationship between *x* and *y* such that as values for *x* increase, values for *y* also increase.

**Negative Correlation**: If *x* and *y* have a strong negative linear correlation, *r* is close to -1. An *r* value of exactly -1 indicates a perfect negative fit. Negative values indicate a relationship between *x* and *y* such that as values for *x* increase, values for *y* decrease.

**No Correlation**: If there is no linear correlation or only a weak linear correlation, *r* is close to 0. A value near zero means that there is no linear relationship between the two variables: the points may be scattered at random, or they may follow a nonlinear pattern.

Note that *r* is a dimensionless quantity; that is, it does not depend on the units employed. A perfect correlation of ±1 occurs only when the data points all lie exactly on a straight line. If *r* = +1, the slope of this line is positive. If *r* = -1, the slope of this line is negative.
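Both claims in this note can be checked numerically. The sketch below is illustrative only: the helper `pearson_r`, the height and weight values, and the unit conversions are assumptions chosen for the demonstration.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements, first in metric units...
heights_cm = [150.0, 160.0, 165.0, 172.0, 180.0]
weights_kg = [52.0, 58.0, 63.0, 70.0, 77.0]
# ...then the same measurements converted to inches and pounds.
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]

r_metric = pearson_r(heights_cm, weights_kg)
r_imperial = pearson_r(heights_in, weights_lb)
# Changing the units rescales x and y but leaves r unchanged.
print(math.isclose(r_metric, r_imperial, rel_tol=1e-9))  # True

# Points lying exactly on a straight line give r = +1 or r = -1,
# with the sign matching the sign of the slope.
xs = [1, 2, 3, 4]
r_up = pearson_r(xs, [3 * x + 1 for x in xs])      # positive slope
r_down = pearson_r(xs, [-2 * x + 10 for x in xs])  # negative slope
print(round(r_up, 6), round(r_down, 6))  # 1.0 -1.0
```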

A correlation whose magnitude is greater than 0.8 is generally described as *strong*, whereas a correlation whose magnitude is less than 0.5 is generally described as *weak*. These values can vary based upon the “type” of data being examined. A study utilizing scientific data may require a stronger correlation than a study using social science data.
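The rule of thumb above could be written as a small helper. This is only a sketch: the function name `describe_strength` is hypothetical, the thresholds are applied to the magnitude of *r* (an assumption, so that negative correlations are handled), and the label "moderate" for the band between 0.5 and 0.8 is also an assumption, since the text does not name it.

```python
def describe_strength(r, strong=0.8, weak=0.5):
    """Label a correlation using the 0.8 / 0.5 rule of thumb.

    The cut-offs are parameters because they vary with the type of data.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength ignores the direction of the relationship
    if size > strong:
        return "strong"
    if size < weak:
        return "weak"
    return "moderate"  # assumed label for the in-between band

print(describe_strength(0.92))   # strong
print(describe_strength(-0.85))  # strong
print(describe_strength(0.123))  # weak
```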

Suppose the following set of data is available on the total marks scored in the mid-term exam and the final exam by 30 students in a class. Let us see how to calculate the correlation coefficient in MS Excel.

COEFFICIENT OF DETERMINATION

The coefficient of determination, r^{2}, is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable.

It is a measure that allows us to determine how certain one can be in making predictions from a given model or graph.

The coefficient of determination is the ratio of the explained variation to the total variation.
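The ratio of explained to total variation can be computed directly from a least-squares line. The sketch below assumes a simple straight-line fit of *y* on *x*; the function names `fit_line` and `r_squared` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys):
    """Explained variation divided by total variation."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    predicted = [intercept + slope * x for x in xs]
    # Explained: how far the line's predictions move away from the mean of y.
    explained = sum((p - mean_y) ** 2 for p in predicted)
    # Total: how far the observed y values move away from the mean of y.
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total

# Illustrative values only (hypothetical, not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
print(round(r2, 4))  # 0.6
```

Here the fitted line is y = 2.2 + 0.6x, the explained variation is 3.6 and the total variation is 6, so 60% of the variation in *y* is accounted for by the line.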

The coefficient of determination is such that 0 ≤ r^{2} ≤ 1, and denotes the strength of the linear association between *x* and *y*.

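The coefficient of determination is often loosely described as the percent of the data that is closest to the line of best fit. More precisely, it is the percent of the variation in *y* that is accounted for by the line of best fit.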

In the above example, *r* = 0.123, so r^{2} = 0.015, which means that about 1.5% of the total variation in *y* can be explained by the linear relationship between *x* and *y* (as described by the regression equation), and the other 98.5% of the total variation in *y* remains unexplained.
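The arithmetic behind these percentages, written out as a short worked step using the value of *r* quoted above:

$$r^{2} = (0.123)^{2} = 0.015129 \approx 0.015 \quad (\text{about } 1.5\%\ \text{explained}), \qquad 1 - r^{2} \approx 0.985 \quad (\text{about } 98.5\%\ \text{unexplained})$$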

The coefficient of determination is a measure of how well the regression line represents the data. If the regression line passed exactly through every point on the scatter plot, it would explain all of the variation.

The further the line is from the points, the less it is able to explain.