Correlation and Regression

Introduction to Correlation, Regression v/s Correlation

Objective

After going through this lesson, you shall be able to understand the following concepts:

Meaning of Correlation
Importance of Studying Correlation
Objectives
Difference between Correlation and Regression
Bivariate Data

Introduction

Till now, we have studied measures of central tendency and measures of dispersion in detail. A noteworthy point is that these statistical measures deal with only one variable at a time (univariate data). For example, we may find the mean height of the students of a class or the standard deviation of their heights. In both cases, a single variable, height, is involved. However, it is often necessary to deal with more than one variable simultaneously. For example, we may wish to find the relationship between the age of a child and his/her height. In such cases, two other statistical tools, namely correlation and regression, are used.

In this lesson, we will study correlation analysis in detail. Regression analysis will be dealt with in the next lesson.

Meaning of Correlation

Carefully observe your surroundings. You will notice that there are many pairs of variables in which one variable is related to the other. Take, for example, the amount of rainfall and crop yield. The crop yield is directly related to the amount of rainfall. A similar relationship can be found between many variables, such as the price of a commodity and its supply, the number of vehicles and the pollution level, and so on. The relationship between two variables is studied with the help of a statistical tool called correlation. It studies the degree and intensity of the relationship between the two variables.

Significance of Correlation

The study of correlation is important for understanding various practical problems.

i. Formation of laws: In economics, correlation analysis forms the basis for various theories and laws such as the law of demand, the law of supply, the concept of elasticity, etc. For example, the law of demand is based on the relationship between the price of a commodity and its quantity demanded.

ii. Degree and direction: Correlation helps in measuring the degree and direction of the relationship between two variables. For example, besides establishing the relationship between the demand for a commodity and its price, it also helps in estimating the extent to which the two are related and in which direction.

iii. Base for regression analysis: Correlation serves as the base for regression analysis. Once it is established that two variables are correlated, the value of one variable can be estimated from the value of the other using regression analysis.

iv. Business decisions and planning: Correlation analysis proves helpful in taking important business-related decisions. For example, by looking at how an increase in production has led to an increase in profitability, future plans regarding production can be made more easily.

v. Helps in policy formation by the government: As in business, correlation also helps the government in framing plans and policies. For example, policies regarding poverty alleviation can be framed on the basis of the correlation between expenditure on poverty alleviation programmes and the percentage reduction in poverty.

Bivariate Data

We know that statistical measures such as central tendency, dispersion, etc. relate to only one variable. Distributions that relate to only one variable are known as univariate distributions. On the other hand, statistical measures such as correlation and regression deal with two variables simultaneously. Data that relates to two variables is known as bivariate data, and the corresponding distributions are known as bivariate frequency distributions or two-way frequency distributions. To understand a bivariate distribution, consider the example given below.

Example 1: The following are the marks obtained in statistics by students of classes A and B.

(12, 25), (16, 29), (14, 32), (11, 37), (19, 26), (17, 39), (13, 33), (16, 24), (19, 37), (12, 43), (13, 40), (16, 35), (19, 38), (14, 32)

Construct a bivariate frequency distribution for the given data.

Solution

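Here, for the two variables, the marks of the students of class A and the marks of the students of class B, the class intervals can be taken with a width of 2 starting from 10−12 and with a width of 5 starting from 20−25, respectively. Each interval includes its lower limit and excludes its upper limit, so a value equal to an upper limit is counted in the next interval. Now, the data can be presented in the form of a bivariate distribution as shown below.

Marks (class A)      Marks (class B)
                     20−25   25−30   30−35   35−40   40−45   Total
10−12                  0       0       0       1       0       1
12−14                  0       1       1       0       2       4
14−16                  0       0       2       0       0       2
16−18                  1       1       0       2       0       4
18−20                  0       1       0       2       0       3
Total                  1       3       3       5       2      14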

Here, the marks of the students of class A are presented in the first column and the marks of the students of class B are presented in the first row. Now, to fill in the values in the table, we find the combinations of the marks of class A and class B for the different class intervals. For instance, we first find the combinations where the marks of class A lie in the interval 10−12 and the marks of class B lie in the interval 20−25. Next, we find the combinations for the class interval 10−12 for class A and the class interval 25−30 for class B. In a similar manner, we find the combinations for the class interval 10−12 for class A with the remaining class intervals of class B.

Next, we move to the second row to find the combinations for the class interval 12−14 for class A with the different class intervals of class B, that is, the combinations of the class interval 12−14 with the class intervals 20−25, 25−30, 30−35, 35−40 and 40−45 of class B. In a similar manner, we find the combinations for the remaining class intervals of class A and class B.
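The row-by-row tallying described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the lesson: the helper names `bin_index` and `bivariate_table` are hypothetical, and it assumes that the class intervals for class A run from 10−12 to 18−20, those for class B run from 20−25 to 40−45, and that each interval includes its lower limit and excludes its upper limit.

```python
# Marks from Example 1 as (class A, class B) pairs.
PAIRS = [(12, 25), (16, 29), (14, 32), (11, 37), (19, 26), (17, 39), (13, 33),
         (16, 24), (19, 37), (12, 43), (13, 40), (16, 35), (19, 38), (14, 32)]

# Assumed class intervals: lower limit included, upper limit excluded.
A_BINS = [(10, 12), (12, 14), (14, 16), (16, 18), (18, 20)]
B_BINS = [(20, 25), (25, 30), (30, 35), (35, 40), (40, 45)]


def bin_index(value, bins):
    """Return the position of the class interval that contains value."""
    for i, (low, high) in enumerate(bins):
        if low <= value < high:
            return i
    raise ValueError(f"{value} lies outside the given class intervals")


def bivariate_table(pairs, a_bins, b_bins):
    """Tally each (a, b) pair into the cell for its two class intervals."""
    table = [[0] * len(b_bins) for _ in a_bins]
    for a, b in pairs:
        table[bin_index(a, a_bins)][bin_index(b, b_bins)] += 1
    return table


table = bivariate_table(PAIRS, A_BINS, B_BINS)
row_totals = [sum(row) for row in table]        # totals for class A intervals
col_totals = [sum(col) for col in zip(*table)]  # totals for class B intervals

# Print one line per class A interval, followed by the column totals.
for (low, high), row, total in zip(A_BINS, table, row_totals):
    print(f"{low}-{high}", row, total)
print("Total", col_totals, sum(col_totals))
```

Each printed line corresponds to one class interval of class A. The row and column sums give the totals for class A and class B, and both sets of totals add up to the number of pairs.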

Following this procedure, first consider the combination 10−12 and 20−25 for class A and class B, respectively. In the given data, we can see that there is no such pair of values, so the frequency for this combination is 0. In a similar manner, consider the combination 10−12 and 35−40. Here, we can see that the pair (11, 37), i.e. 11 marks for class A and 37 marks for class B, corresponds to the required combination. So, the frequency for this combination is 1. In a similar manner, we can find the frequencies for the remaining combinations of marks.

Note that the last column and the last row represent the marginal totals of the marks of the students of class A and class B, respectively, for the different class intervals.

Marginal Distribution and Conditional Distribution

From a bivariate distribution, two distributions can be derived. They are as follows:

i. Marginal distribution
ii. Conditional distribution

Marginal Distribution: Marginal distribution is the frequency distribution of each of the variables individually along with the frequency totals/marginal totals.

Example 2: Consider again the marks of the students in the two classes given in the abo…
