
Regression Analysis: Agricultural Yield in India


Problem Definition & Inspiration: We set out to analyze the factors responsible for the growth in agricultural yield in India. One article that roused our interest was an interview with Prof. Gopal Naik of IIMB, in which he discusses the issues in Indian agriculture (link: http://tejas-iimb.org/interviews/12.php). We were specifically concerned with the effect of infrastructure issues on agriculture. The fact that the contribution of agriculture is decreasing despite India's fast-paced economic growth also provided an interesting analytical problem. We proceeded to analyze, quantitatively and qualitatively, several factors that have or can have a direct or indirect impact on agricultural yield. The relevant datasets, covering a total of 23 years, were obtained from the website of the Food & Agriculture Organization of the United Nations (http://faostat.fao.org/). The datasets are real-world data, authenticated or sufficiently approximated by FAO, UN.

Stage I analysis: Initially, we tried to predict the total yield of cereals in India over 23 years, from 1980 to 2002, using 5 independent variables: Agricultural Population, Arable Land, Fertilizer, Net Investment in Agricultural Production, and Agricultural Machinery.

The correlations for these 5 variables were very encouraging. However, the other diagnostics, such as Adjusted R-square, p-values and the Durbin-Watson (DW) statistic, did not provide convincing results. We next tried several combinations of these input variables and finally arrived at 3 inputs that gave commendable results: agricultural population, arable land and fertilizers. Our sample for model creation consisted of 20 sample points, and we tested the model on a hold-out sample of 3 points.

Correlation analysis with all the 5 independent variables:


Regression analysis with all the 5 independent variables:

The R-square value obtained (0.9844) was satisfactory.

From the above three tables we conclude that the variable machinery import can be ignored in subsequent analysis, since its correlation coefficient is low compared to the other variables. We used another model comprising 3 variables (agricultural population, arable land and fertilizer), which proved better in terms of estimating efficiency.
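The screening step described above, ranking candidate inputs by their correlation with yield and setting aside the weakest, could be sketched roughly as follows. This is only an illustration in plain Python: the function names `pearson_r` and `rank_by_correlation` are hypothetical, and the numbers are placeholders, not the FAO series used in this report.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rank_by_correlation(predictors, target):
    """Return (name, r) pairs sorted from weakest to strongest |r|."""
    scores = [(name, pearson_r(values, target))
              for name, values in predictors.items()]
    return sorted(scores, key=lambda pair: abs(pair[1]))

# Placeholder numbers for illustration only; NOT the FAO data.
yield_demo = [1.0, 1.2, 1.3, 1.5, 1.8]
predictors_demo = {
    "fertilizer": [10.0, 12.0, 13.5, 15.0, 18.5],
    "machinery": [5.0, 3.0, 6.0, 4.0, 5.5],
}

ranking = rank_by_correlation(predictors_demo, yield_demo)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

A low |r| only flags a candidate for removal; the final choice in this report also relied on the regression diagnostics.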

Stage II analysis: We next tried several combinations of these input variables and finally arrived at 3 inputs that gave commendable results. These 3 inputs were agricultural population, arable land and fertilizers.

Our sample for model creation consisted of 20 sample points, and we tested the model on a hold-out sample of 3 points. The correlations for these 3 inputs were as below:

The p-values for the 3 inputs were found to be well under 10%.
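The fit on 20 points with a 3-point hold-out check could be sketched as below. It is a minimal ordinary-least-squares illustration in plain Python, under several assumptions: the report does not state which 3 years were held out (the last 3 are used here), the data are synthetic with a known linear relationship rather than the FAO series, and the helper names `fit_ols` and `predict` are hypothetical. The sketch does not compute p-values.

```python
def fit_ols(rows, targets):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]          # prepend the intercept column
    k = len(X[0])
    # Augmented system (X'X | X'y).
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, row):
    """Predicted value for one observation."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Synthetic stand-in for 23 yearly observations of 3 inputs (NOT the FAO data).
rows = [(t, t % 5, (t * t) % 7) for t in range(23)]
targets = [2.0 + 0.5 * a - 1.0 * b + 0.25 * c for a, b, c in rows]

# Assumed split: first 20 points build the model, last 3 form the hold-out sample.
train_rows, holdout_rows = rows[:20], rows[20:]
train_y, holdout_y = targets[:20], targets[20:]

coefs = fit_ols(train_rows, train_y)
holdout_pred = [predict(coefs, r) for r in holdout_rows]
```

Because the synthetic targets are exactly linear in the inputs, the hold-out predictions match the targets here; with real data the gap between them is what the hold-out error measures.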

The ANOVA Table and Regression Statistics were as under:

Regression Statistics

The Adjusted R Square was found to be 97.58%, which means that the model as a whole explains 97.58% of the variation in the output, after adjusting for the number of predictors. The Mean Absolute Percentage Error for the model was 1.921167967. The same for the hold-out sample was 3.711908. The Durbin-Watson statistic was found to be 1.894797664. This result is encouraging: a value close to 2 indicates little first-order autocorrelation between the error terms.
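For reference, the three diagnostics quoted above can be computed from actual and predicted values using their standard textbook definitions, as in the sketch below. The function names are illustrative, and expressing MAPE on a 0 to 100 scale is an assumption about how the figures in this report were reported.

```python
def adjusted_r_squared(actual, predicted, n_predictors):
    """Adjusted R-square: R-square penalised for the number of predictors."""
    n = len(actual)
    mean_y = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (assumes no actual value is zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests little first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den
```

The statistic ranges from 0 to 4: values well below 2 point to positive autocorrelation in the residuals, values well above 2 to negative autocorrelation.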

Cost Benefit Analysis: We conducted a cost-benefit analysis on the basis of our model's outcome. It comprises the following steps:
1. First, the composition of the Indian food basket was calculated from FAOSTAT data. The basket consisted mainly (75%) of wheat and rice.
2. Based on the prices of wheat and rice for 2000 and their shares in the overall Indian cereal basket, the cost per kg of the Indian cereal basket was calculated.
3. The cost so calculated was multiplied by the total cereal production for India, giving the total income from cereal production in India.
4. Based on assumptions such as a per-day labour cost of Rs. 80 and 300 working days for rural labour, the total labour cost was calculated.
5. Overhead costs were assumed to be 10% of the labour cost.
6. Fertilizer costs were calculated by multiplying the total fertilizer consumption by the price of urea (the most widely used fertilizer in the country).

7. Subsequently, all the expenses were summed up and deducted from the gross income calculated in step 3 to find the total benefit. The total benefit so calculated was divided by the total agricultural population to find the benefit per capita.
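The arithmetic of the steps above can be sketched as follows. The daily wage of Rs. 80, the 300 working days and the 10% overhead rate come from the text; everything else is an assumption made for illustration: the function and parameter names are hypothetical, the basket price is taken as a share-weighted average, and the number of labourers is left as an explicit input because the report does not say how it was derived. No actual prices or quantities are supplied.

```python
def basket_price(prices, shares):
    """Share-weighted average price per kg (weighted averaging is an assumption)."""
    total_share = sum(shares.values())
    return sum(prices[crop] * shares[crop] for crop in shares) / total_share

def benefit_per_capita(price_per_kg, production_kg, labourers,
                       fertilizer_kg, urea_price_per_kg,
                       agricultural_population,
                       daily_wage=80.0, working_days=300, overhead_rate=0.10):
    """Benefit per capita following steps 3 to 7 of the cost-benefit analysis."""
    gross_income = price_per_kg * production_kg              # step 3
    labour_cost = labourers * daily_wage * working_days      # step 4
    overhead = overhead_rate * labour_cost                   # step 5
    fertilizer_cost = fertilizer_kg * urea_price_per_kg      # step 6
    total_benefit = gross_income - (labour_cost + overhead + fertilizer_cost)  # step 7
    return total_benefit / agricultural_population
```

Keeping the wage, working days and overhead rate as keyword arguments makes it easy to test how sensitive the per-capita benefit is to those assumptions.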

References: Data Source: http://faostat3.fao.org/home/index.html#DOWNLOAD
