I.
Introduction
This brief manual gives readers an introduction to Ordinary Least Squares regression analysis and shows how to run a very basic multivariate regression in Microsoft Excel 2010. Ordinary Least Squares (from here on abbreviated as OLS) is one of the most widely used statistical techniques for estimating relationships between variables. Regression analysis, the setting in which OLS is used, models and examines the relationships between variables, helps explain the factors behind observed patterns, and aids in making predictions. Regression analysis can be applied in many fields. Some examples include:
- Modeling automobile accidents as a function of speed, road conditions, weather, time of day, etc. in order to create a policy intended to decrease accidents.
- Forecasting the dollar amount banks should expect to give out in automobile loans as a function of an area's unemployment rate, labor force, current prime interest rates, cost of a gallon of gasoline, etc. to predict how much money customers would like to borrow from the bank.
- Testing whether high school grade point average and SAT scores help predict a student's collegiate performance, and then estimating a particular prospective student's ending college GPA based on the three independent variables discussed below.
For ease, this manual uses the third example as its hypothesis. We will look at a collection of randomly selected math and verbal (now known as Critical Reading) SAT scores and ending high school grade point averages as the independent variables (x), and ending college grade point averages as the dependent variable (y). The goal is to determine whether a correlation exists between a student's ending college GPA and their math and verbal SAT scores and high school GPA. If there is a correlation, we can then build a mathematical equation to predict other prospective students' ending college grade point averages from their SAT scores and high school GPA. Upon completion, readers will have an entry-level idea of what OLS regression analysis is and how to perform beginner-level tests using a common spreadsheet program, Microsoft Excel 2010. Readers will also learn the different terms used in regression analysis and how they are useful. The manual is broken down into a five-part structure with headings and subheadings to aid the process: I) Introduction, II) Description of Equipment, III) Materials Needed, IV) Directions, and V) Troubleshooting.
II.
Description of Equipment
Microsoft Excel is the go-to software program for spreadsheet applications. Among its many built-in tools is a regression tool, which this manual uses. The regression tool is available under the Data Analysis feature built into Excel. More detailed descriptions, with accompanying graphics, are provided in the Directions section for ease of use.
III.
Materials Needed
Materials needed for running a regression analysis in Microsoft Excel:
- A data set containing numerical values.
  o In this instruction manual, a data set is provided by the author for readers to follow.
- A laptop or desktop PC.
  o This manual is not intended for Apple operating systems.
- Microsoft Excel 2010 spreadsheet software, installed and operable.
  o If Excel is not already installed, a one-month free trial including Word and PowerPoint is available at http://office.microsoft.com/en-us/excel/
- A writing utensil and scratch paper for notes.
IV.
Directions
A. Opening Microsoft Excel 2010
To open Microsoft Excel, move your cursor to the Start menu and click it. Type Excel into the search field that appears. This brings up a list of suggestions and previous files, including the link to open Excel. The image to the right shows how the process appears. Open Excel.
Once you select the Analysis ToolPak add-in, click the button labeled Go, as shown on the screen and in the graphic to the left. Then select OK. In the author's case, all that is required is to select OK, as the add-in has already been installed.
For this entry-level test we will look at a collection of 30 observations that contain ending college and high school grade point averages, as well as SAT scores in math and verbal skills. The numerical values and the labels for each variable are shown to the right. Column A shows high school GPAs, Column B shows math SAT scores, Column C shows verbal SAT scores, and Column D contains the dependent variable, ending college grade point averages. Enter the figures shown to the right exactly as they are displayed so they can be used in this regression test. GPA-High represents the ending high school grade point average, while GPA-Univ represents the ending college grade point average. SAT-Math and SAT-Verb label the scores obtained on the mathematics and Verbal [Critical Reading] portions of the SAT.
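Because a single blank or mistyped cell can stop the regression from running, it can help to check the entered columns before continuing. The following is a minimal sketch in Python, not part of Excel, and it assumes the sheet has been saved as comma-separated text with the four labels above as the first row. The function name `find_non_numeric` and the sample rows are hypothetical and are not the manual's data set.

```python
import csv
import io

def find_non_numeric(csv_text):
    """Return (row_number, column_name, value) for each data cell that is not a number."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 holds the labels, so the first data row is spreadsheet row 2.
    for row_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            try:
                float(value)
            except (TypeError, ValueError):
                problems.append((row_number, column, value))
    return problems

# Hypothetical rows for illustration only; these are not the manual's figures.
sample = """GPA-High,SAT-Math,SAT-Verb,GPA-Univ
3.45,640,580,3.20
3.1O,600,,2.90
"""

# Row 3 has a letter O typed in place of a zero and an empty SAT-Verb cell.
print(find_non_numeric(sample))
```

Each reported entry gives the spreadsheet row, the column label, and the offending value, which makes it easy to find the cell on Sheet1.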
The Output options category determines how the results are displayed. Under the Output options category select New Worksheet Ply. This selection opens the regression information in a new Excel sheet, reducing clutter and confusion. Note: When the new sheet is opened, we will not be able to see our data set alongside it. To access the data set, go to the bottom left of Excel and select Sheet1. The graphic on the next page shows a downward-pointing arrow that allows us to switch between spreadsheets. The Residuals category allows graphs and charts to be displayed along with our information. Under Residuals, check the box labeled Line Fit Plots. This option produces a graph for each independent variable, with that variable along the x-axis and the dependent variable along the y-axis. Check the box Standardized Residuals as well. For this experiment we do not need any other boxes checked under the Residuals category. The Normal Probability Plots box below does not need to be checked; this option is not necessary for this specific test. Press OK.
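For readers who want to see what kind of calculation is behind the coefficients that appear after pressing OK, the following is a minimal sketch of an ordinary least squares fit in plain Python. It is an illustration under stated assumptions, not Excel's actual implementation: the function name `fit_ols`, the six observations, and the coefficients used to generate y are all made up for the example and are not the manual's data.

```python
def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    rows: list of observations, each a list of k predictor values.
    y:    list of responses, one per observation.
    Returns [b0, b1, ..., bk].
    """
    n, k = len(rows), len(rows[0])
    x_mean = [sum(r[j] for r in rows) / n for j in range(k)]
    y_mean = sum(y) / n
    # Center the data; the intercept is recovered from the means at the end.
    xc = [[r[j] - x_mean[j] for j in range(k)] for r in rows]
    yc = [v - y_mean for v in y]
    # Normal equations (Xc'Xc) b = Xc'yc, stored as an augmented k x (k+1) matrix.
    a = [[sum(xc[i][p] * xc[i][q] for i in range(n)) for q in range(k)]
         + [sum(xc[i][p] * yc[i] for i in range(n))] for p in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution for the slope coefficients.
    slopes = [0.0] * k
    for r in range(k - 1, -1, -1):
        known = sum(a[r][c] * slopes[c] for c in range(r + 1, k))
        slopes[r] = (a[r][k] - known) / a[r][r]
    intercept = y_mean - sum(b * m for b, m in zip(slopes, x_mean))
    return [intercept] + slopes

# Hypothetical observations [GPA-High, SAT-Math, SAT-Verb]; not the manual's data.
rows = [[3.0, 600, 580], [3.5, 650, 700], [2.8, 560, 610],
        [3.9, 700, 640], [3.2, 620, 690], [2.5, 590, 520]]
# Made-up coefficients used only to generate an exactly linear y for the demo.
true_b = [0.5, 0.7, -0.002, 0.003]
y = [true_b[0] + sum(b * x for b, x in zip(true_b[1:], r)) for r in rows]

estimates = fit_ols(rows, y)
print([round(b, 4) for b in estimates])
```

Because y here is generated without any noise, the fit should return the made-up coefficients up to floating-point rounding. With real data such as the manual's 30 observations, the estimates would be the values Excel reports in its Coefficients column.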
Now, for better viewing, we need to adjust the column sizes to fit all the given information. A good size for this example is a width of 17.3 for columns A and B, and 13 for columns C to I. Row height can remain at the default setting. To adjust a column's size, right-click the letter of the column you wish to adjust, select Column Width, and enter the desired size.
2. Adjusting the Graphs
The next step in understanding our regression analysis is adjusting the graphs for easier viewing. To start this process, double-click on the x-axis of one of the three graphs created. As an example, we will demonstrate how to change it for the SAT-Verb Line Fit Plot graph. A new window called Axis Options will appear. Within the Axis Options window are two settings labeled Minimum and Maximum, each with a selectable choice called Fixed. Select the Fixed option for each. In the Minimum field, enter a number slightly smaller than the lowest verbal SAT score in our data, and in the Maximum field, enter a number slightly larger than the highest verbal SAT score. A good starting point is 485 for the minimum and 735 for the maximum. Do the same for the y-axis, but here use the lowest and highest university GPA. The author suggests 2.0 and 4.0 for the minimum and maximum values. The process for adjusting the x-axis is shown to the right. Follow these same steps to resize the other two graphs, using the lowest and highest values for their respective categories. For the SAT-Math Line Fit Plot graph, the author recommends an x-axis minimum and maximum of 550 and 720. The y-axis will be 2.0 and 4.0, the same as the previous graph. In the GPA-High Line Fit Plot graph, the x-axis and y-axis will both have the same minimum and maximum values: 2.0 and 4.0. Press Close.
3. Adding a Trendline
After making the graphs' axes manageable and organized, the next step in visualizing our data is to add a Trendline and an individual regression estimate. To do this, left-click on one of the red data points shown on a graph. If done correctly, all the predicted value data points in that particular graph will be highlighted. Now right-click on the same spot and select Add Trendline. This process can be seen in the small graphic to the right.
In the new window displayed, select Linear. Under the Trendline Name category select Automatic. Entering a specific name is unnecessary for our basic test. This new window and the options chosen can be seen in the larger graphic immediately to the right on this page. At the bottom of the window select Display R-squared value on chart. This option gives us our correlation strength, the R-squared value, but limits it to just one of the independent variables. Select Close. You should now see a straight line passing through your predicted value data points, along with a number that identifies the strength of correlation. Repeat these steps for the other two graphs and then arrange the graphs on the spreadsheet for easier viewing. The author's example is shown below.
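For a straight-line fit with a single x variable, the R-squared value is the square of the correlation between the x values and the plotted y values. The following is a minimal sketch of that calculation in Python, independent of Excel. The function name `linear_trendline` and the four data points are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def linear_trendline(x, y):
    """Return (slope, intercept, r_squared) for a straight-line fit of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # With one predictor, R-squared equals the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical points for illustration only; these are not the manual's data.
sat_scores = [500, 550, 600, 650]
gpas = [2.6, 3.0, 2.8, 3.2]

slope, intercept, r_squared = linear_trendline(sat_scores, gpas)
print(round(slope, 4), round(intercept, 2), round(r_squared, 2))
```

An R-squared near 1 means the points lie close to the line, while a value near 0 means the line explains little of the variation.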
and the values determined from 30 previous samples, the equation to forecast Macie's college GPA would be as follows:
Yi = b0 + b1x1 + b2x2 + b3x3
Note: because we are forecasting, a stochastic error term is not necessary.
Estimated College GPA = 0.459423967 + 0.73143021(3.62) - 0.002513024(642) + 0.00339953(661)
Estimated College GPA = 0.459423967 + 2.64778 - 1.61336 + 2.24709
Estimated College GPA = 3.74
Given the university's new policy of an expected college GPA of 3.10 or higher, Macie's astounding expected 3.74 GPA would make her a shoo-in.
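The forecast arithmetic can be double-checked with a short Python sketch. The coefficients and Macie's three values are the ones used in the calculation above. Which coefficient belongs to which variable is inferred here from the column order of the data set (high school GPA, then math SAT, then verbal SAT), and the dictionary keys and the function name `forecast_gpa` are chosen for illustration.

```python
# Coefficients as they appear in the forecast equation.
coefficients = {
    "intercept": 0.459423967,
    "GPA-High": 0.73143021,
    "SAT-Math": -0.002513024,
    "SAT-Verb": 0.00339953,
}

# Macie's values as used in the forecast equation.
applicant = {"GPA-High": 3.62, "SAT-Math": 642, "SAT-Verb": 661}

def forecast_gpa(coefs, values):
    """Add the intercept to each coefficient multiplied by its variable."""
    total = coefs["intercept"]
    for name, value in values.items():
        # Each product is computed first, then added to the running total.
        total += coefs[name] * value
    return total

estimate = forecast_gpa(coefficients, applicant)
print(round(estimate, 2))  # prints 3.74
```

Computing each product before adding mirrors the order of operations in the hand calculation, so the subtraction of the SAT-Math term comes from its negative coefficient.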
V.
Troubleshooting
If the instructions are followed correctly, there should not be any error messages or differing results. If you do encounter one, it can often be fixed easily. One error that could appear is Data contains non-numeric data. The likely cause is that one or more of the data cells on Sheet1 below the first row (A1-D1) is blank or contains a letter. Recheck the data set and make the correction using the value(s) the author has provided.
Another possible problem is that the windows and boxes that appear are not the same as the author's. This is most likely to occur in Section IV. Directions; subheading F. Data Results and Modification; in processes 2. Adjusting the Graphs and 3. Adding a Trendline. The reason is that these processes require a lot of precise clicking. For process 2. Adjusting the Graphs, the author recommends clicking once on the middle number displayed on the axis, be it the y-axis or x-axis; if done correctly, a light gray box should surround the axis in question. Once the gray box is visible, double-click without moving the mouse cursor away from the spot of the first click. If done correctly, a window will open that is identical to the author's graphic for that step.
For process 3. Adding a Trendline, if your screen differs from the graphic provided, it is likely because not all of the data points are selected, or because you selected the entire graph rather than the data points. Hover your mouse over one of the red data points. The red data points represent the predicted values, not the actual values. Left-click once; all red data points should now be highlighted. Without moving your cursor, right-click to open the small window seen in the graphic for that step.
The key to avoiding this error is to be patient and to avoid double-clicking or right-clicking when left-clicking is required, and vice versa. If your data and numerical values are off, this will be another problem, especially if you are trying to reproduce the author's regression. To fix this, return to Sheet1, the spreadsheet that contains the data set. Examine the numbers in each cell carefully, as a missing or misplaced decimal point will throw the calculations completely off. The same goes for missing or misplaced numbers. If the error is still occurring after that, you are likely having some trouble with Section IV. Directions; subheading E. Designing the output. Review the Input Y Range and Input X Range fields and compare them with the author's example. If they do not match the example the author has provided, repeat the process shown for that step, making sure the correct cells have been selected.
Finally, if all the graphics match the author's but the GPA you calculated for Macie doesn't match the author's calculation, it is likely a mathematical error. To fix this, review the first equation given, the one that contains no numerical values, and examine how the multiplication, addition, and subtraction fit together. Use the scratch paper and writing utensil to jot down how the process should go. Next, review the regression coefficient values (one for each independent variable, plus the intercept) and check that they match both the computer's regression output and the author's inputs. If those match, reread Macie's values (her high school GPA and her math and reading SAT scores). Place those values into the equation just as the author did. If your numbers still don't match, remember to multiply each coefficient by its value first, before doing the addition (and in this case a subtraction) between the terms. If need be, do each of the multiplications separately, as the author did, rather than entering the whole formula into a calculator, since some basic calculators do not apply the order of operations (multiplication and division before addition and subtraction).