
ADAMSON UNIVERSITY College of Engineering MGC Department

Least Square Method of Price Forecasting

Submitted by: Charles Alfred B. Semilla BSEM-5 Student No: 200514224

Planning for the future is the essence of any business. Businesses need estimates of the future values of business variables. The commodities industry needs forecasts of supply, sales, and demand for production planning, sales, marketing, and financial decisions. Some businesses need forecasts of monetary variables such as costs or prices, and financial institutions face the need to forecast volatility in stock prices. Forecasting therefore plays a critical role in business planning. The ability to accurately predict the future is fundamental to many decision activities in retail, marketing, production, inventory control, personnel, and many other business functional areas. Increasing forecasting accuracy can save a company millions of dollars and is a major motivation for using formal, systematic forecasting methods and for investing in new and better forecasting methods.

The time series modeling approach is one of the major forecasting approaches. Unlike the causal method, where a number of explanatory or causal variables have to be identified and predicted, the time series approach has the benefit of easier data collection and model preparation. In time series forecasting, historical data of the prediction variable are collected and analyzed to develop a model that captures the underlying relationship among the time series observations. The model is then used to extrapolate the time series into the future for forecasting. Numerous efforts have been devoted to developing and improving time series forecasting methods. Broadly speaking, there are two approaches to time series modeling and forecasting: linear and nonlinear. The linear approach assumes a linear underlying data-generating process. There are a large number of linear forecasting models, such as moving average, exponential smoothing, time series regression, and time series decomposition.

Field data are often accompanied by noise. Even when all control parameters (independent variables) remain constant, the resultant outcomes (dependent variables) vary. A process of quantitatively estimating the trend of the outcomes, also known as regression or curve fitting, therefore becomes necessary. The curve fitting process fits equations of approximating curves to the raw field data. Nevertheless, for a given set of data, the fitting curves of a given type are generally not unique. Thus, a curve with minimal deviation from all data points is desired. This best-fitting curve can be obtained by the method of least squares.

The Method of Least Squares

The method of least squares assumes that the best-fit curve of a given type is the curve that has the minimal sum of squared deviations (least square error) from a given set of data.

Suppose that the data points are (x1, y1), (x2, y2), ..., (xn, yn), where x is the independent variable and y is the dependent variable. The fitting curve f(x) has a deviation (error) d from each data point, i.e., d1 = y1 - f(x1), d2 = y2 - f(x2), ..., dn = yn - f(xn). According to the method of least squares, the best fitting curve has the property that the sum

d1² + d2² + ... + dn² = Σ [yi - f(xi)]²

is a minimum.

The goal of the least-squares method is to find a good estimate of the parameters of a function f(x) that fits a set of data x1, ..., xn. The least-squares method requires that the estimated function deviate as little as possible from the observed data in the sense of the 2-norm. Generally speaking, least-squares methods fall into two categories, linear and non-linear. We can also classify them further: ordinary least squares (OLS), weighted least squares (WLS), alternating least squares (ALS), and partial least squares (PLS).

To fit a set of data best, the least-squares method minimizes the sum of squared residuals (also called the Sum of Squared Errors, SSE),

S = r1² + r2² + ... + rm² = Σ ri²,

where ri, the residual, is the difference between the actual point and the regression line, defined as ri = yi - f(xi); the m data pairs are (xi, yi), and the model function is f(xi).

Here, we can choose n different parameters for f(x) so that the approximated function best fits the data set. For example, if the data are plotted, each residual Ri is the vertical distance between the point (Xi, Yi) and the curve, e.g., R3 = Y3 - f(X3), and the quantity (R1² + R2² + R3² + R4² + R5²), the sum of the squares of those distances, is what we want to minimize.
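As an illustration of residuals and the SSE, here is a minimal Python sketch; the data values and the trial coefficients are invented for illustration only.

# Minimal sketch: residuals and sum of squared errors (SSE) for a candidate line.
# The data and the candidate coefficients below are made up for illustration.

def f(x, a, b):
    """Candidate model: a straight line f(x) = a*x + b."""
    return a * x + b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]      # independent variable
ys = [2.1, 3.9, 6.2, 8.1, 9.8]      # dependent variable (noisy measurements)

a, b = 2.0, 0.0                      # trial parameters

residuals = [y - f(x, a, b) for x, y in zip(xs, ys)]
sse = sum(r ** 2 for r in residuals)

print("residuals:", residuals)
print("SSE:", sse)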

Linear Least-Squares Method

The Linear Least-Squares (LLS) Method assumes that the data set falls on a straight line. Therefore, f(x) = ax + b, where a and b are constants. However, due to experimental error, some data might not lie on the line exactly; there must be an error (residual) between the estimated function and the real data. The Linear Least-Squares Method (or l2 approximation) defines the best-fit function as the function that minimizes

S = Σ ri² = Σ [yi - (a xi + b)]².

The advantages of LLS:
1. If we assume that the errors have a normal probability distribution, then minimizing S gives us the best approximation of a and b.
2. We can easily use calculus to determine the approximate values of a and b.

To minimize S, the following conditions must be satisfied:

∂S/∂a = 0 and ∂S/∂b = 0.

Taking the partial derivatives, we obtain

∂S/∂a = -2 Σ xi [yi - (a xi + b)] = 0 and ∂S/∂b = -2 Σ [yi - (a xi + b)] = 0.

This system actually consists of two simultaneous linear equations in the two unknowns a and b (these two equations are the so-called normal equations):

a Σ xi² + b Σ xi = Σ xi yi and a Σ xi + n b = Σ yi.

Based on simple manipulation of the summations, we can easily find that

a = [n Σ xi yi - (Σ xi)(Σ yi)] / [n Σ xi² - (Σ xi)²] and b = (Σ yi - a Σ xi) / n.

Thus, the best estimated function for the data set (xi, yi), for i an integer in [1, n], is

y = ax + b,

with a and b given by the formulas above.
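A minimal Python sketch of these closed-form formulas; the function name and the sample data are illustrative, not from the original text.

def linear_least_squares(xs, ys):
    """Fit y = a*x + b by the closed-form least-squares formulas above."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b = (sum_y - a * sum_x) / n                                    # intercept
    return a, b

# Example usage with made-up data:
a, b = linear_least_squares([1, 2, 3, 4], [2.0, 4.1, 6.1, 7.9])
print("slope a =", a, "intercept b =", b)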

Linear Least-Squares Method in matrix form

We can also represent the estimated linear function in the following model: yi = a1*x1 + a2*x2 + ... + am*xm + r. It can also be represented in matrix form as {Y} = [X]{A} + {R}, where [X] is a matrix containing coefficients derived from the data set (it might not be a square matrix, depending on the number of variables m and the number of data points n); the vector {Y} contains the values of the dependent variable; the vector {A} contains the unknown coefficients that we would like to solve for; and the vector {R} contains the residuals.

To minimize the sum of squared residuals {R}^T{R}, we follow the same method as in LLS, obtaining the partial derivative with respect to each coefficient and setting it equal to zero. As a result, we have a system of normal equations, which can be represented in the following matrix form: [[X]^T[X]]{A} = [X]^T{Y}. To solve the system, we have many options, such as LU decomposition, the Cholesky method, the inverse matrix, and Gauss-Seidel. (Generally, the normal equations do not result in diagonally dominant matrices, so the Gauss-Seidel method is not recommended.)
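A sketch of the matrix formulation, assuming the numpy library is available and using invented data; it forms and solves the normal equations and, for comparison, calls a library least-squares routine.

import numpy as np

# Made-up data: fit y = a1*x + a2 (a straight line written in matrix form).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix [X]: one column per unknown coefficient.
X = np.column_stack([x, np.ones_like(x)])

# Normal equations: [X]^T [X] {A} = [X]^T {Y}
A = np.linalg.solve(X.T @ X, X.T @ y)
print("coefficients from the normal equations:", A)

# Equivalent (and numerically safer) library call:
A_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print("coefficients from np.linalg.lstsq:", A_lstsq)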

Least-Squares Method in statistical view

From the equation [[X]^T[X]]{A} = [X]^T{Y}, we can derive the following equation: {A} = [[X]^T[X]]^-1 [X]^T{Y}.

From this equation, we can determine not only the coefficients but also their statistical properties. Using calculus, the following formulas for the coefficients of the straight-line fit can be obtained:

a = Sxy / Sxx and b = ȳ - a x̄,

where

Sxx = Σ (xi - x̄)² and Sxy = Σ (xi - x̄)(yi - ȳ).

Moreover, the diagonal and non-diagonal values of the matrix [[X]^T[X]]^-1 represent the variances and covariances of the coefficients ai, respectively. Assume the i-th diagonal value of [[X]^T[X]]^-1 is x_{i,i} and the corresponding coefficient is ai; then

var(ai) = x_{i,i} * s_{y/x}²,

where s_{y/x} is called the standard error of the estimate; for the straight-line fit,

s_{y/x} = sqrt( Σ (yi - f(xi))² / (n - 2) ).

(Here, the lower index y/x means that the error for a given x is caused by the inaccurate approximation of the corresponding y.) These two pieces of information have many applications. For example, we can derive upper and lower bounds for the intercept and slope.
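A Python sketch of these statistical quantities, assuming numpy and invented data; the degrees-of-freedom divisor n - m reduces to n - 2 for the two-coefficient straight-line case.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([x, np.ones_like(x)])           # design matrix for y = a*x + b
A = np.linalg.solve(X.T @ X, X.T @ y)               # coefficients [a, b]

residuals = y - X @ A
n, m = X.shape                                       # m = number of coefficients (here 2)
s_yx = np.sqrt(np.sum(residuals ** 2) / (n - m))     # standard error of the estimate

cov = s_yx ** 2 * np.linalg.inv(X.T @ X)             # variance-covariance matrix of the coefficients
print("coefficients:", A)
print("standard error of the estimate:", s_yx)
print("variances of the coefficients:", np.diag(cov))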

Application

We use the method of least squares when we have a series of measurements (xi, yi) with i = 1, 2, ..., n (i.e., we measured a set of values we call y, and each of these depends on the value of a variable we call x), and we know that the measured points (xi, yi), when drawn on a plane, should (in theory) form a straight line. Because of measurement errors, the points measured most probably will not form a straight line, but they will be approximately aligned. We are interested in finding the straight line which is the most similar to the points measured. So we want to find a function f(x) = ax + b (that is, a straight line) such that f(xi) ≈ yi; we only need to find the appropriate a and b and we will have found our function.

How do we express "most similar" mathematically? The criterion we will follow to find those a and b is to minimize the error f(xi) - yi. Defined this way, this error is the vertical distance between each measured point and the straight line we are seeking. To minimize the total error, we might first try to minimize the sum of all errors:

E = Σ [f(xi) - yi].

But this formula for the global error has a problem: if we have two points, the value yi of the first being far below the line (so with a big positive error), and the other one, yj, being far above the line (so with a big negative error), when we sum both errors they cancel each other out, giving a total error of E = 0. Evidently, that is not what we are seeking. The solution to this problem is to change our formula for the total error and try this one:

E = Σ [f(xi) - yi]².

We will not sum the individual errors, but their squares. This way, all of them will be positive, and to minimize the sum all of them will have to tend to 0. Why don't we use the absolute value of the errors, |f(xi) - yi|? Because, as we will see, if we use the square of the errors there exists an exact formula to find the a and b which minimize the error (and with the absolute values there is no such formula). This is why the method is called "least squares": it tries to find the line which produces the least squares of the individual errors. Note that we are assuming that the x values are exact and all the errors are in the y values. In many cases this may not be true, and we could have errors in both the y and the x values. There are other methods which consider this, but the least squares method is the simplest and the most used. Which are a and b, then? The values for a and b that minimize the total error E are:

a = [n Σ xi yi - (Σ xi)(Σ yi)] / [n Σ xi² - (Σ xi)²] and b = [Σ yi - a Σ xi] / n.

The most convenient way to calculate these values is to tabulate your data in columns, namely xi, yi, xi·yi, and xi². After calculating the products for each pair (xi, yi), we sum each column, obtaining this way Σ xi, Σ yi, Σ xi yi, and Σ xi². With these values calculated we only have to substitute them into the formulas for a and b and we are done.

Of course, all the sums and products can be done automatically using a spreadsheet such as Excel. The steps are illustrated with an example below.

How to find a and b

To find a and b we have to minimize the two-variable function E discussed above. To minimize N-variable functions, as you will see when you study partial derivatives, you have to find the point(s) where all of the partial derivatives become 0. Considering the form of E, this gives a set of two linear equations in the two variables a and b. The system is easily solved using Cramer's rule (or equivalently, since the matrix of coefficients is 2x2, by inverting it, which is trivial). The solution found is the expressions given above for a and b.
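A small Python sketch of that 2x2 solve via Cramer's rule; the helper name and sample data are my own.

def fit_line_cramer(xs, ys):
    """Solve the 2x2 normal equations for y = a*x + b using Cramer's rule."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Normal equations:  a*sxx + b*sx = sxy
    #                    a*sx  + b*n  = sy
    det = sxx * n - sx * sx            # determinant of the coefficient matrix
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

print(fit_line_cramer([1, 2, 3, 4], [2.0, 4.1, 6.1, 7.9]))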

Any straight line drawn on a graph can be represented by the equation y = a + bx. (Note that in this formulation, unlike in the earlier sections, a denotes the intercept and b the slope.) Each point on the line has an x value and a y value, and the equation tells us how these x and y values are related. Different values of a and b give different straight lines. The value of a tells us the INTERCEPT of the line on the y-axis (a is the distance from zero of the point where the line crosses the y-axis, i.e., it is the value of y when x is zero) and b indicates the SLOPE (the gradient) of the line. To decide on the line to fit through our scatter, we have to decide what values of a and b to use. Basically we choose them in such a way that the vertical distances of the points from the line are minimized. (To be more precise, we choose the a and b that minimize the sum of the squares of these vertical distances, hence the name least squares method.)

Sample Problem: Farmer Mike owns a blackberry farm in the United States. He wants to determine the relationship between the year and the selling price of his blackberries over the last five years, as well as the prices at which he could expect to sell them over the following ten years.

Historical Data

Year (x)     Price of Blackberries in US$/lb (y)     xy            x²
2007         3.22                                    6462.54       4028049
2008         3.43                                    6887.44       4032064
2009         3.69                                    7413.21       4036081
2010         3.85                                    7738.50       4040100
2011         3.94                                    7923.34       4044121
Σx = 10045   Σy = 18.13                              Σxy = 36425.03   Σx² = 20180415

(Σx)² = (10045)² = 100902025
x̄ = Σx/n = 10045/5 = 2009
ȳ = Σy/n = 18.13/5 = 3.626

In our example, we find, using an appropriate calculator:

b = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n] = [36425 - (10045)(18.13)/5] / [20180415 - 100902025/5] = 0.183

a = ȳ - b·x̄ = 3.626 - (0.183)(2009) = -364.021

So the equation of our fitted line is y = -364.021 + 0.183x.

Period in Years (X)     Price of Blackberries in US$/lb (Y)
2012                    4.18
2013                    4.36
2014                    4.54
2015                    4.72
2016                    4.91
2017                    5.09
2018                    5.27
2019                    5.46
2020                    5.64
2021                    5.82
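For reference, a Python sketch of the whole computation: fitting the line and extrapolating to 2012-2021. Because the hand calculation above rounds Σxy to 36425, the slope and forecasts it reports can differ from this unrounded computation by a few hundredths.

# Least-squares fit of the blackberry prices tabulated above,
# followed by extrapolation for 2012-2021.

years = [2007, 2008, 2009, 2010, 2011]
prices = [3.22, 3.43, 3.69, 3.85, 3.94]

n = len(years)
sum_x = sum(years)
sum_y = sum(prices)
sum_xy = sum(x * y for x, y in zip(years, prices))
sum_x2 = sum(x * x for x in years)

b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)   # slope
a = sum_y / n - b * (sum_x / n)                                # intercept

print(f"fitted line: y = {a:.3f} + {b:.3f} * x")

for year in range(2012, 2022):
    print(year, round(a + b * year, 2))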

Alternative Method of Forecasting by ARIMA

One of the most important and popular linear models is the autoregressive integrated moving average (ARIMA) model, which was popularized by Box and Jenkins (1976) in the 1970s. Oftentimes, ARIMA is also called the Box-Jenkins model. Although ARIMA models are quite flexible in modeling a wide range of time series patterns, their major limitation is the presumed linear form of the model. That is, a linear autocorrelation structure is assumed before the model is fitted to the data.

Time series forecasting methods more advanced than those considered in our simple package do exist. These are based on AutoRegressive Integrated Moving Average (ARIMA) models. Essentially, these assume that the time series has been generated by a probability process with future values related to past values, as well as to past forecast errors. ARIMA econometric modeling takes into account historical data and decomposes it into an Autoregressive (AR) process, where there is a memory of past events (e.g., the interest rate this month is related to the interest rate last month, and so forth, with a decreasing memory lag); an Integrated (I) process, which accounts for stabilizing the data or making it stationary and ergodic, making it easier to forecast; and a Moving Average (MA) of the forecast errors, such that the longer the historical data, the more accurate the forecasts will be, as it learns over time. ARIMA models therefore have three model parameters, one for the AR(p) process, one for the I(d) process, and one for the MA(q) process, all combined and interacting with each other and recomposed into the ARIMA(p,d,q) model.

In the model identification stage, the forecaster must decide whether the time series is autoregressive, moving average, or both. This is usually done by visually inspecting diagrams of the data or employing various statistical techniques. In the second stage, model estimation and diagnostic checking, the forecaster verifies that the original model identification is correct. This requires subjecting the model to a variety of diagnostics. If the model checks out, the forecaster then proceeds to the third stage, forecasting. The principal advantage of using the ARIMA approach is that the method can generate confidence intervals around the forecasts. This actually serves as another check of the validity of the model: if it predicts a high degree of confidence about a dubious forecast, the modeler may have to respecify the form of the model.

In order to achieve the best results using the Box-Jenkins ARIMA approach, three assumptions need to be met. The first is the generally accepted threshold of 50 data points. This tends to be a significant obstacle for many local governments that may collect data only annually for some types of revenue. The second assumption is that the data series is stationary, i.e., that the data series varies around a constant mean and variance. Running a regression on two non-stationary variables can produce spurious results. If the data is non-stationary, the data series needs differencing and/or the addition of a time trend. If the data is trend non-stationary only, then adding a linear time trend to the model will render the series stationary. Trend non-stationary data have a mean and variance that change over time by a constant amount. If the data is first-difference non-stationary, then first differencing of the data will render the series stationary. Differencing involves subtracting the observation at time t-1 from the observation at time t for all observations. Whether the data requires these types of treatment should become apparent at the identification stage, and is generally easily accomplished with econometric software programs. The third assumption of ARIMA models is that the series be homoscedastic, i.e., that it has a constant variance. If the amplitude of the variance around the mean is great even after differencing, the series is considered heteroscedastic. The remedy for this problem may be simple or complex and involves measures such as using the natural logarithm of the data, using square or cube roots, or truncating the data series (cutting out certain values).
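A minimal sketch of two of these treatments, first differencing and a natural-log transform, in plain Python using the blackberry price series from the earlier example.

import math

series = [3.22, 3.43, 3.69, 3.85, 3.94]          # blackberry prices, 2007-2011

# First differencing: y'_t = y_t - y_(t-1); helps remove a trend.
diffs = [series[t] - series[t - 1] for t in range(1, len(series))]

# Natural-log transform: can stabilize a variance that grows with the level of the series.
logged = [math.log(y) for y in series]

print("first differences:", diffs)
print("log-transformed series:", logged)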

The autoregressive component predicts future values based on a linear combination of prior values. An autoregressive process of order p can be shown as:

Ft = φ1·At-1 + φ2·At-2 + ... + φp·At-p,

where Ft is the predicted value at time t, At-p is the actual value at time t-p, and the φs are the estimated parameters.

The moving average component provides forecasts based on prior forecasting errors. The moving average component of a model for a q-order process can be shown as:

Ft = θ1·εt-1 + θ2·εt-2 + ... + θq·εt-q,

where Ft is the predicted value at time t, εt-q is the forecast error at time t-q, and the θs are the estimated parameters.

These two components together form autoregressive moving average (ARMA) models. ARMA models assume a stationary data series before first differencing or the inclusion of a time trend. If a series has been rendered trend or difference stationary, the above models form the Box-Jenkins ARIMA model (Box and Jenkins, 1976). The number of autoregressive and moving average lags in an ARIMA model is represented as ARIMA(p,d,q), where d is the degree of difference, i.e. d=1 if the data is first differenced, d=2 if the data is second differenced, etc. If d=0, the ARIMA model is an ARMA(p,q). Further derivations can also take into account seasonality by considering autoregressive or moving average trends that occur at certain points in time. In the case of seasonality the ARIMA model is expressed as ARIMA(p,d,q)(P,Q) where P is the number of seasonal autoregressive lags and Q is the number of seasonal moving average lags. Seasonality is a consideration with relatively frequent data, such as weekly, monthly, or possibly quarterly data.
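As a sketch of fitting such a model in software, assuming the third-party statsmodels package is available (with only five observations the fit is purely illustrative):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Blackberry prices, 2007-2011 (from the example above).
prices = np.array([3.22, 3.43, 3.69, 3.85, 3.94])

# ARIMA(p, d, q): here p=2 autoregressive lags, d=1 difference, q=0 moving average lags.
model = ARIMA(prices, order=(2, 1, 0))
result = model.fit()

print(result.params)          # estimated coefficients
print(result.forecast(10))    # forecasts for the next ten periods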

Example:

Considering the same problem used for the least squares method, we apply an ARIMA(2, 1, 0) model. The parameter estimates are shown below:

Type    Coef.     SE Coef.
AR1     0.5933
AR2     0.5321

We obtained the model in the form Yt = Yt-1 + 0.5933(Yt-1 - Yt-2) + 0.5321(Yt-2 - Yt-3).

Year    Price of Blackberries in US$/lb (y)
2007    3.22
2008    3.43
2009    3.69
2010    3.85
2011    3.94

In 2011, the price of blackberries was US$3.94/lb (Yt-1).

The forecast of the price of blackberries in 2012 would be Yt = 3.94 + 0.5933(3.94 - 3.85) + 0.5321(3.85 - 3.69) ≈ 4.09.
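A Python sketch that iterates this recursion to produce the ten-year forecast; values may differ from the table below in the second decimal place because the reported coefficients are rounded.

# Iterate Yt = Yt-1 + 0.5933*(Yt-1 - Yt-2) + 0.5321*(Yt-2 - Yt-3)
# starting from the observed prices for 2009-2011.

history = {2009: 3.69, 2010: 3.85, 2011: 3.94}

for year in range(2012, 2022):
    y1 = history[year - 1]   # Yt-1
    y2 = history[year - 2]   # Yt-2
    y3 = history[year - 3]   # Yt-3
    history[year] = y1 + 0.5933 * (y1 - y2) + 0.5321 * (y2 - y3)
    print(year, round(history[year], 2))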

Forecast of price

Date    Forecast
2012    4.09
2013    4.23
2014    4.39
2015    4.56
2016    4.75
2017    4.95
2018    5.17
2019    5.41
2020    5.67
2021    5.95

References:

Zhang, G. Peter. Neural Networks in Business Forecasting. Idea Group Inc., 2004.

Chapra, Steven and Canale, Raymond. Numerical Methods for Engineers: With Software and Programming Applications, Fourth Edition. McGraw-Hill, 2005.
