The method of ordinary least squares (OLS) regression [1] is presented with examples, followed by problems and their solutions.
As a practical example, the North American Datum of 1983 (NAD 83), which is in turn used in global positioning systems (GPS), was obtained by using the least squares method to solve a system of 928,735 equations in 928,735 unknowns [2].
Least squares linear regression is also used in business and finance to make predictions [5].
Let us assume that the table below shows observed or measured values of the dependent variable \( y \) along with the corresponding values of the \( n \) independent variables \( x_1, x_2, \ldots, x_n \).
\( x_1 \) | \(x_2\) | ... | \(x_n\) | \(y\) |
---|---|---|---|---|
\(a_{1,1} \) | \(a_{1,2} \) | ... | \(a_{1,n} \) | \(y_1 \) |
\(a_{2,1} \) | \(a_{2,2}\) | ... | \(a_{2,n} \) | \(y_2 \) |
... | ... | ... | ... | ... |
... | ... | ... | ... | ... |
\(a_{m,1} \) | \(a_{m,2} \) | ... | \(a_{m,n} \) | \(y_m \) |
Example 1
Let \( A = \begin{bmatrix}
1 & 1 \\
2 & 1\\
4 & 1
\end{bmatrix} \) , \( Y = \begin{bmatrix}
0\\
4\\
7
\end{bmatrix} \) and \( X = \begin{bmatrix}
\beta_1\\
\beta_0
\end{bmatrix} \)
a)
Express \( || \epsilon(\beta_{1}, \beta_{0}) ||^2 = ( A X - Y )^T ( A X - Y ) \) in terms of \( \beta_1 \) and \( \beta_0 \).
b)
Find the solution \( \hat X = \begin{bmatrix}
\hat \beta_1\\
\hat \beta_0
\end{bmatrix} \) that minimizes \( || \epsilon(\beta_{1}, \beta_{0}) ||^2 \) using the normal equation \( \hat X = (A^T A)^{-1} A^T Y \).
c)
Graph \( || \epsilon(\beta_{1}, \beta_{0}) ||^2 \) and show that it has a minimum at \( \hat X = \begin{bmatrix}
\hat \beta_1\\
\hat \beta_0
\end{bmatrix} \) found in part b).
d) Use any software to calculate the vector \( \hat X = \begin{bmatrix}
\hat \beta_1\\
\hat \beta_0
\end{bmatrix} \) and compare with the results obtained in part b).
Solution to Example 1
a)
\( ( A X - Y ) = \begin{bmatrix}
1 & 1 \\
2 & 1\\
4 & 1
\end{bmatrix} \begin{bmatrix}
\beta_1\\
\beta_0
\end{bmatrix} - \begin{bmatrix}
0\\
4\\
7
\end{bmatrix} \\\\
\quad =
\begin{bmatrix}
\beta_1 + \beta_0 \\
2 \beta_1 + \beta_0 - 4\\
4 \beta_1 + \beta_0 - 7
\end{bmatrix}
\)
\( || \epsilon( \beta_{1}, \beta_{0}) ||^2 = ( A X - Y )^T ( A X - Y ) \\ = (\beta_1 + \beta_0)^2 + (2 \beta_1 + \beta_0 - 4)^2 + (4 \beta_1 + \beta_0 - 7)^2 \)
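The expansion above can be checked numerically; the sketch below (a minimal NumPy check, with the matrices hard-coded from the problem statement) confirms that the matrix form and the term-by-term form of the squared error agree at any test point:

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 1.0]])
Y = np.array([0.0, 4.0, 7.0])

def err_matrix(b1, b0):
    """Squared error in matrix form: (A X - Y)^T (A X - Y)."""
    r = A @ np.array([b1, b0]) - Y
    return float(r @ r)

def err_expanded(b1, b0):
    """The same error written out term by term, as in part a)."""
    return (b1 + b0)**2 + (2*b1 + b0 - 4)**2 + (4*b1 + b0 - 7)**2

# The two forms agree at any test point (beta_1, beta_0)
print(err_matrix(1.0, 0.0), err_expanded(1.0, 0.0))   # 14.0 14.0
```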
b)
The transpose \( A^T \) of matrix \( A \) is given by
\( A^T = \begin{bmatrix}
1 & 2 & 4 \\
1 & 1 & 1
\end{bmatrix} \)
\( (A^T A)^{-1}
= \left(\begin{bmatrix}
1 & 2 & 4 \\
1 & 1 & 1
\end{bmatrix} \begin{bmatrix}
1 & 1 \\
2 & 1\\
4 & 1
\end{bmatrix} \right)^{-1} = \begin{bmatrix}
\dfrac{3}{14}&-\dfrac{1}{2}\\
-\dfrac{1}{2}&\dfrac{3}{2}
\end{bmatrix}
\)
The approximate solution is given by
\( \hat X = (A^T A)^{-1} A^T Y \)
Substitute by the known quantities to obtain
\( \hat X = \begin{bmatrix}
\dfrac{3}{14}&-\dfrac{1}{2}\\
-\dfrac{1}{2}&\dfrac{3}{2}
\end{bmatrix} \begin{bmatrix}
1 & 2 & 4 \\
1 & 1 & 1
\end{bmatrix} \begin{bmatrix}
0\\
4\\
7
\end{bmatrix} \)
Evaluate
\( \hat X = \begin{bmatrix}
\hat \beta_1\\
\hat \beta_0
\end{bmatrix} =
\begin{bmatrix}
\dfrac{31}{14}\\
-\dfrac{3}{2}
\end{bmatrix} \approx
\begin{bmatrix}
2.21\\
-1.5
\end{bmatrix}
\)
c)
\(|| \epsilon( \hat \beta_{1}, \hat \beta_{0}) ||^2 \\
= ( \hat \beta_1 + \hat \beta_0)^2 + (2 \hat \beta_1 + \hat \beta_0 - 4)^2 + (4 \hat \beta_1 + \hat \beta_0 - 7)^2 \\
= \left( \dfrac{31}{14} -\dfrac{3}{2} \right)^2 + \left(2 \times \dfrac{31}{14} -\dfrac{3}{2} - 4 \right)^2 + \left(4 \times \dfrac{31}{14} -\dfrac{3}{2} - 7\right)^2 \\
= \dfrac{25}{14} \approx 1.79
\)
The graph of \( || \epsilon( \beta_{1}, \beta_{0}) ||^2 \) is shown below. It has a minimum at point \( M = (2.21,-1.5,1.79) \).
The minimum value of \( || \epsilon( \beta_{1}, \beta_{0}) ||^2 \) is approximately equal to \( 1.79 \) at \( X = \begin{bmatrix}
\beta_1\\
\beta_0
\end{bmatrix} \approx \begin{bmatrix}
2.21\\
-1.5
\end{bmatrix} \) as predicted in the analytical calculations above.
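The minimum can also be verified numerically without graphing; the sketch below (NumPy, with the problem data hard-coded) evaluates the squared error at the candidate minimum and at a small grid of perturbed points around it:

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 1.0]])
Y = np.array([0.0, 4.0, 7.0])

def sq_error(b1, b0):
    """|| epsilon(beta_1, beta_0) ||^2 = (A X - Y)^T (A X - Y)."""
    r = A @ np.array([b1, b0]) - Y
    return float(r @ r)

b1_hat, b0_hat = 31/14, -3/2
e_min = sq_error(b1_hat, b0_hat)   # 25/14, approximately 1.79

# Every small perturbation of (b1_hat, b0_hat) increases the error
for db1 in (-0.1, 0.0, 0.1):
    for db0 in (-0.1, 0.0, 0.1):
        assert sq_error(b1_hat + db1, b0_hat + db0) >= e_min
```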
d)
Using software for multiple linear regression, we obtain the results shown in the table, which are exactly the results obtained in the detailed calculations of part b) above.
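Any numerical tool will do for part d); below is a minimal NumPy sketch (NumPy stands in for the unspecified software) that evaluates the normal equation directly and cross-checks it with NumPy's built-in least squares solver:

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 1.0]])
Y = np.array([0.0, 4.0, 7.0])

# Normal equation: X_hat = (A^T A)^{-1} A^T Y
X_hat = np.linalg.inv(A.T @ A) @ A.T @ Y
print(X_hat)   # beta_1 = 31/14 ≈ 2.2143, beta_0 = -1.5

# np.linalg.lstsq solves the same problem without forming (A^T A)^{-1}
X_lstsq, *_ = np.linalg.lstsq(A, Y, rcond=None)
```

Solving via `np.linalg.lstsq` is preferred in practice because it avoids explicitly inverting \( A^T A \), which can be numerically ill-conditioned.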
Part A
Given the data sets below:
a) Find the least squares solution \( \hat X = (A^T A)^{-1} A^T Y \) using the same steps as in Example 1.
b) Use any software to calculate \( \hat X = (A^T A)^{-1} A^T Y \) and compare the results.
1)
\( A = \begin{bmatrix}
-1 & 1 \\
1 & 1\\
2 & 1\\
3 & 1
\end{bmatrix} \) , \( Y = \begin{bmatrix}
4\\
8\\
10\\
12
\end{bmatrix} \)
2)
\( A = \begin{bmatrix}
-1 & -1 & 1\\
0 & 3 & 1\\
2 & 5 & 1\\
4 & 9 & 1
\end{bmatrix} \) , \( Y = \begin{bmatrix}
-4\\
6\\
7\\
10
\end{bmatrix} \)
Part B
A company spends an amount \( x_1 \) on advertising and an amount \( x_2 \) on improving the quality of its product. The profit \( y \) over a period of 5 months is given in the table below. \( x_1, x_2 \) and \( y \) are in millions of dollars.
\( x_1 \) | \(x_2\) | \(y\) |
---|---|---|
\( 2 \) | \( 1.5 \) | \( 3 \) |
\( 2.5 \) | \( 1.7 \) | \( 3.2 \) |
\( 3.0\) | \( 1.8 \) | \( 3.5 \) |
\( 3.2\) | \( 2.1 \) | \( 4.1\) |
\( 3.4 \) | \( 2.5 \) | \( 4.6\) |
Solutions to Part A
1)
a)
\( A^T = \begin{bmatrix}
-1 & 1 & 2 & 3 \\
1 & 1 & 1 & 1
\end{bmatrix}
\)
\( A^T A = \begin{bmatrix}
-1 & 1 & 2 & 3 \\
1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix}
-1 & 1 \\
1 & 1\\
2 & 1\\
3 & 1
\end{bmatrix}
=
\begin{bmatrix}
15 & 5\\ 5 & 4
\end{bmatrix}
\)
\( (A^T A)^{-1} =
\begin{bmatrix}
\dfrac{4}{35}&-\dfrac{1}{7}\\
-\dfrac{1}{7}&\dfrac{3}{7}
\end{bmatrix}
\)
\( \hat X = (A^T A)^{-1} A^T Y = \begin{bmatrix}
\hat \beta_1\\
\hat \beta_0
\end{bmatrix}
= \begin{bmatrix}
2\\
6
\end{bmatrix}
\)
b)
Using Excel for multiple linear regression, we obtain the results shown in the table, which are exactly the results obtained in the detailed calculations of part a) above.
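The same check can be done in a few lines of NumPy (shown here as an alternative to Excel):

```python
import numpy as np

A = np.array([[-1.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
Y = np.array([4.0, 8.0, 10.0, 12.0])

# Normal equation X_hat = (A^T A)^{-1} A^T Y
X_hat = np.linalg.inv(A.T @ A) @ A.T @ Y
print(X_hat)   # beta_1 = 2, beta_0 = 6
```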
2)
a)
\( A = \begin{bmatrix}
-1 & -1 & 1\\
0 & 3 & 1\\
2 & 5 & 1\\
4 & 9 & 1
\end{bmatrix} \)
\( A^T =
\begin{bmatrix}
-1&0&2&4\\
-1&3&5&9\\
1&1&1&1
\end{bmatrix}
\)
\( A^T A =
\begin{bmatrix}
-1&0&2&4\\
-1&3&5&9\\
1&1&1&1
\end{bmatrix}
\begin{bmatrix}
-1 & -1 & 1\\
0 & 3 & 1\\
2 & 5 & 1\\
4 & 9 & 1
\end{bmatrix} =
\begin{bmatrix}
21&47&5\\
47&116&16\\
5&16&4
\end{bmatrix}
\)
\( (A^T A)^{-1} =
\begin{bmatrix}
\dfrac{26}{19}&-\dfrac{27}{38}&\dfrac{43}{38}\\ -\dfrac{27}{38}&\dfrac{59}{152}&-\dfrac{101}{152}\\ \dfrac{43}{38}&-\dfrac{101}{152}&\dfrac{227}{152}
\end{bmatrix}
\)
\( \hat X = (A^T A)^{-1} A^T Y
= \begin{bmatrix}
\hat \beta_1\\
\hat \beta_2\\
\hat \beta_0
\end{bmatrix}
=
\begin{bmatrix}
-\dfrac{68}{19}\\ \dfrac{245}{76}\\ -\dfrac{279}{76}
\end{bmatrix}
\approx
\begin{bmatrix}
-3.57895 \\ 3.22368 \\ -3.67105
\end{bmatrix}
\)
b)
Using Excel for multiple linear regression, we obtain the results shown in the table, which are exactly the results obtained in the detailed calculations of part a) above.
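Again, a short NumPy sketch (as an alternative to Excel) reproduces the hand calculation:

```python
import numpy as np

A = np.array([[-1.0, -1.0, 1.0],
              [ 0.0,  3.0, 1.0],
              [ 2.0,  5.0, 1.0],
              [ 4.0,  9.0, 1.0]])
Y = np.array([-4.0, 6.0, 7.0, 10.0])

# Normal equation X_hat = (A^T A)^{-1} A^T Y
X_hat = np.linalg.inv(A.T @ A) @ A.T @ Y
print(X_hat)   # approx [-3.57895, 3.22368, -3.67105]
```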
Solution to Part B
Let \( A =
\begin{bmatrix}
2 & 1.5 & 1\\
2.5 & 1.7 & 1\\
3.0 & 1.8 & 1\\
3.2 & 2.1 & 1\\
3.4 & 2.5 & 1
\end{bmatrix}
\) and \( Y =
\begin{bmatrix}
3.0\\
3.2\\
3.5\\
4.1\\
4.6
\end{bmatrix}
\)
Hence
\( A^T = \begin{bmatrix}
2 & 2.5 & 3.0 & 3.2 & 3.4\\
1.5 & 1.7 & 1.8 & 2.1& 2.5\\
1 & 1 & 1 & 1 & 1
\end{bmatrix}
\)
\( A^T A =
\begin{bmatrix}
2 & 2.5 & 3.0 & 3.2 & 3.4\\
1.5 & 1.7 & 1.8 & 2.1& 2.5\\
1 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix}
2 & 1.5 & 1\\
2.5 & 1.7 & 1\\
3.0 & 1.8 & 1\\
3.2 & 2.1 & 1\\
3.4 & 2.5 & 1
\end{bmatrix} \\
=
\begin{bmatrix}
41.05&27.87&14.1\\
27.87&19.04&9.6\\
14.1&9.6&5
\end{bmatrix}
\)
\( (A^T A)^{-1} =
\begin{bmatrix}
4.15584 &-5.45454 &-1.24675 \\ -5.45454 &8.80382 &-1.52153 \\ -1.24675 &-1.52153 &6.63718
\end{bmatrix}
\)
\( \hat X = \begin{bmatrix}
\hat \beta_1 \\ \hat \beta_2 \\ \hat \beta_0
\end{bmatrix} \\
\quad = (A^T A)^{-1} A^T Y \\
\quad \approx
\begin{bmatrix}
0.127273\\
1.513876\\
0.414450
\end{bmatrix}
\)
Hence
\( \hat y \approx 0.127273 x_1 + 1.513876 x_2 + 0.414450 \)
Prediction
\( \hat y \approx 0.127273 \times 4 + 1.513876 \times 4 + 0.414450 \approx 6.979 \) (in millions of dollars)
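The fit and the prediction can be reproduced in NumPy (the design matrix is built from the table, with a final column of ones for the intercept):

```python
import numpy as np

# Columns: x1 (advertising), x2 (quality), and ones for the intercept
A = np.array([[2.0, 1.5, 1.0],
              [2.5, 1.7, 1.0],
              [3.0, 1.8, 1.0],
              [3.2, 2.1, 1.0],
              [3.4, 2.5, 1.0]])
Y = np.array([3.0, 3.2, 3.5, 4.1, 4.6])   # profit, millions of dollars

# Normal equation X_hat = (A^T A)^{-1} A^T Y
X_hat = np.linalg.inv(A.T @ A) @ A.T @ Y
b1, b2, b0 = X_hat
print(X_hat)   # approx [0.127273, 1.513876, 0.414450]

# Predicted profit when x1 = x2 = 4 (millions of dollars)
y_pred = b1 * 4 + b2 * 4 + b0
print(y_pred)  # approx 6.979
```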