
1 Introduction

Linear Regression
with Multiple Variables

Embedded System


Outline
Multiple features

Gradient descent for multiple variables

Feature scaling in gradient descent

Learning rate in gradient descent

Features and polynomial regression

Normal equation


Multiple Features (Variables)


One variable: h_θ(x) = θ_0 + θ_1 x

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178


Multiple Features (Variables)

Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
852            2                    1                  36                    178


Notation:

n : number of features (n = 4 in the above)
x^(i) : input (features) of the i-th training example, e.g. x^(2) = [1416, 3, 2, 40]^T
x_j^(i) : value of feature j in the i-th training example, e.g. x_3^(2) = 2
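As a concrete sketch of this indexing (the values are taken from the table above; the snippet and its 0-based numpy indices are illustrative, not part of the original slides):

import numpy as np

# Rows are training examples, columns are the n = 4 features
# (size, bedrooms, floors, age). Slide notation is 1-based;
# numpy indexing is 0-based.
data = np.array([[2104, 5, 1, 45],
                 [1416, 3, 2, 40],
                 [1534, 3, 2, 30],
                 [ 852, 2, 1, 36]])

x_2 = data[1]       # x^(2): features of the 2nd training example -> [1416, 3, 2, 40]
x_3_2 = data[1, 2]  # x_3^(2): value of feature 3 in the 2nd example -> 2
print(x_2, x_3_2)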


Multiple Features (Variables)


Hypothesis (one variable)
h_θ(x) = θ_0 + θ_1 x

Hypothesis (multiple variables)
h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_3 + θ_4 x_4
e.g. h_θ(x) = 90 + 0.2 x_1 + 0.03 x_2 + 2 x_3 - 3 x_4


Multiple Features (Variables)


Multivariate linear regression
h_θ(x) = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n

For convenience of notation, define x_0 = 1 (i.e. x_0^(i) = 1). Then

    x = [x_0, x_1, x_2, ..., x_n]^T ∈ R^(n+1),    θ = [θ_0, θ_1, θ_2, ..., θ_n]^T ∈ R^(n+1)

    h_θ(x) = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n = θ^T x
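A minimal numpy sketch of the vectorized hypothesis h_θ(x) = θ^T x (the parameter values are just the example coefficients above and are otherwise arbitrary):

import numpy as np

x = np.array([1.0, 2104, 5, 1, 45])            # [x_0, x_1, x_2, x_3, x_4], with x_0 = 1
theta = np.array([90.0, 0.2, 0.03, 2.0, -3.0]) # [θ_0, θ_1, θ_2, θ_3, θ_4]

h = theta @ x                                  # h_θ(x) = θ^T x
print(h)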


Gradient Descent for Multiple Variables


Hypothesis
h_θ(x) = θ^T x = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n

Parameters
θ = [θ_0, θ_1, ..., θ_n] ∈ R^(n+1)

Cost function
J(θ) = J(θ_0, θ_1, ..., θ_n) = (1/(2m)) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))²

Gradient descent
Repeat {
    θ_j := θ_j - α (∂/∂θ_j) J(θ)
} (simultaneously update θ_j for every j = 0, 1, 2, ..., n)

Gradient Descent
One variable (n = 1)
Repeat {
    θ_0 := θ_0 - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))
    θ_1 := θ_1 - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x^(i)
} (update θ_0 and θ_1 simultaneously)

Multiple variables (n ≥ 1)
Repeat {
    θ_j := θ_j - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_j^(i)
} (update θ_j for every j = 0, 1, 2, ..., n simultaneously)

With x_0^(i) = 1, the first few updates read:
    θ_0 := θ_0 - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_0^(i)
    θ_1 := θ_1 - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_1^(i)
    θ_2 := θ_2 - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_2^(i)
    ...
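A compact sketch of this update rule in numpy, vectorized over all θ_j simultaneously (the function name, data layout, and default settings are illustrative assumptions, not from the slides):

import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.

    X: m x (n+1) design matrix whose first column is all ones (x_0 = 1).
    y: vector of m targets.
    Returns theta and the per-iteration cost history J_history.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    J_history = []
    for _ in range(num_iters):
        error = X @ theta - y                        # h_θ(x^(i)) - y^(i) for every i
        J_history.append((error @ error) / (2 * m))  # J(θ) before the update
        theta = theta - (alpha / m) * (X.T @ error)  # simultaneous update of every θ_j
    return theta, J_history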


Feature Scaling
Idea: Make sure features are on a similar scale

x_1 = size (0-2000 feet²)        →  x_1 = size (feet²) / 2000
x_2 = number of bedrooms (1-5)   →  x_2 = number of bedrooms / 5

Then 0 ≤ x_1 ≤ 1 and 0 ≤ x_2 ≤ 1,
making gradient descent converge much faster.


Feature Scaling
Get every feature into approximately the -1 ≤ x_j ≤ 1 range.

Given x_0 = 1:
    0 ≤ x_1 ≤ 3                  OK
    -2 ≤ x_2 ≤ 0.5               OK
    -100 ≤ x_3 ≤ 100             needs rescaling
    -0.0001 ≤ x_4 ≤ 0.0001       needs rescaling


Feature Scaling
Mean Normalization
Replace x_j with x_j - μ_j to make features have approximately zero mean
(do not apply to x_0 = 1).

For example,
    x_1 = (size - 1000) / 2000,    x_2 = (#bedrooms - 2) / 5
    →  -0.5 ≤ x_1 ≤ 0.5,  -0.5 ≤ x_2 ≤ 0.5

In general,
    x_j ← (x_j - μ_j) / s_j
    μ_j : average value of x_j in the training set
    s_j : either the range of x_j (max - min) or the standard deviation of x_j
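A minimal sketch of mean normalization in numpy (here s_j is taken to be the standard deviation; the range max - min would work equally well):

import numpy as np

def mean_normalize(X):
    """Scale each feature column to roughly zero mean and unit spread.

    X: m x n matrix of raw features (without the x_0 = 1 column, which
    must not be normalized). Returns X_norm plus mu and sigma so the
    same transform can be applied to new examples.
    """
    mu = X.mean(axis=0)     # μ_j: average value of feature j
    sigma = X.std(axis=0)   # s_j: here, the standard deviation of feature j
    return (X - mu) / sigma, mu, sigma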

Gradient Descent

θ_j := θ_j - α (∂/∂θ_j) J(θ)

Debugging
How to make sure gradient descent is working correctly
How to choose the learning rate α


Gradient Descent
Making sure gradient descent is working correctly

Plot J(θ) against the number of iterations:
J(θ) should decrease after every iteration.

Declare convergence
if J(θ) decreases by less than ε = 10^-3 in one iteration.
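A small sketch of that convergence test, assuming a list of per-iteration costs such as the J_history returned by the gradient-descent sketch earlier:

def has_converged(J_history, epsilon=1e-3):
    """True if J(θ) decreased by less than epsilon in the last iteration."""
    return len(J_history) >= 2 and (J_history[-2] - J_history[-1]) < epsilon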

Gradient Descent
Making sure gradient descent is working correctly

If J(θ) increases or oscillates as the iterations proceed
(gradient descent is not working), use a smaller learning rate α.

For sufficiently small α, J(θ) should decrease on every iteration.
But if α is too small, gradient descent can be slow to converge.


Gradient Descent
Summary
If α is too small: slow convergence.
If α is too large: J(θ) may not decrease on every iteration and may not converge
(sometimes slow convergence is also possible).

To choose α, try a sequence of values spaced roughly ×3 apart:
    ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
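One way to organize that search, sketched below (the gradient_descent call refers to the earlier sketch and is shown only as a comment; the grid itself follows the ×3 spacing above):

# Candidate learning rates spaced roughly x3 apart
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

# For each alpha, run gradient descent for a fixed number of iterations and
# keep the value whose cost decreases fastest without diverging, e.g.:
# final_costs = {a: gradient_descent(X, y, alpha=a, num_iters=400)[1][-1] for a in alphas}
# best_alpha = min(final_costs, key=final_costs.get)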


Housing Prices Prediction


Two features (frontage and depth)
    h_θ(x) = θ_0 + θ_1 × frontage + θ_2 × depth

(New) One feature
    area = frontage × depth
    h_θ(x) = θ_0 + θ_1 × area


Polynomial Regression

(figure: housing price (y) plotted against size (x), with a quadratic fit
θ_0 + θ_1 x + θ_2 x² and a cubic fit θ_0 + θ_1 x + θ_2 x² + θ_3 x³)

A cubic model can be fit with linear regression by defining new features from the size:
    h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_3
           = θ_0 + θ_1 (size) + θ_2 (size)² + θ_3 (size)³
    where x_1 = size, x_2 = (size)², x_3 = (size)³

Feature scaling is necessary:
    size: 1-1,000 (ft²)  →  size²: 1 to 10^6,  size³: 1 to 10^9
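A minimal sketch of constructing and scaling these polynomial features in numpy (the sample sizes are made up for illustration):

import numpy as np

size = np.array([100.0, 250.0, 520.0, 880.0, 1000.0])  # illustrative sizes in ft²

# x_1 = size, x_2 = size², x_3 = size³
X_poly = np.column_stack([size, size**2, size**3])

# Feature scaling is essential: the three columns span roughly 10^3, 10^6, 10^9
X_scaled = (X_poly - X_poly.mean(axis=0)) / X_poly.std(axis=0)

# Prepend x_0 = 1 to form the design matrix for ordinary linear regression
X_design = np.c_[np.ones(len(size)), X_scaled]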


Choice of Features

(figure: housing price (y) plotted against size (x))

Possible models:
    h_θ(x) = θ_0 + θ_1 (size) + θ_2 (size)²
    h_θ(x) = θ_0 + θ_1 (size) + θ_2 √(size)


Extending Linear Regression


Extending Linear Regression to More Complex Models
The inputs for linear regression can be:
    Original quantitative inputs
    Transformations of quantitative inputs
        log, exp, square root, square, etc.
    Polynomial transformations
        y = θ_0 + θ_1 x + θ_2 x² + θ_3 x³
    Basis expansions
    Dummy coding of categorical inputs
    Interactions between variables
        e.g. x_3 = x_1 · x_2

This allows linear regression techniques to fit non-linear datasets
(a sketch follows below).
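A brief sketch of such an expanded input matrix in numpy (the raw columns reuse the size/bedrooms values from the earlier table; the particular transformations chosen are illustrative):

import numpy as np

x1 = np.array([2104.0, 1416.0, 1534.0, 852.0])  # e.g. size
x2 = np.array([5.0, 3.0, 3.0, 2.0])             # e.g. number of bedrooms

X_expanded = np.column_stack([
    x1,             # original quantitative input
    np.log(x1),     # log transformation
    np.sqrt(x1),    # square-root transformation
    x1**2,          # polynomial term
    x1 * x2,        # interaction: x_3 = x_1 * x_2
])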

Linear Basis Function Models


Generally,
    h_θ(x) = Σ_{j=0} θ_j φ_j(x)        (φ_j : basis function)

Typically, φ_0(x) = 1 so that θ_0 acts as a bias.

In the simplest case, we use linear basis functions: φ_j(x) = x_j.


Linear Basis Function Models


Polynomial basis functions
    φ_j(x) = x^j

Gaussian basis functions
    φ_j(x) = exp( -(x - μ_j)² / (2s²) )

Sigmoidal basis functions
    φ_j(x) = σ( (x - μ_j) / s )
    where σ(a) = 1 / (1 + exp(-a))
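A short numpy sketch of these basis functions (μ_j and s are assumed to be chosen by the user, e.g. centers spread across the input range):

import numpy as np

def polynomial_basis(x, j):
    """φ_j(x) = x^j"""
    return x**j

def gaussian_basis(x, mu_j, s):
    """φ_j(x) = exp(-(x - μ_j)² / (2 s²))"""
    return np.exp(-(x - mu_j)**2 / (2 * s**2))

def sigmoidal_basis(x, mu_j, s):
    """φ_j(x) = σ((x - μ_j) / s), with σ(a) = 1 / (1 + exp(-a))"""
    return 1.0 / (1.0 + np.exp(-(x - mu_j) / s))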


Normal Equation
A least-squares solution to Xθ = y is a θ that minimizes
    ‖Xθ - y‖²

θ is a least-squares solution iff it solves the normal equation
    X^T X θ = X^T y

i.e. θ = (X^T X)^(-1) X^T y


Normal Equation
Normal equation
Method to solve for θ analytically.

If θ ∈ R (a scalar):
    J(θ) = aθ² + bθ + c
    Set dJ(θ)/dθ = 0
    Solve for θ

If θ ∈ R^(n+1):
    J(θ_0, θ_1, ..., θ_n) = (1/(2m)) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))²
    Set ∂J(θ)/∂θ_j = 0 (for every j)
    Solve for θ_0, θ_1, ..., θ_n

Normal Equation
If θ ∈ R^(n+1),
    J(θ_0, θ_1, ..., θ_n) = (1/(2m)) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))²
                          = (1/(2m)) ‖Xθ - y‖²

where θ = [θ_0, θ_1, ..., θ_n]^T,
      X = [ (x^(1))^T ; (x^(2))^T ; ... ; (x^(m))^T ]   (the design matrix),
      x^(i) = [x_0^(i), x_1^(i), x_2^(i), ..., x_n^(i)]^T,
      y = [y^(1), y^(2), ..., y^(m)]^T.

Setting the gradient of J to zero gives
    θ = (X^T X)^(-1) X^T y


Example

m = 4 training examples, with x_0^(i) = 1:

x_0   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1     2104           5                    1                  45                    460
1     1416           3                    2                  40                    232
1     1534           3                    2                  30                    315
1      852           2                    1                  36                    178

    X = [ 1  2104  5  1  45 ]            y = [ 460 ]
        [ 1  1416  3  2  40 ]                [ 232 ]
        [ 1  1534  3  2  30 ]                [ 315 ]
        [ 1   852  2  1  36 ]                [ 178 ]

    X ∈ R^{4×(n+1)},  y ∈ R^4,  θ = (X^T X)^(-1) X^T y
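A minimal numpy sketch of computing θ for these four examples. With m = 4 examples and n + 1 = 5 parameters, X^T X is singular here, so the pseudoinverse (np.linalg.pinv) is used instead of an explicit inverse; this is exactly the non-invertible case discussed at the end of the section:

import numpy as np

X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460.0, 232.0, 315.0, 178.0])

# theta = pinv(X) @ y gives a least-squares solution even though X^T X is singular
theta = np.linalg.pinv(X) @ y
print(theta)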

Example
m = 5 (one more training example added):

x_0   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1     2104           5                    1                  45                    460
1     1416           3                    2                  40                    232
1     1534           3                    2                  30                    315
1      852           2                    1                  36                    178
1     3000           4                    1                  38                    540

    X = [ 1  2104  5  1  45 ]            y = [ 460 ]
        [ 1  1416  3  2  40 ]                [ 232 ]
        [ 1  1534  3  2  30 ]                [ 315 ]
        [ 1   852  2  1  36 ]                [ 178 ]
        [ 1  3000  4  1  38 ]                [ 540 ]

    X ∈ R^{5×(n+1)},  y ∈ R^5,  θ = (X^T X)^(-1) X^T y

Examples and Features


m examples: (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))
n features

    x^(i) = [x_0^(i), x_1^(i), x_2^(i), ..., x_n^(i)]^T ∈ R^(n+1)

    X = [ (x^(1))^T ]            y = [ y^(1) ]
        [ (x^(2))^T ]                [ y^(2) ]
        [    ...    ]                [  ...  ]
        [ (x^(m))^T ]                [ y^(m) ]

    X ∈ R^{m×(n+1)},  θ = (X^T X)^(-1) X^T y

Examples and Features


m examples: (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))

One feature (n = 1):

    x^(i) = [ 1, x_1^(i) ]^T ∈ R²

    X = [ 1  x_1^(1) ]
        [ 1  x_1^(2) ]
        [    ...     ]
        [ 1  x_1^(m) ]   ∈ R^{m×2}

Gradient Descent vs Normal Equation


(m examples, n features)

Gradient Descent                      Normal Equation
Need to choose α                      No need to choose α
Needs many iterations                 No need to iterate
                                      Needs to compute (X^T X)^(-1), which is O(n³)
Works well even when n is large       Slow if n is very large


Normal Equation
θ = (X^T X)^(-1) X^T y

What if X^T X is non-invertible (i.e. (X^T X)^(-1) does not exist)?
    X^T X is then singular or degenerate → use the pseudoinverse.

Common causes of a singular X^T X:
    Redundant features (linearly dependent)
        e.g. x_1 = size in feet², x_2 = size in m², so x_1 = (3.28)² x_2
    Too many features (e.g. m ≤ n)
        Delete some features, or use regularization.
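A short sketch of the redundant-feature case in numpy (the data are illustrative; np.linalg.pinv still returns a least-squares solution where a plain inverse would fail):

import numpy as np

size_ft2 = np.array([2104.0, 1416.0, 1534.0, 852.0])
size_m2 = size_ft2 / 3.28**2                 # x_1 = (3.28)² x_2: linearly dependent columns
X = np.column_stack([np.ones(4), size_ft2, size_m2])
y = np.array([460.0, 232.0, 315.0, 178.0])

# np.linalg.inv(X.T @ X) would fail or be numerically meaningless here;
# the pseudoinverse still returns a valid least-squares solution.
theta = np.linalg.pinv(X) @ y
print(theta)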


References
Andrew Ng, https://www.coursera.org/learn/machine-learning

Eric Eaton, https://www.seas.upenn.edu/~cis519

http://www.holehouse.org/mlclass/04_Logistic_Regression.html
