
1 Introduction

Linear Regression
with Multiple Variables

Embedded System


Outline
Multiple features

Gradient descent for multiple variables

Feature scaling in gradient descent

Learning rate in gradient descent

Features and polynomial regression

Normal equation


Multiple Features (Variables)


One variable: h_θ(x) = θ_0 + θ_1 x

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178


Multiple Features (Variables)

Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
852            2                    1                  36                    178


Notation:

n : number of features (n = 4 in the above)
x^(i) : input (features) of the i-th training example, e.g. x^(2) = [1416, 3, 2, 40]^T
x_j^(i) : value of feature j in the i-th training example, e.g. x_3^(2) = 2
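As a concrete sketch of this indexing (the values are taken from the table above; the snippet and its 0-based numpy indices are illustrative, not part of the original slides):

import numpy as np

# Rows are training examples, columns are the n = 4 features
# (size, bedrooms, floors, age). Slide notation is 1-based;
# numpy indexing is 0-based.
data = np.array([[2104, 5, 1, 45],
                 [1416, 3, 2, 40],
                 [1534, 3, 2, 30],
                 [ 852, 2, 1, 36]])

x_2 = data[1]       # x^(2): features of the 2nd training example -> [1416, 3, 2, 40]
x_3_2 = data[1, 2]  # x_3^(2): value of feature 3 in the 2nd example -> 2
print(x_2, x_3_2)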


Multiple Features (Variables)


Hypothesis (one variable)
h_θ(x) = θ_0 + θ_1 x

Hypothesis (multiple variables)
h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_3 + θ_4 x_4
e.g. h_θ(x) = 90 + 0.2 x_1 + 0.03 x_2 + 2 x_3 - 3 x_4


Multiple Features (Variables)


Multivariate linear regression
h_θ(x) = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n

For convenience of notation, define x_0 = 1 (i.e. x_0^(i) = 1). Then

    x = [x_0, x_1, x_2, ..., x_n]^T ∈ R^(n+1),    θ = [θ_0, θ_1, θ_2, ..., θ_n]^T ∈ R^(n+1)

    h_θ(x) = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n = θ^T x
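A minimal numpy sketch of the vectorized hypothesis h_θ(x) = θ^T x (the parameter values are just the example coefficients above and are otherwise arbitrary):

import numpy as np

x = np.array([1.0, 2104, 5, 1, 45])            # [x_0, x_1, x_2, x_3, x_4], with x_0 = 1
theta = np.array([90.0, 0.2, 0.03, 2.0, -3.0]) # [θ_0, θ_1, θ_2, θ_3, θ_4]

h = theta @ x                                  # h_θ(x) = θ^T x
print(h)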


Gradient Descent for Multiple Variables


Hypothesis
h_θ(x) = θ^T x = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n

Parameters
θ = [θ_0, θ_1, ..., θ_n] ∈ R^(n+1)

Cost function
J(θ) = J(θ_0, θ_1, ..., θ_n) = (1/(2m)) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))²

Gradient descent
Repeat {
    θ_j := θ_j - α (∂/∂θ_j) J(θ)
} (simultaneously update θ_j for every j = 0, 1, 2, ..., n)

Gradient Descent
One variable (n = 1)
Repeat {
    θ_0 := θ_0 - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))
    θ_1 := θ_1 - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x^(i)
} (update θ_0 and θ_1 simultaneously)

Multiple variables (n ≥ 1)
Repeat {
    θ_j := θ_j - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_j^(i)
} (update θ_j for every j = 0, 1, 2, ..., n simultaneously)

With x_0^(i) = 1, the first few updates read:
    θ_0 := θ_0 - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_0^(i)
    θ_1 := θ_1 - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_1^(i)
    θ_2 := θ_2 - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_2^(i)
    ...
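A compact sketch of this update rule in numpy, vectorized over all θ_j simultaneously (the function name, data layout, and default settings are illustrative assumptions, not from the slides):

import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.

    X: m x (n+1) design matrix whose first column is all ones (x_0 = 1).
    y: vector of m targets.
    Returns theta and the per-iteration cost history J_history.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    J_history = []
    for _ in range(num_iters):
        error = X @ theta - y                        # h_θ(x^(i)) - y^(i) for every i
        J_history.append((error @ error) / (2 * m))  # J(θ) before the update
        theta = theta - (alpha / m) * (X.T @ error)  # simultaneous update of every θ_j
    return theta, J_history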


Feature Scaling
Idea: Make sure features are on a similar scale

x_1 = size (0-2000 feet²)        →  x_1 = size (feet²) / 2000
x_2 = number of bedrooms (1-5)   →  x_2 = number of bedrooms / 5

Then 0 ≤ x_1 ≤ 1 and 0 ≤ x_2 ≤ 1,
making gradient descent converge much faster.


Feature Scaling
Get every feature into approximately the -1 ≤ x_j ≤ 1 range.

Given x_0 = 1:
    0 ≤ x_1 ≤ 3                  OK
    -2 ≤ x_2 ≤ 0.5               OK
    -100 ≤ x_3 ≤ 100             needs rescaling
    -0.0001 ≤ x_4 ≤ 0.0001       needs rescaling


Feature Scaling
Mean Normalization
Replace x_j with x_j - μ_j to make features have approximately zero mean
(do not apply to x_0 = 1).

For example,
    x_1 = (size - 1000) / 2000,    x_2 = (#bedrooms - 2) / 5
    →  -0.5 ≤ x_1 ≤ 0.5,  -0.5 ≤ x_2 ≤ 0.5

In general,
    x_j ← (x_j - μ_j) / s_j
    μ_j : average value of x_j in the training set
    s_j : either the range of x_j (max - min) or the standard deviation of x_j
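A minimal sketch of mean normalization in numpy (here s_j is taken to be the standard deviation; the range max - min would work equally well):

import numpy as np

def mean_normalize(X):
    """Scale each feature column to roughly zero mean and unit spread.

    X: m x n matrix of raw features (without the x_0 = 1 column, which
    must not be normalized). Returns X_norm plus mu and sigma so the
    same transform can be applied to new examples.
    """
    mu = X.mean(axis=0)     # μ_j: average value of feature j
    sigma = X.std(axis=0)   # s_j: here, the standard deviation of feature j
    return (X - mu) / sigma, mu, sigma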

Gradient Descent

θ_j := θ_j - α (∂/∂θ_j) J(θ)

Debugging
How to make sure gradient descent is working correctly
How to choose the learning rate α


Gradient Descent
Making sure gradient descent is working correctly

Plot J(θ) against the number of iterations:
J(θ) should decrease after every iteration.

Declare convergence
if J(θ) decreases by less than ε = 10^-3 in one iteration.
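A small sketch of that convergence test, assuming a list of per-iteration costs such as the J_history returned by the gradient-descent sketch earlier:

def has_converged(J_history, epsilon=1e-3):
    """True if J(θ) decreased by less than epsilon in the last iteration."""
    return len(J_history) >= 2 and (J_history[-2] - J_history[-1]) < epsilon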

Gradient Descent
Making sure gradient descent is working correctly

If J(θ) increases or oscillates as the iterations proceed
(gradient descent is not working), use a smaller learning rate α.

For sufficiently small α, J(θ) should decrease on every iteration.
But if α is too small, gradient descent can be slow to converge.


Gradient Descent
Summary
If α is too small: slow convergence.
If α is too large: J(θ) may not decrease on every iteration and may not converge
(sometimes slow convergence is also possible).

To choose α, try a sequence of values spaced roughly ×3 apart:
    ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
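One way to organize that search, sketched below (the gradient_descent call refers to the earlier sketch and is shown only as a comment; the grid itself follows the ×3 spacing above):

# Candidate learning rates spaced roughly x3 apart
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

# For each alpha, run gradient descent for a fixed number of iterations and
# keep the value whose cost decreases fastest without diverging, e.g.:
# final_costs = {a: gradient_descent(X, y, alpha=a, num_iters=400)[1][-1] for a in alphas}
# best_alpha = min(final_costs, key=final_costs.get)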


Housing Prices Prediction


Two features (frontage and depth)
    h_θ(x) = θ_0 + θ_1 × frontage + θ_2 × depth

(New) One feature
    area = frontage × depth
    h_θ(x) = θ_0 + θ_1 × area


Polynomial Regression

(figure: housing price (y) plotted against size (x), with a quadratic fit
θ_0 + θ_1 x + θ_2 x² and a cubic fit θ_0 + θ_1 x + θ_2 x² + θ_3 x³)

A cubic model can be fit with linear regression by defining new features from the size:
    h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_3
           = θ_0 + θ_1 (size) + θ_2 (size)² + θ_3 (size)³
    where x_1 = size, x_2 = (size)², x_3 = (size)³

Feature scaling is necessary:
    size: 1-1,000 (ft²)  →  size²: 1 to 10^6,  size³: 1 to 10^9
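A minimal sketch of constructing and scaling these polynomial features in numpy (the sample sizes are made up for illustration):

import numpy as np

size = np.array([100.0, 250.0, 520.0, 880.0, 1000.0])  # illustrative sizes in ft²

# x_1 = size, x_2 = size², x_3 = size³
X_poly = np.column_stack([size, size**2, size**3])

# Feature scaling is essential: the three columns span roughly 10^3, 10^6, 10^9
X_scaled = (X_poly - X_poly.mean(axis=0)) / X_poly.std(axis=0)

# Prepend x_0 = 1 to form the design matrix for ordinary linear regression
X_design = np.c_[np.ones(len(size)), X_scaled]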


Choice of Features

(figure: housing price (y) plotted against size (x))

Possible models:
    h_θ(x) = θ_0 + θ_1 (size) + θ_2 (size)²
    h_θ(x) = θ_0 + θ_1 (size) + θ_2 √(size)


Extending Linear Regression


Extending Linear Regression to More Complex Models
The inputs for linear regression can be:
    Original quantitative inputs
    Transformations of quantitative inputs
        log, exp, square root, square, etc.
    Polynomial transformations
        y = θ_0 + θ_1 x + θ_2 x² + θ_3 x³
    Basis expansions
    Dummy coding of categorical inputs
    Interactions between variables
        e.g. x_3 = x_1 · x_2

This allows linear regression techniques to fit non-linear datasets
(a sketch follows below).
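A brief sketch of such an expanded input matrix in numpy (the raw columns reuse the size/bedrooms values from the earlier table; the particular transformations chosen are illustrative):

import numpy as np

x1 = np.array([2104.0, 1416.0, 1534.0, 852.0])  # e.g. size
x2 = np.array([5.0, 3.0, 3.0, 2.0])             # e.g. number of bedrooms

X_expanded = np.column_stack([
    x1,             # original quantitative input
    np.log(x1),     # log transformation
    np.sqrt(x1),    # square-root transformation
    x1**2,          # polynomial term
    x1 * x2,        # interaction: x_3 = x_1 * x_2
])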

Linear Basis Function Models


Generally,
    h_θ(x) = Σ_{j=0} θ_j φ_j(x)        (φ_j : basis function)

Typically, φ_0(x) = 1 so that θ_0 acts as a bias.

In the simplest case, we use linear basis functions: φ_j(x) = x_j.


Linear Basis Function Models


Polynomial basis functions
    φ_j(x) = x^j

Gaussian basis functions
    φ_j(x) = exp( -(x - μ_j)² / (2s²) )

Sigmoidal basis functions
    φ_j(x) = σ( (x - μ_j) / s )
    where σ(a) = 1 / (1 + exp(-a))
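A short numpy sketch of these basis functions (μ_j and s are assumed to be chosen by the user, e.g. centers spread across the input range):

import numpy as np

def polynomial_basis(x, j):
    """φ_j(x) = x^j"""
    return x**j

def gaussian_basis(x, mu_j, s):
    """φ_j(x) = exp(-(x - μ_j)² / (2 s²))"""
    return np.exp(-(x - mu_j)**2 / (2 * s**2))

def sigmoidal_basis(x, mu_j, s):
    """φ_j(x) = σ((x - μ_j) / s), with σ(a) = 1 / (1 + exp(-a))"""
    return 1.0 / (1.0 + np.exp(-(x - mu_j) / s))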


Normal Equation
A least-squares solution to Xθ = y is a θ that minimizes
    ‖Xθ - y‖²

θ is a least-squares solution iff it solves the normal equation
    X^T X θ = X^T y

i.e. θ = (X^T X)^(-1) X^T y


Normal Equation
Normal equation
Method to solve for θ analytically.

If θ ∈ R (a scalar):
    J(θ) = aθ² + bθ + c
    Set dJ(θ)/dθ = 0
    Solve for θ

If θ ∈ R^(n+1):
    J(θ_0, θ_1, ..., θ_n) = (1/(2m)) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))²
    Set ∂J(θ)/∂θ_j = 0 (for every j)
    Solve for θ_0, θ_1, ..., θ_n

Normal Equation
If θ ∈ R^(n+1),
    J(θ_0, θ_1, ..., θ_n) = (1/(2m)) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))²
                          = (1/(2m)) ‖Xθ - y‖²

where θ = [θ_0, θ_1, ..., θ_n]^T,
      X = [ (x^(1))^T ; (x^(2))^T ; ... ; (x^(m))^T ]   (the design matrix),
      x^(i) = [x_0^(i), x_1^(i), x_2^(i), ..., x_n^(i)]^T,
      y = [y^(1), y^(2), ..., y^(m)]^T.

Setting the gradient of J to zero gives
    θ = (X^T X)^(-1) X^T y


Example

m = 4 training examples, with x_0^(i) = 1:

x_0   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1     2104           5                    1                  45                    460
1     1416           3                    2                  40                    232
1     1534           3                    2                  30                    315
1      852           2                    1                  36                    178

    X = [ 1  2104  5  1  45 ]            y = [ 460 ]
        [ 1  1416  3  2  40 ]                [ 232 ]
        [ 1  1534  3  2  30 ]                [ 315 ]
        [ 1   852  2  1  36 ]                [ 178 ]

    X ∈ R^{4×(n+1)},  y ∈ R^4,  θ = (X^T X)^(-1) X^T y
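A minimal numpy sketch of computing θ for these four examples. With m = 4 examples and n + 1 = 5 parameters, X^T X is singular here, so the pseudoinverse (np.linalg.pinv) is used instead of an explicit inverse; this is exactly the non-invertible case discussed at the end of the section:

import numpy as np

X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460.0, 232.0, 315.0, 178.0])

# theta = pinv(X) @ y gives a least-squares solution even though X^T X is singular
theta = np.linalg.pinv(X) @ y
print(theta)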

Example
m = 5 (one more training example added):

x_0   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1     2104           5                    1                  45                    460
1     1416           3                    2                  40                    232
1     1534           3                    2                  30                    315
1      852           2                    1                  36                    178
1     3000           4                    1                  38                    540

    X = [ 1  2104  5  1  45 ]            y = [ 460 ]
        [ 1  1416  3  2  40 ]                [ 232 ]
        [ 1  1534  3  2  30 ]                [ 315 ]
        [ 1   852  2  1  36 ]                [ 178 ]
        [ 1  3000  4  1  38 ]                [ 540 ]

    X ∈ R^{5×(n+1)},  y ∈ R^5,  θ = (X^T X)^(-1) X^T y

Examples and Features


m examples: (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))
n features

    x^(i) = [x_0^(i), x_1^(i), x_2^(i), ..., x_n^(i)]^T ∈ R^(n+1)

    X = [ (x^(1))^T ]            y = [ y^(1) ]
        [ (x^(2))^T ]                [ y^(2) ]
        [    ...    ]                [  ...  ]
        [ (x^(m))^T ]                [ y^(m) ]

    X ∈ R^{m×(n+1)},  θ = (X^T X)^(-1) X^T y

Examples and Features


m examples: (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))

One feature (n = 1):

    x^(i) = [ 1, x_1^(i) ]^T ∈ R²

    X = [ 1  x_1^(1) ]
        [ 1  x_1^(2) ]
        [    ...     ]
        [ 1  x_1^(m) ]   ∈ R^{m×2}

Gradient Descent vs Normal Equation


(m examples, n features)

Gradient Descent                      Normal Equation
Need to choose α                      No need to choose α
Needs many iterations                 No need to iterate
                                      Needs to compute (X^T X)^(-1), which is O(n³)
Works well even when n is large       Slow if n is very large


Normal Equation
θ = (X^T X)^(-1) X^T y

What if X^T X is non-invertible (i.e. (X^T X)^(-1) does not exist)?
    X^T X is then singular or degenerate → use the pseudoinverse.

Common causes of a singular X^T X:
    Redundant features (linearly dependent)
        e.g. x_1 = size in feet², x_2 = size in m², so x_1 = (3.28)² x_2
    Too many features (e.g. m ≤ n)
        Delete some features, or use regularization.
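A short sketch of the redundant-feature case in numpy (the data are illustrative; np.linalg.pinv still returns a least-squares solution where a plain inverse would fail):

import numpy as np

size_ft2 = np.array([2104.0, 1416.0, 1534.0, 852.0])
size_m2 = size_ft2 / 3.28**2                 # x_1 = (3.28)² x_2: linearly dependent columns
X = np.column_stack([np.ones(4), size_ft2, size_m2])
y = np.array([460.0, 232.0, 315.0, 178.0])

# np.linalg.inv(X.T @ X) would fail or be numerically meaningless here;
# the pseudoinverse still returns a valid least-squares solution.
theta = np.linalg.pinv(X) @ y
print(theta)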


References
Andrew Ng, https://www.coursera.org/learn/machine-learning

Eric Eaton, https://www.seas.upenn.edu/~cis519

http://www.holehouse.org/mlclass/04_Logistic_Regression.html
