Predicting Prices in the Iowa Housing Market (Regularized Linear Regression)

Erik Bebernes
Introduction

This project asks a common question in the field of predictive analytics: what are houses worth?
Identifying the true price of a home is important in preventing a housing bubble, such as the one
that plagued our country in 2008 and ultimately led to a recession. The data I'm using comes
from Kaggle, and looks specifically at houses in Ames, Iowa. There are 81 variables, with the
response variable being Sale Price. I worked on a problem similar to this as an undergraduate
student in an econometrics class, and although I really enjoyed it, I hadn't the slightest clue what
I was doing. Now that I am more knowledgeable about multiple-regression analysis, I should
be able to come up with some fairly accurate predictions. Before I begin, here is a look at the 81
variables I'll be working with.

My plan of attack on this project is as follows:


1.) Identify any missing data (both missing at random and not at random) and impute new
data accordingly.
2.) Remove any outliers to reduce model complexity and avoid overfitting.
3.) Run a multiple-regression model, using backward selection to drop variables until every
remaining predictor has a p-value below .05.
4.) Try a regularized linear model.

Identifying Missing Data and Cleaning It

The first thing I like to do in a lot of my projects is to run a missmap (from the Amelia package)
on the dataset to see how much of the data is NA.
A handful of variables are nearly completely missing; let's see what they are and why.
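A minimal sketch of this first step, assuming the Kaggle training file is named "train.csv":

```r
# Load the Amelia package for its missingness map
library(Amelia)

# Read the Kaggle training data (filename is an assumption)
train <- read.csv("train.csv", stringsAsFactors = TRUE)

# Visualize which cells of the dataset are NA
missmap(train, main = "Missingness in the Ames housing data")
```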

The variables with the most missing values are Alley (type of alley access), PoolQC (pool
quality), FireplaceQu (fireplace quality), Fence (fence quality), LotFrontage (linear feet
of street connected to the property) and MiscFeature (miscellaneous feature not covered in other
categories). The descriptions of these variables make it obvious that the data is not missing at
random, because the values are conditional on whether or not the house has that feature to begin
with. The same can be said for all of the missing values related to garages and basements: these
are the houses that don't have garages or basements. It's also worth noting that the number of
NAs is equal across related categories (i.e., all of the garage variables have 81 missing values).
There is an easy fix for this. I'm going to replace NAs in factor variables with "None" and NAs
in numeric variables with 0.
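The imputation described above can be sketched in a short loop over the columns:

```r
# Replace NAs: the level "None" for factor columns, 0 for numeric columns
for (col in names(train)) {
  if (is.factor(train[[col]])) {
    # Add "None" as a legal level before assigning it
    levels(train[[col]]) <- c(levels(train[[col]]), "None")
    train[[col]][is.na(train[[col]])] <- "None"
  } else if (is.numeric(train[[col]])) {
    train[[col]][is.na(train[[col]])] <- 0
  }
}
```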
Removing Outliers
By making scatterplots of the numeric variables against Sale Price, I'll be able to identify any
outliers and remove them from the dataset. This will simplify the model and reduce
overfitting when it comes to making predictions.
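As a sketch of this step for one numeric variable (the choice of GrLivArea and the 4,000 sq ft cutoff are illustrative, not the exact thresholds used here):

```r
# Plot one numeric predictor against the response to spot outliers
plot(train$GrLivArea, train$SalePrice,
     xlab = "GrLivArea", ylab = "SalePrice")

# Drop the extreme points (cutoff chosen by eye from the plot)
train <- train[train$GrLivArea < 4000, ]
```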
Multi-Regression Model
In developing my linear model, I used a backward selection method: I started by including
all of the independent variables and gradually removed the insignificant ones (those with a
p-value greater than .05).
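A sketch of one backward-selection step in base R (note that R's built-in step() function selects by AIC rather than p-values, so dropping by p-value is done manually; the variable dropped below is illustrative):

```r
# Fit the full model with every predictor
full <- lm(SalePrice ~ ., data = train)
summary(full)  # inspect coefficient p-values

# One manual backward step: refit without the weakest predictor
reduced <- update(full, . ~ . - PoolArea)
```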

The adjusted R-squared (which penalizes the model for carrying extra variables) is .8718,
meaning roughly 87% of the variance in Sale Price can be explained by the model. The model as
a whole has a p-value of < 2.2e-16, making it significant. Time to make my prediction and see
how it stands up in the Kaggle rankings.

After submitting my prediction, I was only in the 13th percentile for accuracy. This is due to
the high variable-to-observation ratio, which leads to overfitting. To account for this I will
attempt to build a regularized linear model using the caret package in R, but in order to do so I
need to convert the factor variables into dummy (0/1) variables.
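The dummy-variable conversion can be done with caret's dummyVars helper (the variable names below are placeholders for the cleaned training set):

```r
library(caret)

# dummyVars expands every factor column into 0/1 indicator columns
dmy <- dummyVars(~ ., data = train)
train_num <- data.frame(predict(dmy, newdata = train))
```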
Regularized Linear Model

Regularizing my model greatly improved my accuracy (now I'm in the 67th percentile on
Kaggle's leaderboard).
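A sketch of fitting a regularized (elastic-net) model with caret's train function; the cross-validation setup and tuning grid here are illustrative choices, not the exact settings used:

```r
library(caret)

set.seed(1)
ctrl <- trainControl(method = "cv", number = 5)

# glmnet fits ridge (alpha = 0), lasso (alpha = 1), and blends in between;
# lambda controls the strength of the penalty
fit <- train(SalePrice ~ ., data = train_num,
             method = "glmnet", trControl = ctrl,
             tuneGrid = expand.grid(alpha  = c(0, 0.5, 1),
                                    lambda = 10^seq(-3, 0, length = 20)))

fit$bestTune  # the alpha/lambda pair chosen by cross-validation
```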
