This is a continuation of the first part where I visualised the property price data of Melbourne (link). This time I used the full dataset with prices and property data dating from Jan 2016 to Aug 2018 to create a model that is capable of predicting property prices in Melbourne.
Here is what the data look like:
There are 27247 property transactions with 21 features, including price, in the dataset. In my model, I dropped "Lattitude" and "Longtitude" [sic] since there are so much missing data. I created unique street names by extracting them from the addresses and combining them with suburb names. I also imputed some essential missing data using means.
I separated the data into training and testing sets. I used a gradient boosting algorithm to model the prices. Here is a good tutorial on the algorithm - Link. The algorithm is released by Microsoft and is available in both Python and R. This is what the predictions look like against the prices of the testing set:
I also plotted the features according to their importance/prediction power:

The most important features affecting house prices in Melbourne are:
Here is what the data look like:
The most important features affecting house prices in Melbourne are:
- Distance from CBD
- Postcode
- Dwelling type
- Number of rooms
For me, the most suprising part is that the number of rooms in a property has a higher impact on the price than landsize or built area.
To see how I did everything step by step, please refer to my Jupyter Notebook on Kaggle:https://www.kaggle.com/wlsamchen/melbourne-house-price-modelling-light-gbm
No comments:
Post a Comment