I decided to take a look at what the average (mean) prices are in the most expensive and most affordable suburbs are like:
As we can see from above, the most expensive suburbs are 4 to 5 times pricier than the most affordable suburbs.
Next, I decided to create some sort of a map that shows property prices. I divided the prices into 8 categories and plotted them by their latitude and longitudes (for the properties with coordinates data). The resulting scatter plot is a map of Melbourne:
Note: Latitude and Longitude are spelt wrong because the data columns were named as such.
The CBD is a at approximate the middle of the plot (-37.8, 145.0). As you can see, there is a clear divide: properties to the east and sout east of the CBD are pricier than those of the west and north. Properties closer to the CBD are also worth more.
There will be a separate post on predicting the prices of properties and the most important features affecting property prices in Melbourne.
Full Code Below:
Import Libraries¶
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import datetime
Import Data and Preparation¶
I want to visualise the last 12 months of data from August 2017 to August 2018.
In [7]:
df = pd.read_csv('../input/Melbourne_housing_FULL.csv')
df = df[df.Price.notnull()]
df.head()
df['Date'] = df['Date'].apply(lambda i : datetime.datetime.strptime(i, '%d/%m/%Y'))
begin = datetime.datetime.strptime('31/8/2017', '%d/%m/%Y').date()
end = datetime.datetime.strptime('1/9/2018', '%d/%m/%Y').date()
df = df[(df.Date> begin) & (df.Date< end)]
df['Date'].head()
Out[7]:
15 Most Expensive Suburbs¶
In [18]:
Top15 = pd.pivot_table(df, values = 'Price', index = 'Suburb', aggfunc = 'mean').reset_index()
Top15 = Top15.sort_values(by = 'Price', ascending = False)[:15]
plt.figure(figsize = (20,10))
sns.set(font_scale=2)
sns.barplot(y = 'Price', x = 'Suburb', data = Top15)
plt.xticks(rotation=90)
plt.xlabel('Suburb')
plt.ylabel('Average Price of Property Sold')
plt.title('Most Expensive Suburbs in Melbourne Aug2017 to Aug2018')
Out[18]:
15 Most Affordable Suburbs¶
In [19]:
Bottom15 = pd.pivot_table(df, values = 'Price', index = 'Suburb', aggfunc = 'mean').reset_index()
Bottom15 = Bottom15.sort_values(by = 'Price', ascending = True)[:15]
plt.figure(figsize = (20,10))
sns.set(font_scale=2)
sns.barplot(y = 'Price', x = 'Suburb', data = Bottom15)
plt.xticks(rotation=90)
plt.xlabel('Suburb')
plt.ylabel('Average Price of Property Sold')
plt.title('Most Affordable Suburbs in Melbourne Aug2017 to Aug2018')
Out[19]:
Plotting Houses by Price Category and Location¶
Before we create the plot, we first need to create the pricing categories.
Let's check out the pricing distribution of Prices first:
Let's check out the pricing distribution of Prices first:
In [20]:
df['Price'].describe()
Out[20]:
Let's create Pricing Category of <250k, 250k-500k, 500k-800k, 800k -1M, 1M-1.5M, 1.5-2M, 2M-3M, and >3M:
In [26]:
def priceCat(i):
if i<250000:
cat = 'Sub 250k'
elif i>250000 and i<500000:
cat = '250k to 500k'
elif i>500000 and i<800000:
cat = '500k to 800k'
elif i>800000 and i<1000000:
cat = '800k to 1M'
elif i>1000000 and i<1500000:
cat = '1M to 1.5M'
elif i>1500000 and i<2000000:
cat = '1.5M to 2M'
elif i>2000000 and i<3000000:
cat = '2M to 3M'
else:
cat = '3M+'
return cat
df['PriceCat']= df['Price'].apply(lambda x: priceCat(x))
df['PriceCat'].head(15)
Out[26]:
The Scatter Plot¶
In [47]:
df=df.sort_values(by = 'Price', ascending = False)
sns.set(font_scale=1)
sns.lmplot(x='Longtitude', y='Lattitude', data=df[df.Lattitude.notnull()],
fit_reg=False,
hue='PriceCat',
palette = 'rainbow')
plt.figure(figsize = (20,20))
Out[47]: