Import Libraries¶

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import datetime

Import Data and Preparation¶

I want to visualise the last 12 months of data from August 2017 to August 2018.

df = pd.read_csv('../input/Melbourne_housing_FULL.csv')
df = df[df.Price.notnull()]
df.head()

df['Date'] = df['Date'].apply(lambda i : datetime.datetime.strptime(i, '%d/%m/%Y'))
begin = datetime.datetime.strptime('31/8/2017', '%d/%m/%Y').date()
end = datetime.datetime.strptime('1/9/2018', '%d/%m/%Y').date()
df = df[(df.Date> begin) & (df.Date< end)]
df['Date'].head()

/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:6: FutureWarning: Comparing Series of datetimes with 'datetime.date'.  Currently, the
'datetime.date' is coerced to a datetime. In the future pandas will
not coerce, and a TypeError will be raised. To retain the current
behavior, convert the 'datetime.date' to a datetime with
'pd.Timestamp'.

19741   2017-09-03
19742   2017-09-03
19744   2017-09-03
19745   2017-09-03
19746   2017-09-03
Name: Date, dtype: datetime64[ns]

15 Most Expensive Suburbs¶

Top15 = pd.pivot_table(df, values = 'Price', index = 'Suburb', aggfunc = 'mean').reset_index()
Top15 = Top15.sort_values(by = 'Price', ascending = False)[:15]

plt.figure(figsize = (20,10))
sns.set(font_scale=2)
sns.barplot(y = 'Price', x = 'Suburb', data = Top15)

plt.xticks(rotation=90)
plt.xlabel('Suburb')
plt.ylabel('Average Price of Property Sold')
plt.title('Most Expensive Suburbs in Melbourne Aug2017 to Aug2018')

Text(0.5,1,'Most Expensive Suburbs in Melbourne Aug2017 to Aug2018')

15 Most Affordable Suburbs¶

Bottom15 = pd.pivot_table(df, values = 'Price', index = 'Suburb', aggfunc = 'mean').reset_index()
Bottom15 = Bottom15.sort_values(by = 'Price', ascending = True)[:15]

plt.figure(figsize = (20,10))
sns.set(font_scale=2)
sns.barplot(y = 'Price', x = 'Suburb', data = Bottom15)

plt.xticks(rotation=90)
plt.xlabel('Suburb')
plt.ylabel('Average Price of Property Sold')
plt.title('Most Affordable Suburbs in Melbourne Aug2017 to Aug2018')

Text(0.5,1,'Most Affordable Suburbs in Melbourne Aug2017 to Aug2018')

Plotting Houses by Price Category and Location¶

Before we create the plot, we first need to create the pricing categories.
Let's check out the pricing distribution of Prices first:

df['Price'].describe()

count    1.088600e+04
mean     1.040665e+06
std      6.370919e+05
min      1.120000e+05
25%      6.400000e+05
50%      8.600000e+05
75%      1.260000e+06
max      1.120000e+07
Name: Price, dtype: float64

Let's create Pricing Category of <250k, 250k-500k, 500k-800k, 800k -1M, 1M-1.5M, 1.5-2M, 2M-3M, and >3M:

def priceCat(i):
    if i<250000:
        cat = 'Sub 250k'
    elif i>250000 and i<500000:
        cat = '250k to 500k'
    elif i>500000 and i<800000:
        cat = '500k to 800k'
    elif i>800000 and i<1000000:
        cat = '800k to 1M'
    elif i>1000000 and i<1500000:
        cat = '1M to 1.5M'
    elif i>1500000 and i<2000000:
        cat = '1.5M to 2M'
    elif i>2000000 and i<3000000:
        cat = '2M to 3M'
    else:
        cat = '3M+'
    return cat

df['PriceCat']= df['Price'].apply(lambda x: priceCat(x))
df['PriceCat'].head(15)

19741      1M to 1.5M
19742    250k to 500k
19744      1.5M to 2M
19745      800k to 1M
19746      1.5M to 2M
19747      800k to 1M
19748      800k to 1M
19749    500k to 800k
19750      1M to 1.5M
19751             3M+
19752    250k to 500k
19753      800k to 1M
19754      1M to 1.5M
19755      1M to 1.5M
19756      800k to 1M
Name: PriceCat, dtype: object

The Scatter Plot¶

df=df.sort_values(by = 'Price', ascending = False)
sns.set(font_scale=1)
sns.lmplot(x='Longtitude', y='Lattitude', data=df[df.Lattitude.notnull()],
           fit_reg=False, 
           hue='PriceCat',
          palette = 'rainbow')
plt.figure(figsize = (20,20))

<Figure size 2160x2160 with 0 Axes>

<Figure size 2160x2160 with 0 Axes>

Data Econ

Saturday, 29 September 2018

Melbourne Residential Property Prices - Visualised