Comparing the weather of places I've lived in

Reading time: ~25 minutes.

I have lived in 4 countries on 3 continents (so far).
When people ask me about these places, the first thing that comes to mind is the weather. Having lived there, I knew roughly how they compared but was curious about the exact numbers. What better way to visualize data than an IPython notebook?

We're going to look at the following four places:

  • Nice, France (my hometown)
  • London, England
  • Montreal, Canada
  • Naha, Japan: in Okinawa, see Google Maps for exact location

I lived in Uruma, not in Naha, but since Naha is the capital city of Okinawa, that is where the data comes from. Okinawa is small enough that the Naha data is a fair stand-in.

Getting the data

There are weather stations all over the world but finding their data can be a bit tricky. Also, "Nice" is a surprisingly hard term to google.
I had managed to find data on various local websites for every city but Nice when Tom linked TuTiempo.net, which aggregates that data in a single place. Great, only one website to scrape!
The script is quite simple and only depends on requests and BeautifulSoup4. You can run it if you want, but it has some limitations: the output folders must already exist and you have to delete the CSVs before running it again. The four CSVs and the following notebook are available on this repo.
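
The two limitations mentioned above (folders that must already exist, CSVs that must be deleted before a rerun) could probably be lifted with a few lines of standard library code. This is a minimal sketch rather than the actual script: the helper name `write_city_csv` and the `data` folder are assumptions made for illustration, and the sample row is the first Nice row shown further down.

```python
import csv
import os
import tempfile

COLUMNS = ['month', 'avg_temp', 'min_temp', 'max_temp',
           'humidity', 'rainfall', 'raindays', 'snowdays']


def write_city_csv(folder, city, rows):
    """Write rows for one city, creating the folder and replacing any old file.

    Hypothetical helper, not part of the original scraping script.
    """
    # exist_ok=True means no error if the folder is already there
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, '%s.csv' % city)
    # Mode 'w' truncates an existing file, so reruns do not append duplicates
    with open(path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return path


# Write twice into a temporary folder to show that reruns are safe
output_folder = os.path.join(tempfile.mkdtemp(), 'data')
sample_rows = [['01-2011', 8.6, 4.8, 12.4, 67.0, 2.9, 5, 0]]
write_city_csv(output_folder, 'nice', sample_rows)
csv_path = write_city_csv(output_folder, 'nice', sample_rows)

with open(csv_path, newline='') as handle:
    written = list(csv.reader(handle))
```

After the second call the file still holds one header line and one data line, which is the behaviour you want when rerunning a scraper.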

Looking at the data

You can see the notebook below.

In [1]:
%matplotlib inline

import datetime
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn
# Since seaborn 0.8, importing alone no longer restyles charts
seaborn.set()  # make charts prettier and more readable
In [2]:
# Let's load one of the CSVs to see what they look like
nice = pd.read_csv('nice.csv')
nice[:5]
Out[2]:
     month  avg_temp  min_temp  max_temp  humidity  rainfall  raindays  snowdays
0  01-2011       8.6       4.8      12.4      67.0       2.9         5         0
1  02-2011       9.3       7.1      11.1      69.1       2.9         5         0
2  03-2011      11.7       7.8      14.3      67.0       3.9         5         0
3  04-2011      15.6      13.3      18.5      67.4       0.4         1         0
4  05-2011      19.7      16.6      24.8      60.6       0.0         0         0

We can see the shape of our CSV files.
There are 8 columns, all fairly self-explanatory. Each holds the monthly average for the month given in the month column, except raindays and snowdays, which hold the number of days in that month on which it rained or snowed.

Note: I currently count a day as a rain day when its rainfall is above 5mm (a completely arbitrary threshold).
The website I scraped does have a column for rain days, but some days are marked as rain days despite showing no rainfall, and vice versa.
Even then it's not really accurate for my purposes: I'm only interested in quality of life, and rain at 4am doesn't bother many people.
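
To make the rule concrete, here is a rough sketch of how such a count could be computed from daily readings. It is not the code from the scraping script: the function name `count_rain_days`, the `date` and `rainfall` column names and the daily values are all assumptions made up for the example.

```python
import pandas as pd

RAIN_THRESHOLD_MM = 5.0  # the arbitrary cut-off described above


def count_rain_days(daily, threshold=RAIN_THRESHOLD_MM):
    """Count, per month, the days whose rainfall is strictly above threshold.

    `daily` is a hypothetical DataFrame with a datetime 'date' column
    and a 'rainfall' column in millimetres.
    """
    is_rain_day = daily['rainfall'] > threshold
    # Same MM-YYYY labels as the month column of the CSV files
    months = daily['date'].dt.strftime('%m-%Y')
    # Summing booleans counts the True values in each month
    return is_rain_day.groupby(months).sum().astype(int)


# Made-up daily readings, only there to exercise the function
daily = pd.DataFrame({
    'date': pd.to_datetime(['2014-01-01', '2014-01-02',
                            '2014-01-03', '2014-02-01']),
    'rainfall': [0.0, 12.5, 5.0, 7.2],
})
rain_days = count_rain_days(daily)
```

Note that a day with exactly 5mm does not count here, since the rule is "above 5mm".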

Cool. Let's plot something first to check that everything is okay, then load the other cities.

In [3]:
_ = nice['avg_temp'].plot(figsize=(15, 5))

While we could keep one DataFrame per city, it is more convenient to have a single DataFrame containing all the data, as this lets us plot directly from it like we did above.

In [4]:
locations = ['nice', 'montreal', 'okinawa', 'london']
frames = []

for location in locations:
    frame = pd.read_csv('%s.csv' % location)
    # Keep track of which city each row comes from
    frame['location'] = location
    frames.append(frame)

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
# and keeps each city's original row index, as append did
weather = pd.concat(frames)

# .head() is an alternative to slicing with [:5]
weather.head()
Out[4]:
     month  avg_temp  min_temp  max_temp  humidity  rainfall  raindays  snowdays  location
0  01-2011       8.6       4.8      12.4      67.0       2.9         5         0      nice
1  02-2011       9.3       7.1      11.1      69.1       2.9         5         0      nice
2  03-2011      11.7       7.8      14.3      67.0       3.9         5         0      nice
3  04-2011      15.6      13.3      18.5      67.4       0.4         1         0      nice
4  05-2011      19.7      16.6      24.8      60.6       0.0         0         0      nice

We can also call .describe() on a DataFrame to get a quick overview of every numeric column, instead of calling .median(), .sum() etc. on each column one by one.

In [5]:
weather.describe()
Out[5]:
         avg_temp    min_temp    max_temp    humidity    rainfall    raindays    snowdays
count  192.000000  192.000000  192.000000  192.000000  192.000000  192.000000  192.000000
mean    14.698437    9.718750   19.635937   70.260937    3.507292    4.843750    2.114583
std      9.001405   11.141769    7.269971    7.409749    3.491553    3.241834    5.640857
min    -10.000000  -24.900000    1.600000   54.200000    0.000000    0.000000    0.000000
25%      9.150000    4.650000   13.950000   65.300000    1.400000    2.750000    0.000000
50%     16.250000   11.950000   20.950000   69.750000    2.700000    5.000000    0.000000
75%     21.025000   16.600000   25.225000   74.800000    3.900000    7.000000    0.000000
max     29.600000   28.400000   30.700000   87.500000   22.000000   15.000000   25.000000
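
The summary above mixes the four cities together. A possible refinement, sketched here on a tiny made-up frame rather than on the real CSVs, is to group by location first so that .describe() returns one row of statistics per city.

```python
import pandas as pd

# Made-up sample values, only there to make the example runnable
sample = pd.DataFrame({
    'location': ['nice', 'nice', 'montreal', 'montreal'],
    'avg_temp': [8.0, 20.0, -10.0, 22.0],
})

# One row of summary statistics per city instead of one for the whole frame
per_city = sample.groupby('location')['avg_temp'].describe()
```

On the notebook's data the equivalent call would be `weather.groupby('location')['avg_temp'].describe()`.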
In [11]:
# Define some styles that we will reuse for all line graphs
styles = {
    'london': 'go-',
    'nice': 'ro-',
    'montreal': 'bo-',
    'okinawa': 'co-',
}

# We define a function since we will need to do this pretty often
def plot_grouped_by(dataframe, column_name):
    """Plots the dataframe grouped by location for the given column"""
    # Use the month as the index so it ends up on the x axis
    locations = dataframe.set_index('month').groupby('location')

    for loc_name, loc in locations:
        # Series.plot draws the index on the x axis, so no x argument is needed
        loc[column_name].plot(label=str(loc_name), style=styles[str(loc_name)])


plt.figure(figsize=(16, 8))
ax = plt.subplot(111)

plot_grouped_by(weather, 'avg_temp')

# Yes, I did add the 40 degrees tick just to be able to fit the legend properly
plt.yticks([-15, -10, -5, 0, 5, 10, 15, 20, 25, 30, 40], fontsize=14)
plt.legend(fontsize=14, loc="upper left")
plt.title("Monthly average temperature 2011-2014", fontsize=16)
_ = plt.ylabel("Temperature (celsius)", fontsize=16) 
_ = plt.xlabel("Time", fontsize=16) 

We can make a few observations on this chart:

  • Montreal has the widest temperature swing over the year: from very cold in winter to quite warm in summer
  • London is disappointingly average and, for my taste, at least 10° short in summer
  • Okinawa is hot: even its winter is warmer than London's summer most of the time
  • Nice has nice weather: hot in summer but not too cold in winter
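
The first observation can also be checked with numbers instead of by eye. The sketch below uses a few made-up monthly averages (not the scraped data) and computes the standard deviation and the max-min range per city.

```python
import pandas as pd

# Made-up monthly averages, only there to illustrate the computation
sample = pd.DataFrame({
    'location': ['montreal', 'montreal', 'montreal',
                 'okinawa', 'okinawa', 'okinawa'],
    'avg_temp': [-10.0, 8.0, 22.0, 17.0, 23.0, 29.0],
})

by_city = sample.groupby('location')['avg_temp']
spread = pd.DataFrame({
    'std': by_city.std(),
    'range': by_city.max() - by_city.min(),
}).sort_values('range', ascending=False)  # widest swing first
```

Running the same lines on `weather` instead of `sample` should put the city with the widest swing on the first row.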

For the next plots, let's focus on 2014 only to get an idea of what a year looks like in those cities.

In [7]:
# Making sure we have a datetime first rather than a string
weather['month'] = pd.to_datetime(weather['month'], format="%m-%Y")
# Newer pandas versions may refuse to compare a datetime column with a
# datetime.date, so we use a Timestamp instead
start = pd.Timestamp(2014, 1, 1)

# Boolean masks let us filter rows with a plain comparison
weather_2014 = weather[weather.month >= start]
weather_2014.head()
Out[7]:
         month  avg_temp  min_temp  max_temp  humidity  rainfall  raindays  snowdays  location
36  2014-01-01       9.7       6.4      12.4      71.6       9.2        11         0      nice
37  2014-02-01       9.9       6.8      13.4      69.6       4.6         7         0      nice
38  2014-03-01      12.5       7.4      15.4      62.3       2.7         3         0      nice
39  2014-04-01      15.3      13.2      17.3      69.3       0.3         0         0      nice
40  2014-05-01      17.5      14.4      20.2      64.1       0.6         1         0      nice
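
Once the column holds real datetimes, the `.dt` accessor offers another way to pick a year, which also works for a year in the middle of the range. A small sketch on a made-up frame (the 2013 temperatures are invented, the 2014 ones are the Nice values shown above):

```python
import pandas as pd

sample = pd.DataFrame({
    'month': ['11-2013', '12-2013', '01-2014', '02-2014'],
    # The two 2013 values are made up for the example
    'avg_temp': [12.0, 10.0, 9.7, 9.9],
})

# Same conversion as in the notebook: MM-YYYY strings to datetimes
sample['month'] = pd.to_datetime(sample['month'], format='%m-%Y')

# .dt.year extracts the year of every row, giving a boolean mask
only_2014 = sample[sample['month'].dt.year == 2014]
```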
In [12]:
# Let's look at temperatures and humidity
plt.figure(figsize=(16, 8))

# this 221 means we want a 2x2 plots display and this is the first one 
# (so upper left)
ax = plt.subplot(221)

plot_grouped_by(weather_2014, 'avg_temp')
plt.yticks([-15, -10, -5, 0, 5, 10, 15, 20, 25, 30, 35], fontsize=14)
plt.legend(fontsize=14, loc="lower center")
plt.title("Monthly average temperature 2014")
_ = plt.ylabel("Temperature (celsius)", fontsize=16) 

ax2 = plt.subplot(222)

plot_grouped_by(weather_2014, 'max_temp')
plt.yticks([0, 10, 20, 30, 40], fontsize=14)
plt.legend(fontsize=14, loc="lower center")
plt.title("Max temperature 2014")
_ = plt.ylabel("Temperature (celsius)", fontsize=16) 

ax3 = plt.subplot(223)

plot_grouped_by(weather_2014, 'min_temp')
plt.yticks([-30, -20, -10, 0, 10, 20, 30], fontsize=14)
plt.legend(fontsize=14, loc="lower center")
plt.title("Min temperature 2014")
_ = plt.ylabel("Temperature (celsius)", fontsize=16) 

ax4 = plt.subplot(224)

plot_grouped_by(weather_2014, 'humidity')
plt.title("Average monthly humidity % in 2014 (legend identical)")
_ = plt.ylabel("Humidity %", fontsize=16) 
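
As an aside, the three-digit subplot codes can be avoided with plt.subplots, which creates the whole grid at once and returns the axes as an array. This is only a sketch of the layout, with the titles from the cell above and no data plotted; the Agg backend line is an assumption so that it runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# A 2x2 grid: axes[0, 0] is the upper left plot, axes[1, 1] the lower right
fig, axes = plt.subplots(2, 2, figsize=(16, 8))

titles = [
    'Monthly average temperature 2014',
    'Max temperature 2014',
    'Min temperature 2014',
    'Average monthly humidity % in 2014',
]

# axes.flat walks the grid row by row, like subplot 221, 222, 223, 224
for ax, title in zip(axes.flat, titles):
    ax.set_title(title)
```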

Looking at those graphs we can notice a few things:

  • Montreal's winter is pretty damn cold but not really humid, which makes it not THAT bad, and Canadians know how to do proper insulation. Summer is quite nice all around: pleasant temperatures and pretty dry.
  • Nice is good all year round: it sometimes gets below 10° but never reaches 0°, while staying dry. Having grown up there, I feel like I have been spoiled when it comes to weather.
  • London is average, but we can see the winter is very humid, which is why I feel colder at 0° in London than at -15° in Montreal.
  • Okinawa looks really good until you experience that summer humidity. For those who haven't lived in a tropical climate, it means you sweat an awful lot very quickly, and air con is a necessity.

To finish this notebook, let's have a look at the rain and snow data.

In [9]:
colors = {
    'london': 'green',
    'nice': 'red',
    'montreal': 'blue',
    'okinawa': 'cyan',
}

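def bar_plot(ax, column_name):
    """Draws one group of bars per month, one bar per city, for the column"""
    # Pivot to one row per month and one column per city
    per_city = weather_2014.set_index(
        ['month', 'location']
    )[column_name].unstack()
    per_city.plot(
        ax=ax,
        kind='bar',
        # Reuse the same color per city as in the line charts
        color=[colors[name] for name in per_city.columns]
    )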

plt.figure(figsize=(16, 8))
ax = plt.subplot(211)

bar_plot(ax, "raindays")
plt.legend(fontsize=14, loc="best")
plt.title("Rain days per month in 2014")
_ = plt.ylabel("Number of rain days", fontsize=16) 

ax2 = plt.subplot(212)

bar_plot(ax2, "snowdays")
plt.legend(fontsize=14, loc="best")
plt.title("Snow days per month in 2014")
_ = plt.ylabel("Number of snow days", fontsize=16) 
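
The set_index and unstack chain used in bar_plot is what turns the long table (one row per month and city) into a wide one (one row per month, one column per city), which is the shape a grouped bar chart needs. A small sketch: the Nice values are the 2014 ones shown earlier, the London values are made up.

```python
import pandas as pd

long_format = pd.DataFrame({
    'month': ['01-2014', '01-2014', '02-2014', '02-2014'],
    'location': ['nice', 'london', 'nice', 'london'],
    # London's numbers are invented for the example
    'raindays': [11, 9, 7, 10],
})

# The two index levels become rows (month) and columns (location)
wide = long_format.set_index(['month', 'location'])['raindays'].unstack()
```

Each column of `wide` then becomes one series of bars, and each row one group on the x axis.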

We can see it rains quite a bit in Okinawa, which has a rainy season (May-June) and a typhoon season (June-November).
Nice is probably the nicest, with close to no rain during summer.
When it comes to snow, Montreal is the only contender: snow in Nice and London is very rare, and it hasn't snowed in Okinawa for over 30 years.

While weather is only one of the things to consider when moving (along with salary, atmosphere, friends/family etc.), it is one of the most important for me.
If I had to rank those cities on weather alone, it would be:

  1. Nice
  2. Montreal
  3. Okinawa/London

Okinawa and London are tied for completely different reasons: one has the legendary British weather and its 2 weeks of summer, and the other is so hot and humid that it can be hard to breathe at times (but you get amazing sea and beaches).
Now that you have seen how to use pandas (for those who had never tried it before), it's up to you to find things to compare. Do share them when you do, as they are always interesting to read.