FACETED PLOTS ON PANDAS DATA FRAMES IN PYTHON USING MATPLOTLIB, SEABORN AND GGPLOT

An important part of data analysis is being able to plot the data in a way that communicates the aspect of the data set we are interested in. Often, seeing what I’m interested in requires faceted plots broken down by groups in my data set. I thought that today I’d put up a short post on creating faceted plots using three plotting packages.

  • matplotlib – very flexible, but has more of a learning curve than the other two. Both ggplot and seaborn use it under the hood. http://matplotlib.org/
  • seaborn – makes great looking plots with much simpler syntax. http://stanford.edu/~mwaskom/software/seaborn/
  • ggplot – anyone coming from a background in R will be familiar with this package. It also creates plots with a very concise syntax. The guys over at yhat have ported this plotting package to Python. It is relatively new, but they have open sourced it and it seems to be getting more features all the time. https://github.com/yhat/ggplot

So before we can plot anything we need some data! The data I’ll be using today is available from a fivethirtyeight github repo here: https://github.com/fivethirtyeight/uber-tlc-foil-response/tree/master/uber-trip-data. The data is a collection of Uber pick ups in New York City from 2014, and I am using the Sept-2014 data set. I thought it would be interesting to group the data by day and hour and take a look at the total number of rides per hour for each day in Sept-2014. One other quick note: all of the plots are a bit small, since they were plotted in an ipython notebook which was pasted into this post for convenience. If you plot them yourself from the terminal, you will be able to resize them to a more readable size.

Imports/Packages needed

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from ggplot import *
import seaborn as sns
# line below allows inline plotting in ipython notebooks
%matplotlib inline 
plt.rcParams['figure.figsize'] = (20.0, 8.0) # make plots bigger by default

Importing the data

First I want to load the data and extract the day of the month and the hour of the day for each pick up.

# read the raw CSV, then parse the timestamps with an explicit format
uber = pd.read_csv('uber-raw-data-sep14.csv')
uber['Date/Time'] = pd.to_datetime(uber['Date/Time'], format='%m/%d/%Y %H:%M:%S')
uber['Day'] = uber['Date/Time'].dt.day    # day of the month
uber['Hour'] = uber['Date/Time'].dt.hour  # hour of the day, 0-23
uber.head()
            Date/Time      Lat      Lon    Base  Day  Hour
0 2014-09-01 00:01:00  40.2201 -74.0021  B02512    1     0
1 2014-09-01 00:01:00  40.7500 -74.0027  B02512    1     0
2 2014-09-01 00:03:00  40.7559 -73.9864  B02512    1     0
3 2014-09-01 00:06:00  40.7450 -73.9889  B02512    1     0
4 2014-09-01 00:11:00  40.8145 -73.9444  B02512    1     0

Grouping the data

I am interested in looking at the number of Uber pick ups in each hour of each day. To get at that information I need to group the data by day and hour and count the rows in each group.

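# count the rows in each (Day, Hour) group; len is applied to every remaining column
grouped = uber.groupby(['Day','Hour'],as_index = False).aggregate(len)
# every column now holds the same count, so keep one of them ('Lat') and rename it
grouped = grouped[['Day','Hour','Lat']]
grouped.columns = ['Day','Hour','Count']
# take a look at how the data looks now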
grouped.head()
   Day  Hour  Count
0    1     0    699
1    1     1    490
2    1     2    363
3    1     3    333
4    1     4    261
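The same counts can also be produced in one step with `size()`, which counts rows per group directly. Here is a minimal sketch of that alternative on a made-up toy frame (the `toy` data and its values are assumptions for illustration, not part of the Uber data set):

```python
import pandas as pd

# Toy stand-in for the Uber data: one row per pick up (values are made up)
toy = pd.DataFrame({
    'Day':  [1, 1, 1, 2, 2],
    'Hour': [0, 0, 1, 0, 0],
    'Lat':  [40.7, 40.8, 40.7, 40.6, 40.9],
})

# size() counts the rows in each (Day, Hour) group;
# reset_index turns the group keys back into columns and names the count column
counts = toy.groupby(['Day', 'Hour']).size().reset_index(name='Count')
print(counts)
```

A quick sanity check on either approach is that the counts add up to the number of rows in the original frame.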

Plotting using matplotlib (functionality built into Pandas)

col_num = 7 # one column for each day of the week
row_num = 5 # one row per week
weekend_days = [6, 7, 13, 14, 20, 21, 27, 28] # these are the dates of weekend days in Sept 2014
color_dict = { True  : 'red', False : 'blue'} # give weekend days a different color, just for fun
row, col = 0, 1 # Sept 1, 2014 was a Monday; with Sunday in column 0, start in column 1
fig, axes = plt.subplots(nrows=row_num, ncols=col_num, sharex= True, sharey= True)
for name, group in grouped.groupby('Day'):
    group.plot('Hour', 'Count', ax=axes[row,col], legend=False, color = color_dict[name in weekend_days])
    axes[row,col].set_ylim(0, 3500) # set y limit
    axes[row,col].set_ylabel('Rides/Hr') # set y label
    axes[row,col].annotate('9/'+str(name), xy= (1, 3000)) # write date in corner of each plot
    col += 1
    if col%col_num == 0: 
        row += 1
        col  = 0 
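The weekend dates and the starting column above are hard-coded for Sept 2014. As a rough sketch, they could instead be derived from the calendar with the standard library. The helper names `grid_position` and `is_weekend` are hypothetical, and the Sunday-first layout is an assumption that matches the starting column used above:

```python
from datetime import date

YEAR, MONTH = 2014, 9  # the month used in this post

def grid_position(day, year=YEAR, month=MONTH):
    """Return (row, col) for a day of the month in a Sunday-first calendar grid."""
    # date.weekday() is Monday=0 ... Sunday=6; shift it so that Sunday=0
    first_col = (date(year, month, 1).weekday() + 1) % 7
    offset = first_col + day - 1
    return divmod(offset, 7)

def is_weekend(day, year=YEAR, month=MONTH):
    """True for Saturdays and Sundays."""
    return date(year, month, day).weekday() >= 5

# September has 30 days
weekend_days = [d for d in range(1, 31) if is_weekend(d)]
```

With these helpers, the plotting loop could look up `axes[grid_position(name)]` for each day instead of tracking `row` and `col` by hand.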

Plotting using ggplot (still matplotlib under the hood)

ggplot(grouped, aes(x='Hour',y='Count')) +\
geom_line() +\
facet_wrap('Day', ncol = 7, nrow = 5, scales = 'fixed')

Plotting using Seaborn (still matplotlib under the hood)

grid = sns.FacetGrid(grouped, col="Day", col_wrap=7)
grid.map(plt.plot,'Hour','Count')

Alright, that’s it for today. Hope you guys find these examples useful. It’s much easier to quickly look at things using ggplot and seaborn, but matplotlib allows you to specify very precisely what you would like. Also, it is worthwhile getting comfortable with matplotlib because the other packages are built on top of it, so if you are using ggplot or seaborn but want to change something about your plot, you can alter the plots with the usual matplotlib syntax.

 
