Data Visualisation Mini Series (I): Scatterplots

by Sara Gaspar

Adrià Luz
Nov 27, 2017

Scatterplots display the values of two variables, one along each axis. This lets us see whether the variables are correlated or otherwise related, and whether the data contains any outliers.

Scatterplots are a great way to pair numerical variables and see whether there is a positive or negative relationship between them. However, while it might be tempting to assume that one variable affects the other, remember that correlation doesn’t imply causation: other factors, known as confounding variables, could be affecting your results.
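To make the confounding point concrete, here is a minimal sketch on synthetic data (not the iris data used later). The variable names and the helper `residuals` are made up for illustration: a hidden variable `z` drives both `x` and `y`, so the two look strongly correlated even though neither affects the other.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical confounder z drives both x and y; x never influences y directly.
z = rng.normal(size=1000)
x = z + rng.normal(scale=0.5, size=1000)
y = z + rng.normal(scale=0.5, size=1000)

# Pearson correlation between x and y: clearly positive.
r_xy = np.corrcoef(x, y)[0, 1]


def residuals(values, confounder):
    """Return what is left of `values` after a straight-line fit on `confounder`."""
    slope, intercept = np.polyfit(confounder, values, 1)
    return values - (slope * confounder + intercept)


# Correlation once the linear effect of z is removed from both variables.
r_partial = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

print(f"corr(x, y)                 = {r_xy:.2f}")
print(f"corr(x, y) with z removed  = {r_partial:.2f}")
```

On a scatterplot, `x` against `y` would show a clear upward trend, while the two sets of residuals plotted against each other would look like a shapeless cloud.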

This post is the first episode of my data visualisation mini series — a place where you’ll find guides showing you how to quickly create graphs using the matplotlib and seaborn libraries or using Tableau.

Setting up the environment:

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Fancy a particular style? Check the ones available:

plt.style.available
# and use this to choose the one you like
plt.style.use('ggplot')
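If you only want a style for a single figure, a possible approach is `plt.style.context`, which applies the style inside a `with` block and restores the previous settings afterwards. The helper `pick_style` below is a made-up convenience for this sketch, not part of matplotlib; it falls back to matplotlib's `'default'` style when the preferred one isn't installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def pick_style(preferred, fallback="default"):
    """Return `preferred` if matplotlib lists it as available, else `fallback`."""
    return preferred if preferred in plt.style.available else fallback


chosen = pick_style("ggplot")

# The style only applies to figures created inside the with-block.
with plt.style.context(chosen):
    fig, ax = plt.subplots()
    ax.scatter([1, 2, 3], [2, 4, 5])

plt.close(fig)
```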

Read the iris data set (your data!):

iris = pd.read_csv('Iris.csv')
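Before plotting, it can be worth checking that the columns you need are present and numeric. The sketch below uses a tiny hand-typed stand-in for `Iris.csv` so that it runs on its own; the rows and the `Iris-…` species labels are assumptions for illustration, and only the columns used in this post are included.

```python
import io
import pandas as pd

# Illustrative stand-in for Iris.csv: only the columns used in this post,
# with a few sample rows typed in by hand.
sample_csv = io.StringIO(
    "Id,PetalLengthCm,PetalWidthCm,Species\n"
    "1,1.4,0.2,Iris-setosa\n"
    "2,4.7,1.4,Iris-versicolor\n"
    "3,6.0,2.5,Iris-virginica\n"
)
sample = pd.read_csv(sample_csv)

# Fail early if a column the plots rely on is missing.
required = ["PetalLengthCm", "PetalWidthCm", "Species"]
missing = [col for col in required if col not in sample.columns]
if missing:
    raise ValueError(f"missing columns: {missing}")

# Both axes of a scatterplot need numeric data.
numeric_ok = all(
    pd.api.types.is_numeric_dtype(sample[col]) for col in required[:2]
)

print(sample.dtypes)
print(sample["Species"].value_counts())
```

With your own file, replace `sample_csv` with the path to the CSV.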

Seaborn

Use the lmplot function

# with this code, you'll get a simple scatter plot of x and y
# (by default lmplot also draws a fitted regression line on top)
sns.lmplot(x="PetalLengthCm", y="PetalWidthCm", data=iris)
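By default `lmplot` overlays a linear regression fit on the points. As a rough sketch of what a straight-line fit involves, the snippet below uses `np.polyfit` on toy data; it does not reproduce seaborn's confidence band, and the variable names are made up for the example.

```python
import numpy as np

# Toy data lying exactly on y = 2x + 1, so the fit is easy to check by eye.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Degree-1 least-squares fit; polyfit returns the highest power first.
slope, intercept = np.polyfit(x, y, 1)

# Points along the fitted line, e.g. for ax.plot(x_line, y_line).
x_line = np.linspace(x.min(), x.max(), 50)
y_line = slope * x_line + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```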

Example of a scatterplot with one colour per category (& figure aesthetics):

colours = ['#9B59B6', '#F4D03F', '#58D68D']
sns.lmplot(x='PetalLengthCm', # X-axis name
           y='PetalWidthCm', # Y-axis name
           data=iris,
           fit_reg=False, # set to True if you need regression lines
           hue='Species', # one colour per iris species
           scatter_kws={"s":100},
           height=8, # this parameter was called "size" before seaborn 0.9
           palette=colours)
# set graph title
plt.title("Petal Length vs. Petal Width by Species", size=20)
# set axis labels
plt.xlabel("Petal Length", size=16)
plt.ylabel("Petal Width", size=16)
# change font size of axis ticks
plt.tick_params(labelsize=14)
# limit Y axis to start from 0
plt.ylim(0, 3)
# before, I was successfully saving the image on my computer but for
# some reason the title was cropped. Adding the bbox_inches='tight'
# parameter will help you solve that
plt.savefig('seaborn.jpeg', bbox_inches='tight')
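If you never need regression lines, seaborn 0.9 and later also provides `sns.scatterplot`, which draws onto a regular matplotlib axes. The sketch below is an alternative, not a replacement for the `lmplot` code above; it uses a small hand-made DataFrame called `demo` (illustrative values only) so that it runs without `Iris.csv`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hand-made stand-in for the iris DataFrame (illustrative values only).
demo = pd.DataFrame({
    "PetalLengthCm": [1.4, 1.5, 4.7, 4.5, 6.0, 5.1],
    "PetalWidthCm": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Species": ["Iris-setosa"] * 2 + ["Iris-versicolor"] * 2 + ["Iris-virginica"] * 2,
})
colours = ['#9B59B6', '#F4D03F', '#58D68D']

fig, ax = plt.subplots(figsize=(8, 8))
sns.scatterplot(data=demo, x="PetalLengthCm", y="PetalWidthCm",
                hue="Species",    # one colour per species
                palette=colours,  # one entry per hue level
                s=100,            # marker size, passed on to matplotlib
                ax=ax)
ax.set_title("Petal Length vs. Petal Width by Species", size=20)
```

Because `scatterplot` works on an axes you create yourself, the figure size comes from `plt.subplots` rather than from a `height` argument.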

Matplotlib

Use the scatter function:

# with this code, you'll get a simple scatter plot of x and y
plt.scatter(iris.PetalLengthCm, iris.PetalWidthCm)

Example of a scatterplot with one colour per category (& figure aesthetics):

fig, ax = plt.subplots(1, 1, figsize=(8, 8))
# because matplotlib doesn't support the (very useful) "hue"
# parameter from seaborn, in order to replicate the above
# scatterplot we'll have to do a for loop and plot the
# data points from each species one at a time
for i, s in enumerate(iris.Species.unique()):
    tmp = iris.loc[iris.Species == s, :]
    ax.scatter(tmp.PetalLengthCm,
               tmp.PetalWidthCm,
               label=s,
               s=100,
               c=colours[i])
# set legend location and font size
ax.legend(loc=2, fontsize=12)
# set graph title
ax.set_title("Petal Length vs. Petal Width by Species", size=20)
# set axis labels
ax.set_xlabel("Petal Length", size=16)
ax.set_ylabel("Petal Width", size=16)
# change font size of axis ticks
ax.tick_params(labelsize=14)
# limit Y axis to start from 0
ax.set_ylim(0, 3)
# save the graph as a jpeg
plt.savefig('matplotlib.jpeg')
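A variation on the loop above is to let pandas do the splitting with `groupby` and to keep the colours in a dictionary keyed by species, so that a colour stays attached to its category whatever order the groups arrive in. This is only a sketch: the `demo` DataFrame and the `colour_by_species` mapping are made up for illustration. Note that `groupby` yields groups in sorted key order, whereas `unique()` keeps the order of appearance.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Small hand-made stand-in for the iris DataFrame (illustrative values only).
demo = pd.DataFrame({
    "PetalLengthCm": [1.4, 1.5, 4.7, 4.5, 6.0, 5.1],
    "PetalWidthCm": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Species": ["Iris-setosa"] * 2 + ["Iris-versicolor"] * 2 + ["Iris-virginica"] * 2,
})

# Explicit species-to-colour mapping (hypothetical assignment).
colour_by_species = {
    "Iris-setosa": "#9B59B6",
    "Iris-versicolor": "#F4D03F",
    "Iris-virginica": "#58D68D",
}

fig, ax = plt.subplots(figsize=(8, 8))
# One scatter call per species, so each one gets its own legend entry.
for species, group in demo.groupby("Species"):
    ax.scatter(group["PetalLengthCm"],
               group["PetalWidthCm"],
               label=species,
               s=100,
               color=colour_by_species[species])
ax.legend(loc="upper left", fontsize=12)
```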

Tableau

  • Place the X-measure (Petal Length) on the Columns shelf and the Y-measure (Petal Width) on the Rows shelf. Note that both need to be measures. Tableau will first aggregate the values and give you a one-mark scatterplot.

  • Drag Id to Detail on the Marks card to de-aggregate the measures. Finally, drag Species (your categories) to Colour.

Although this might not be the most relevant use case, Tableau offers a Trend Lines feature. If you’d like to add trend lines, go to the Analytics pane and drag Trend Line into the view. Alternatively, you’ll find trend lines in the top menu under Analysis.

In the next episodes of this mini series I’ll explain how to plot other types of graphs. Stay tuned!

Tales about data, statistics, machine learning, visualisation, and much more. By Adrià Luz (@adrialuz) and Sara Gaspar (@sargaspar).
