Data Visualisation Mini Series (I): Scatterplots

Scatterplots display the values of two variables, one along each axis, so we can see whether the variables are correlated or otherwise related and spot any outliers in the data.
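To make "correlation" and "outliers" a bit more concrete, here is a minimal sketch using NumPy. The helper names `pearson_r` and `residual_outliers`, the 2.5 threshold and the toy data are all assumptions made for illustration, not part of the original walkthrough. Flagging points by their distance from a fitted straight line is only one simple heuristic among many.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def residual_outliers(x, y, threshold=2.5):
    """Indices of points that sit unusually far from a least-squares line.

    A point is flagged when its residual is more than `threshold`
    standard deviations away from the mean residual (hypothetical rule).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # straight-line fit
    residuals = y - (slope * x + intercept)
    z = (residuals - residuals.mean()) / residuals.std()
    return np.flatnonzero(np.abs(z) > threshold).tolist()


# hypothetical toy data: y = 2x, except for one point that is far too high
x = list(range(1, 12))
y = [2 * v for v in x]
y[5] = 30

print("correlation:", pearson_r(x, y))
print("outlier indices:", residual_outliers(x, y))
```

On a scatterplot the same point would stand out visually as a dot far above the otherwise straight band of points.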

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# list the available styles and choose the one you like
print(plt.style.available)
plt.style.use('ggplot')
iris = pd.read_csv('Iris.csv')

Seaborn

Use the lmplot function:

# with this code, you'll get a simple scatter plot of x and y
# (lmplot also draws a regression line unless fit_reg=False)
sns.lmplot(x="PetalLengthCm", y="PetalWidthCm", data=iris)

colours = ['#9B59B6', '#F4D03F', '#58D68D']

sns.lmplot(x='PetalLengthCm',  # X-axis name
           y='PetalWidthCm',  # Y-axis name
           data=iris,
           fit_reg=False,  # set to True if you need regression lines
           hue='Species',  # one colour per iris species
           scatter_kws={"s": 100},  # marker size
           height=8,  # this parameter was called "size" before seaborn 0.9
           palette=colours)
# set graph title
plt.title("Petal Length vs. Petal Width by Species", size=20)
# set axis labels
plt.xlabel("Petal Length", size=16)
plt.ylabel("Petal Width", size=16)
# change font size of axis ticks
plt.tick_params(labelsize=14)
# limit Y axis to the range 0 to 3
plt.ylim(0, 3)
# before, I was successfully saving the image on my computer, but for
# some reason the title was cropped. Adding the bbox_inches='tight'
# parameter will help you solve that
plt.savefig('seaborn.jpeg', bbox_inches='tight')
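As a possible alternative, seaborn 0.9 and later also provide an axes-level `scatterplot` function that accepts the same `hue` and `palette` arguments and draws onto an existing matplotlib axis. The sketch below is not from the original post: `iris_demo` is a hypothetical stand-in with a few made-up rows that only mimic the column names and species labels assumed for Iris.csv, so that the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# hypothetical stand-in for Iris.csv: made-up rows, same column names
iris_demo = pd.DataFrame({
    "PetalLengthCm": [1.4, 1.5, 4.5, 4.7, 5.8, 6.0],
    "PetalWidthCm": [0.2, 0.3, 1.4, 1.5, 2.1, 2.4],
    "Species": ["Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica"],
})
colours = ['#9B59B6', '#F4D03F', '#58D68D']

fig, ax = plt.subplots(figsize=(8, 8))
sns.scatterplot(data=iris_demo,
                x="PetalLengthCm",
                y="PetalWidthCm",
                hue="Species",    # one colour per species
                palette=colours,
                s=100,            # marker size, passed on to matplotlib
                ax=ax)
ax.set_title("Petal Length vs. Petal Width by Species", size=20)
ax.set_xlabel("Petal Length", size=16)
ax.set_ylabel("Petal Width", size=16)
```

Because `scatterplot` returns a regular matplotlib axis, the title, labels and limits can be set with the usual `ax.set_*` methods, which may be convenient when the plot is one panel of a larger figure.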

Matplotlib

Use the scatter function:

# with this code, you'll get a simple scatter plot of x and y
plt.scatter(iris.PetalLengthCm, iris.PetalWidthCm)

fig, ax = plt.subplots(1, 1, figsize=(8, 8))
# because matplotlib doesn't support the (very useful) "hue"
# parameter from seaborn, in order to replicate the above
# scatterplot we'll have to do a for loop and plot the
# data points from each species one at a time
for i, s in enumerate(iris.Species.unique()):
    tmp = iris.loc[iris.Species == s, :]
    ax.scatter(tmp.PetalLengthCm,
               tmp.PetalWidthCm,
               label=s,
               s=100,
               c=colours[i])
# set legend location and font size
ax.legend(loc=2, fontsize=12)
# set graph title
ax.set_title("Petal Length vs. Petal Width by Species", size=20)
# set axis labels
ax.set_xlabel("Petal Length", size=16)
ax.set_ylabel("Petal Width", size=16)
# change font size of axis ticks
ax.tick_params(labelsize=14)
# limit Y axis to the range 0 to 3
ax.set_ylim(0, 3)
# save the graph as a jpeg
plt.savefig('matplotlib.jpeg')
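The per-species loop can also be written with `groupby`, which hands over each species together with its rows. The sketch below is one possible variant, not the original author's code: it looks colours up by species name in a hypothetical `species_colours` dictionary, so the colour no longer depends on the order in which species appear. `iris_demo` is a made-up stand-in for Iris.csv that only mimics the assumed column names and species labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical stand-in for Iris.csv: made-up rows, same column names
iris_demo = pd.DataFrame({
    "PetalLengthCm": [1.4, 1.5, 4.5, 4.7, 5.8, 6.0],
    "PetalWidthCm": [0.2, 0.3, 1.4, 1.5, 2.1, 2.4],
    "Species": ["Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica"],
})

# hypothetical mapping from species name to colour
species_colours = {
    "Iris-setosa": "#9B59B6",
    "Iris-versicolor": "#F4D03F",
    "Iris-virginica": "#58D68D",
}

fig, ax = plt.subplots(figsize=(8, 8))
# groupby yields (species name, rows for that species), sorted by name
for species, group in iris_demo.groupby("Species"):
    ax.scatter(group["PetalLengthCm"],
               group["PetalWidthCm"],
               label=species,
               s=100,
               color=species_colours[species])
ax.legend(loc="upper left", fontsize=12)
```

Each call to `ax.scatter` adds one collection of points to the axis, so the legend ends up with one entry per species.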

Tableau

  • Place the X-measure (Petal Length) on the Columns shelf and the Y-measure (Petal Width) on the Rows shelf. Note that both need to be measures. Tableau aggregates the values by default, so at first you get a scatterplot with a single mark.
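Why a single mark? A rough pandas analogy, assuming the measures are aggregated with a sum (Tableau's usual default for measures): each measure collapses to one number, so there is only one (x, y) pair left to draw. The `iris_demo` frame is a hypothetical stand-in with made-up rows.

```python
import pandas as pd

# hypothetical stand-in for the two measures: made-up rows
iris_demo = pd.DataFrame({
    "PetalLengthCm": [1.4, 1.5, 4.5, 4.7, 5.8, 6.0],
    "PetalWidthCm": [0.2, 0.3, 1.4, 1.5, 2.1, 2.4],
})

# aggregated view: each measure becomes a single number, hence one mark
aggregated = iris_demo[["PetalLengthCm", "PetalWidthCm"]].sum()
print("aggregated mark:", tuple(aggregated))

# disaggregated view: one (x, y) pair per row, hence one mark per flower
marks = list(zip(iris_demo["PetalLengthCm"], iris_demo["PetalWidthCm"]))
print("disaggregated marks:", len(marks))
```

To get one mark per row in Tableau, the measures need to be disaggregated, which is typically done by unticking Aggregate Measures in the Analysis menu.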

Tales about data, statistics, machine learning, visualisation, and much more. By Adrià Luz (@adrialuz) and Sara Gaspar (@sargaspar).
