Blog Post 0
Creating Data Visualizations
Step 1: Importing tools and data
Before we can begin creating data visualizations in python, we should first import the tools we need, as well as read in the dataset. To do so, we first import pandas, a tool that allows for data analysis and manipulation. Then, we import the data that we would like to create visualizations with and read it with read_csv, which is part of pandas. We will also import numpy and matplotlib, which are needed for creating graphs.
import pandas as pd
url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)
penguins
import numpy as np
from matplotlib import pyplot as plt
Glancing through the data, we might be interested in seeing how the Flipper Length (mm) and Body Mass (g) of penguins may be correlated. These variables may vary based on Sex as well, so we can color points differently depending on gender. We will create a scatterplot that will plot variables to visualize any potential patterns.
Step 2: Clean data
The next step is to clean the data to ensure we don’t get any unexpected data points in our visualization. To do so, we can select only the variables of interest and remove rows of data that contain NaN or invalid values.
penguins = penguins[["Flipper Length (mm)", "Body Mass (g)", "Sex"]]
penguins = penguins.dropna()
penguins = penguins[penguins["Sex"] != "."]
Step 3: Creating visualizations
Now that our dataframe is cleaned, we can create our scatterplot. We have to first create the plot, then use a for loop to plot each point by sex. Don’t forget to add x- and y-axis labels, as well as a title and legend to convey the information being visualized.
# Create a plot
fig, ax = plt.subplots(1)
# Create a set containing the genders
sexes = set(penguins['Sex'])
# Loop over each gender
for sex in sexes:
a = penguins[penguins['Sex'] == sex]
# Plot values from just that gender, color-coding the point by species
ax.scatter(a['Flipper Length (mm)'], a['Body Mass (g)'], marker = ".", label = sex, alpha = 0.5)
# Set the x-axis label and y-axis label
ax.set(xlabel = "Flipper Length (mm)",
ylabel = "Body Mass (g)")
# Create a title and legend
ax.set(title = "Scatterplot of Flipper Length (mm) and Body Mass (g) based on Sex")
ax.legend()
<matplotlib.legend.Legend at 0x1986b6261c0>