Blog Post 0

Creating Data Visualizations

Step 1: Importing tools and data

Before we can begin creating data visualizations in python, we should first import the tools we need, as well as read in the dataset. To do so, we first import pandas, a tool that allows for data analysis and manipulation. Then, we import the data that we would like to create visualizations with and read it with read_csv, which is part of pandas. We will also import numpy and matplotlib, which are needed for creating graphs.

import pandas as pd
url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)
penguins

import numpy as np
from matplotlib import pyplot as plt

Glancing through the data, we might be interested in seeing how the Flipper Length (mm) and Body Mass (g) of penguins may be correlated. These variables may vary based on Sex as well, so we can color points differently depending on gender. We will create a scatterplot that will plot variables to visualize any potential patterns.

Step 2: Clean data

The next step is to clean the data to ensure we don’t get any unexpected data points in our visualization. To do so, we can select only the variables of interest and remove rows of data that contain NaN or invalid values.

penguins = penguins[["Flipper Length (mm)", "Body Mass (g)", "Sex"]]
penguins = penguins.dropna()
penguins = penguins[penguins["Sex"] != "."]

Step 3: Creating visualizations

Now that our dataframe is cleaned, we can create our scatterplot. We have to first create the plot, then use a for loop to plot each point by sex. Don’t forget to add x- and y-axis labels, as well as a title and legend to convey the information being visualized.

# Create a plot
fig, ax = plt.subplots(1)

# Create a set containing the genders
sexes = set(penguins['Sex'])

# Loop over each gender
for sex in sexes:
    a = penguins[penguins['Sex'] == sex]
    
    #  Plot values from just that gender, color-coding the point by species
    ax.scatter(a['Flipper Length (mm)'], a['Body Mass (g)'], marker = ".", label = sex, alpha = 0.5)
    
# Set the x-axis label and y-axis label
ax.set(xlabel = "Flipper Length (mm)",
       ylabel = "Body Mass (g)")

# Create a title and legend
ax.set(title = "Scatterplot of Flipper Length (mm) and Body Mass (g) based on Sex")
ax.legend()

<matplotlib.legend.Legend at 0x1986b6261c0>

png

Written on January 14, 2022