Iris Problem

The Iris Flower Dataset contains measurements of 150 iris flowers, 50 from each of three species.

import pandas as pd

# Fetch the data
iris = pd.read_csv("https://raw.githubusercontent.com/practiceprobs/datasets/main/iris/iris.csv")

# Inspect the first 5 rows
iris.head()
#    sepal_length  sepal_width  petal_length  petal_width      species
# 0           5.1          3.5           1.4          0.2  Iris-setosa
# 1           4.9          3.0           1.4          0.2  Iris-setosa
# 2           4.7          3.2           1.3          0.2  Iris-setosa
# 3           4.6          3.1           1.5          0.2  Iris-setosa
# 4           5.0          3.6           1.4          0.2  Iris-setosa

# Inspect the last 5 rows
iris.tail()
#      sepal_length  sepal_width  petal_length  petal_width         species
# 145           6.7          3.0           5.2          2.3  Iris-virginica
# 146           6.3          2.5           5.0          1.9  Iris-virginica
# 147           6.5          3.0           5.2          2.0  Iris-virginica
# 148           6.2          3.4           5.4          2.3  Iris-virginica
# 149           5.9          3.0           5.1          1.8  Iris-virginica

# Inspect the species values
iris.species.value_counts()
# Iris-setosa        50
# Iris-versicolor    50
# Iris-virginica     50
# Name: species, dtype: int64

Check out github.com/practiceprobs/datasets for a collection of easily accessible datasets hosted on GitHub!

Suppose we want to build a model to predict petal_length. Before modeling, it would help to visualize petal_length against each of the other three continuous variables (sepal_length, sepal_width, and petal_width), with points colored by species. Create that visualization, mimicking the plot below.

Notes

The plot style used in this example is ggplot.
The colormap used in this example is brg.
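One possible approach, sketched under assumptions: the reference plot is not reproduced in this text, so the layout below (one row of three scatter plots sharing a y-axis), the figure size, the point transparency, and the helper name `plot_petal_length` are all hypothetical choices. The sketch applies the ggplot style with `plt.style.context` and samples one color per species from the brg colormap. To stay runnable offline, it builds a small frame from the ten rows printed by `head()` and `tail()` above instead of downloading the CSV.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_petal_length(df: pd.DataFrame):
    """Scatter petal_length against each other continuous feature, colored by species.

    Hypothetical helper (a sketch, not the official solution). Assumes df has the
    columns sepal_length, sepal_width, petal_length, petal_width, and species.
    """
    features = ["sepal_length", "sepal_width", "petal_width"]
    categories = sorted(df["species"].unique())

    # Sample evenly spaced colors from the brg colormap, one per species
    # (with three species these land at positions 0, 0.5, and 1).
    cmap = plt.get_cmap("brg")
    colors = {sp: cmap(i / max(len(categories) - 1, 1)) for i, sp in enumerate(categories)}

    # Apply the ggplot style only to the figure created inside this block
    with plt.style.context("ggplot"):
        fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 4), sharey=True)
        for ax, feature in zip(axes, features):
            # One scatter call per species so each gets its own legend entry
            for sp in categories:
                subset = df[df["species"] == sp]
                ax.scatter(subset[feature], subset["petal_length"],
                           color=colors[sp], label=sp, alpha=0.7)
            ax.set_xlabel(feature)
        axes[0].set_ylabel("petal_length")
        axes[0].legend(title="species")
    return fig, axes


# Small offline sample: the first 5 and last 5 rows shown above
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 6.7, 6.3, 6.5, 6.2, 5.9],
    "sepal_width":  [3.5, 3.0, 3.2, 3.1, 3.6, 3.0, 2.5, 3.0, 3.4, 3.0],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4, 5.2, 5.0, 5.2, 5.4, 5.1],
    "petal_width":  [0.2, 0.2, 0.2, 0.2, 0.2, 2.3, 1.9, 2.0, 2.3, 1.8],
    "species": ["Iris-setosa"] * 5 + ["Iris-virginica"] * 5,
})

fig, axes = plot_petal_length(sample)
```

With the full dataset loaded as `iris`, call `plot_petal_length(iris)` and then `plt.show()`. Drawing one `scatter` call per species, rather than a single call with `c=` and `cmap=`, is a design choice that makes it simple to attach a labeled legend entry to each species.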
