2. Primary Types and Attack, Defense, & Speed
Now, let’s examine the breakdown of primary types and generations, in
addition to attack, defense, and speed statistics. Again, you may use
either polars or pandas for data
manipulation.
We begin by grouping by the primary type and examining the average attack, defense, and speed.
a. Mean Values by Primary Type (10 pts)
Create a dataframe that groups by the primary type and computes the mean attack, defense, and speed for each primary type.
Hints:
- Remember there are shortcuts for aggregating multiple columns with the same aggregation function in both polars and pandas.
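As a rough illustration of the shortcut mentioned in the hint, the sketch below uses pandas on a small made-up table. The rows and the variable names `toy` and `means` are invented for illustration and are not the assignment dataset.

```python
import pandas as pd

# Toy data, made up for illustration only (not the assignment dataset)
toy = pd.DataFrame({
    "primary_type": ["Fire", "Fire", "Water", "Water"],
    "attack": [50, 70, 40, 60],
    "defense": [40, 60, 50, 70],
    "speed": [60, 100, 40, 50],
})

# Selecting a list of columns after groupby applies the same
# aggregation (here, the mean) to every selected column at once
means = (
    toy.groupby("primary_type")[["attack", "defense", "speed"]]
    .mean()
    .reset_index()
)
print(means)
```

In polars, a similar shortcut is to pass several column names to a single `pl.col(...)` expression and call `.mean()` on it inside `agg`.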
b. Melt/Unpivot the Dataframe (5 pts)
Next, melt/unpivot this data to put it into a long format with three
columns: primary_type, variable, and
value, where variable is one of
attack, defense, or speed, and
value is the corresponding mean value.
Hints:
- Check the documentation if you do not remember which columns to identify in the melt/unpivot operation.
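To make the wide-to-long reshaping concrete, here is a minimal pandas sketch on a made-up table. The numbers and the names `wide` and `long_df` are assumptions for illustration; the identifier column is the one kept as-is, and the value columns are the ones stacked.

```python
import pandas as pd

# Toy wide-format table, made up for illustration only
wide = pd.DataFrame({
    "primary_type": ["Fire", "Water"],
    "attack": [60.0, 50.0],
    "defense": [50.0, 60.0],
    "speed": [80.0, 45.0],
})

# id_vars stays as a column; value_vars are stacked into
# the default "variable" and "value" columns
long_df = wide.melt(
    id_vars="primary_type",
    value_vars=["attack", "defense", "speed"],
)
print(long_df)
```

Recent polars versions call this operation `unpivot` (with `on` and `index` arguments), while older versions call it `melt`, so check which one your installed version provides.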
c. Bar Chart (10 pts)
Using seaborn, matplotlib, or via pandas’ plotting routines, create a grouped bar chart that shows the mean attack, defense, and speed for each primary type. Set the figure size wider to improve the visualization. Include a legend and make sure the axes are properly labeled.
With either pandas or polars, seaborn’s barplot applied to the melted data from Part b is the most straightforward solution. Use the example of the penguins grouped by sex as a guide.
If you are using pandas, you can use its built-in plot method (which uses matplotlib) to create the grouped bar chart, but you will want to go back to the unmelted version of the dataset. Make sure the rows are primary types and the columns are the three statistics.
If you use polars, you can call to_pandas to convert to pandas and use
its plotting facilities as described above. Alternatively, you can draw
the chart directly via matplotlib; this is much more involved, and you
will receive extra credit for doing it this way. The easiest way to draw
a grouped bar chart is to draw three different bar charts, one for each
statistic, with proper offsets for the x positions. Your loop body
should be a single call to plt.bar. The offset calculation is a bit
tricky, so here is a starting structure for how this works:
import matplotlib.pyplot as plt
import polars as pl

# convert the category to a number for offsets
df = df.with_columns(
    pl.col("primary_type").cast(pl.Categorical).cast(pl.UInt32).alias("primary_num")
)
for i, col in enumerate(["attack", "defense", "speed"]):
    # shift each statistic's bars left, center, or right within its group
    df_offset = df.with_columns((pl.col("primary_num") * 2 + (i - 1) / 2).alias("offset"))
    plt.bar(
        ...,
        data=df_offset
    )
# other calls for legend, labels
Hints:
- You can use the width parameter to control the amount of whitespace between bars.
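The interplay between the offsets and the bar width can be seen in the following sketch, which uses plain numpy arrays and made-up values rather than a dataframe. The type names, numbers, and the choice of `width = 0.5` are assumptions for illustration. Group centers are spaced 2 apart and each statistic is shifted by `(i - 1) / 2`, as in the starting structure above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy values, made up for illustration only
types = ["Fire", "Water", "Grass"]
stats = {
    "attack": [60, 50, 55],
    "defense": [50, 60, 58],
    "speed": [80, 45, 50],
}

base = np.arange(len(types)) * 2  # one group center per type, spaced 2 apart
width = 0.5                       # equal to the offset step, so bars in a group touch

fig, ax = plt.subplots(figsize=(8, 4))
for i, (name, values) in enumerate(stats.items()):
    # i = 0, 1, 2 gives offsets of -0.5, 0, and +0.5 around each group center
    ax.bar(base + (i - 1) / 2, values, width=width, label=name)

ax.set_xticks(base)
ax.set_xticklabels(types)
ax.set_xlabel("Primary type")
ax.set_ylabel("Mean value")
ax.legend()
```

With these numbers, each group spans 1.5 units and the centers are 2 units apart, which leaves a gap of 0.5 between groups. A smaller width would also open gaps between the bars within a group.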
d. Scatterplot (15 pts)
From the bar chart, we think there may be a negative correlation between the attack and defense averages and the speed averages. Using the grouped data from Part a, add the attack and defense means together to create a new column. Using matplotlib directly (seaborn and pandas are not allowed), plot this new column versus the speed column. Label the axes appropriately. You should see one significant outlier with respect to the negative correlation. Using values from the visualization, write a filter operation to identify the primary type of this outlier. (You should be able to estimate the values from the plot axes to construct a filter.)
Hints:
- The data kwarg version of the matplotlib plotting functions may be easiest to use with dataframes.
- Remember the with_columns and assign methods for polars and pandas, respectively.
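The two hints can be combined roughly as follows. This is a minimal pandas sketch on made-up numbers; the name `toy`, the new column name `attack_defense`, and the filter thresholds are assumptions for illustration and have nothing to do with the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy grouped means, made up for illustration only
toy = pd.DataFrame({
    "attack": [60.0, 50.0, 55.0],
    "defense": [50.0, 60.0, 58.0],
    "speed": [80.0, 45.0, 50.0],
})

# assign adds a derived column without modifying the original frame
toy = toy.assign(attack_defense=toy["attack"] + toy["defense"])

fig, ax = plt.subplots()
# With the data kwarg, the strings are looked up as column names in toy
ax.scatter("speed", "attack_defense", data=toy)
ax.set_xlabel("Mean speed")
ax.set_ylabel("Mean attack + mean defense")

# Thresholds read off the axes by eye, then used to filter the rows
candidates = toy[(toy["speed"] > 70) & (toy["attack_defense"] > 105)]
print(candidates)
```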
e. [CS503 Only] Scatter Matrix (15 pts)
Now, let’s use all three attributes and use altair to create a scatter matrix. A scatter matrix is a bunch of scatterplots, one for each pair of variables. Begin by creating a scatterplot that compares just attack and speed. Once you have this, change the x and y values to use the repeat capabilities of altair, and set the repeat over the speed, attack, and defense attributes.
Hints:
- Altair’s example gallery is very helpful. See the scatter_matrix example.