Strange Times | #savestanfordmvb

July 30, 2020
ggplot2 tidyverse savestanfordmvb

Welcome!

First of all, thanks for stopping by. My hope is to post here on a somewhat regular basis, but I can’t make any promises. I guess you’ll just have to keep checking back for updates. Second, I hope you’re doing alright in these strange, coronavirus times.

I decided to start this blog for a number of reasons, the most personal of which is to start a journey to the “Kinda Good Island” (in the words of Learner Lab founder Trevor Ragan) of writing/blogging/putting ideas out in the open. I’m starting neck deep in the “Feeling Weird Swamp” portion of this path, but I’ve been hiding behind fear and inaction for too long.

With that said, let’s get into this first post!

#savestanfordmvb

Amid all the unfortunate happenings in and out of the volleyball world, the Stanford athletics department’s announcement that it will drop its Men’s Volleyball program after the 2020-21 season is a heavy blow for those of us in the sport. Going to watch high-level volleyball matches at Stanford (both men’s and women’s) during high school further stoked my attachment to the sport, which has led me to where I am today. Today’s post looks at the Stanford Men’s Volleyball program’s contribution to the USA Men’s National Team (MNT) roster since 2006 (when the USA men began regularly competing in the FIVB World League).

The Data

I pulled data from the USA Volleyball website, which includes historical MNT rosters back to 2003. I took travel rosters, including alternates, for FIVB senior-level events: World League, Volleyball Nations League, Grand Champions Cup, World Championships, World Cup, and the Olympic Games. I did this manually by copying and pasting each roster and then cleaning the data up in Excel (I still have a lot to learn about web scraping!).

Once I had the data in a tidy format (each column is a variable, each row is an observation), I read it into R using the readr package from the tidyverse.

# load packages for reading (readr), wrangling (dplyr, tidyr, purrr), factors (forcats), and visualizing (ggplot2)
library(tidyverse)

# read data
data0 <- read_csv("./mnthistoricalroster.csv")
head(data0)
## # A tibble: 6 x 7
##   name           position city           state college          year competition
##   <chr>          <chr>    <chr>          <chr> <chr>           <dbl> <chr>      
## 1 Matthew Ander~ Opp      West Seneca    NY    Penn State       2019 World Cup  
## 2 Aaron Russell  OH       Ellicott City  MD    Penn State       2019 World Cup  
## 3 Jeff Jendryk   MB       Wheaton        IL    Loyola of Chic~  2019 World Cup  
## 4 Mitch Stahl    MB       Chambersburg   PA    UCLA             2019 World Cup  
## 5 T.J. DeFalco   OH       Huntington Be~ CA    Long Beach Sta~  2019 World Cup  
## 6 Michael Saeta  SS       South Pasadena CA    UC Irvine        2019 World Cup
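As an aside, here is a minimal sketch of the same read step with the column types spelled out instead of guessed. The two rows are copied from the printed output above; the temporary file and the names `csv_path` and `roster` are stand-ins I made up for illustration, not the real file. Reading year as an integer is also my assumption (the output above shows it guessed as a double).

```r
library(readr)

# Hypothetical stand-in for mnthistoricalroster.csv: two rows copied from the
# printed output above, written to a temporary file so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,position,city,state,college,year,competition",
  "Aaron Russell,OH,Ellicott City,MD,Penn State,2019,World Cup",
  "Mitch Stahl,MB,Chambersburg,PA,UCLA,2019,World Cup"
), csv_path)

# State the column types explicitly rather than letting readr guess them.
roster <- read_csv(
  csv_path,
  col_types = cols(
    name = col_character(),
    position = col_character(),
    city = col_character(),
    state = col_character(),
    college = col_character(),
    year = col_integer(),        # assumption: years stored as whole numbers
    competition = col_character()
  )
)

roster
```

Declaring the types up front means a stray value in the year column shows up as a parsing problem right away instead of quietly changing the column type.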

I want to show which colleges were represented by our MNT athletes each year, regardless of which competition they played in, so I summarize the data by player, college, and year.

player <- data0 %>%
  group_by(name,college,year) %>%
  summarise(.groups = "drop")
head(player)
## # A tibble: 6 x 3
##   name          college     year
##   <chr>         <chr>      <dbl>
## 1 Aaron Russell Penn State  2015
## 2 Aaron Russell Penn State  2016
## 3 Aaron Russell Penn State  2017
## 4 Aaron Russell Penn State  2018
## 5 Aaron Russell Penn State  2019
## 6 Alfee Reft    Hawai'i     2007
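The grouped summarise above is really just de-duplicating rows. A rough sketch of an alternative, assuming all we need is the unique name/college/year combinations, is `distinct()` followed by `arrange()`. The toy data below uses placeholder players and colleges (not real roster entries) so it can run on its own.

```r
library(dplyr)

# Toy roster with placeholder names, in the same shape as data0.
# Player A appears twice in 2019 because of two different competitions.
toy <- tibble(
  name = c("Player A", "Player A", "Player A", "Player B"),
  college = c("College X", "College X", "College X", "College Y"),
  year = c(2019, 2019, 2018, 2019),
  competition = c("Event 1", "Event 2", "Event 1", "Event 1")
)

# Approach used in the post: group, then summarise with no summary columns
by_summarise <- toy %>%
  group_by(name, college, year) %>%
  summarise(.groups = "drop")

# Alternative sketch: keep unique combinations, then sort them
by_distinct <- toy %>%
  distinct(name, college, year) %>%
  arrange(name, college, year)

by_summarise
by_distinct
```

Both tables end up with three rows: the duplicate 2019 entry for Player A collapses into one. Note that `distinct()` on its own keeps the original row order, which is why the `arrange()` step is there.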

Each row of player is one player in one year in which he was named to at least one senior-level FIVB competition travel roster. But what I’m really looking for is how many athletes from each college make up the MNT roster for the year. So we summarize the data further, grouping by college and year.

college <- player %>%
  group_by(college,year) %>%
  summarise(n = n(),
            .groups = "drop")

Each row of college represents how many (n) athletes from each college made at least one FIVB competition travel roster that year.
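For what it’s worth, dplyr’s `count()` is a shorthand for this group-then-tally pattern. Here is a small sketch on made-up data (placeholder players and colleges, not the real roster) showing what the resulting table looks like.

```r
library(dplyr)

# Toy player table with placeholder names: one row per player per year
toy_player <- tibble(
  name = c("Player A", "Player B", "Player C", "Player A"),
  college = c("College X", "College X", "College Y", "College X"),
  year = c(2018, 2018, 2018, 2019)
)

# count() groups by the given columns and stores the row count in a column named n
toy_college <- toy_player %>%
  count(college, year)

toy_college
# College X had 2 athletes in 2018 and 1 in 2019; College Y had 1 in 2018
```

I kept the explicit `group_by()` and `summarise()` in the post because it makes the grouping step visible, but the two approaches give the same counts.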

The Plot

Let’s start building a plot for the data. I want the data to tell the story of how each college is represented through their athletes on the MNT each year. I’ll put year on the x-axis, and college on the y-axis. I’ll use geom_point to show when a college has athletes on the MNT for each year, and use the size of each point to represent how many athletes were on the roster that year.

ggplot(college,
       aes(x = year,
           y = college,
           size = n)) +
  geom_point()

Not bad. I’m not crazy about listing the colleges in alphabetical order, and I’d like every year to show up on the x-axis. I’ll rebuild the college data frame to include each college’s average number of athletes per year over this time window, and I’ll plot year as a factor so that each year gets its own label.

college <- player %>%
  
  # get number of years
  mutate(n_years = length(unique(year))) %>%
  
  # add total athlete representation by college over number of years
  group_by(college) %>%
  mutate(avg = n()/n_years) %>%
  group_by(college,year,avg) %>%
  summarise(n = n(),
            .groups = "drop")

ggplot(college,
       aes(x = factor(year),
           y = fct_reorder(college,avg),
           size = n)) +
  geom_point()

Now college is arranged by avg (using fct_reorder from the forcats package, which is loaded with the tidyverse) and we get a good view of which colleges are represented the most on the MNT. Hi, Stanford. I’ll put some final touches on this plot to highlight how impactful Stanford Men’s Volleyball has been to the MNT roster and its success.
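To make the avg calculation and the reordering concrete, here is a tiny worked sketch on made-up data (placeholder players and colleges). It uses `n_distinct(year)`, which I’m assuming is interchangeable with `length(unique(year))` here.

```r
library(dplyr)
library(forcats)

# Toy player table with placeholder names, covering two years
toy_player <- tibble(
  name = c("Player A", "Player B", "Player A", "Player C"),
  college = c("College X", "College X", "College X", "College Y"),
  year = c(2018, 2018, 2019, 2019)
)

toy_college <- toy_player %>%
  mutate(n_years = n_distinct(year)) %>%   # 2 distinct years in the toy data
  group_by(college) %>%
  mutate(avg = n() / n_years) %>%          # player-seasons per year for each college
  group_by(college, year, avg) %>%
  summarise(n = n(), .groups = "drop")

# College X: 3 player-seasons / 2 years = 1.5
# College Y: 1 player-season  / 2 years = 0.5
toy_college

# fct_reorder() sorts the factor levels by avg, lowest first
levels(fct_reorder(toy_college$college, toy_college$avg))
# "College Y" "College X"
```

Since ggplot2 draws the first factor level at the bottom of a discrete y-axis, the college with the highest avg lands at the top of the plot.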

I’ll add another variable to the college data frame to draw attention to Stanford’s position on the plot, move the legend below to create more horizontal real estate for the plot, and add some final touch ups to the aesthetics of the plot.

college <- player %>%
  mutate(n_years = length(unique(year))) %>%
  group_by(college) %>%
  mutate(avg = n()/n_years) %>%
  group_by(college,year,avg) %>%
  summarise(n = n(),
            .groups = "drop") %>%
  mutate(col = ifelse(college == "Stanford","Stanford","other"))

ggplot(college,
       aes(y = fct_reorder(college,avg),
           x = factor(year),
           size = n)) +
  geom_point(aes(color = col)) +
  theme_bw() +
  scale_color_manual(values = c("other" = "#b6b1a9",
                                "Stanford" = "#8c1515")) +
  theme(legend.position = "bottom",
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.y = element_text(size = 12),
        axis.text.x = element_text(size = 12),
        panel.grid = element_blank(),
        plot.title = element_text(size = 16, face = "bold"),
        plot.subtitle = element_text(size = 14)) +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  guides(color = "none") +
  labs(size = "Number of Men's National Team Athletes",
       title = "USA Volleyball Men's Indoor National Team\nRoster by College",
       subtitle = "Which colleges are USA MNT athletes coming from?",
       caption = "Created by Nate Ngo @natengo1")

And that’s it! Hopefully this provides another piece of evidence for the positive impact Stanford Men’s Volleyball has had on our sport at a local, national, and global level.

To support the Stanford Men’s Volleyball program, check out the following links:

Twitter

Change.org Petition
