Expected Kills (xK)
Over the course of this quarantine period, I have been trying to learn (relearn?) more about statistical models, machine learning, how other sports are utilizing these concepts, and how to apply all of that to volleyball. Well, I finally had a breakthrough after weeks of going through Tidymodels examples found on Julia Silge’s site, all the #TidyTuesday posts on Twitter, conversations with Steve Aronson and Chad Gordon, video tutorials by David Robinson and Andrew Couch, and this blog post by Ismael Gomez about fitting an Expected Goals (xG) model (for football/soccer). The latter has been one of my major motivating factors in this learning process, in hopes of adapting the concept to our sport. With that said, let’s take a look at the process of building a model, and some initial looks at how to use it.
The Data
To illustrate this example, we’ll take a look at an Olympic Qualifier match between Netherlands and Belgium from 2019, using the Data Volley file (.dvw), available on my GitHub, to pull event-level data from. The data used to fit the model is not publicly available, as some of it is pulled from VolleyMetrics and was edited with the help of David Dantes, Jon Parry, Brian Failinger, Maggie Eppright, and Andrew Strick.
# load necessary packages
library(tidyverse) # data manipulation and plotting
library(datavolley) # working with .dvw files from Data Volley
library(intrval) # use of the `%ni%` function, the opposite of `%in%`
library(tidymodels) # building statistical models
# load data for model
data0 <- readRDS("~/data0_xK.RDS")
# this is where you'll need your own data for fitting the model
# I'll be working with Data Volley Files and using Ben Raymond's {datavolley} package
# to load those files into a data frame to work with
Now, data0 is a data frame containing a large sample of matches I will be using to train and test the model, and data_pred is a data frame containing the event data from the Netherlands-Belgium match mentioned before.
The Model
In general, the Expected Goals (xG) model looks to quantify the quality of a shot taken in football, often measured as the probability that a shot becomes a goal. Most xG models that I’ve come across include, at a minimum, x-y coordinate data of where the shot was taken from and the visible shot angle, while many also include additional features such as time of play, body part used (left foot/right foot/head/body), whether the shot came from a counterattack, etc., depending on the data available.
The following model will attempt to do something similar for volleyball - namely quantifying the probability that an attack results in a kill (hence, Expected Kills or xK). The model will use the following factors to predict the probability of a kill:
- Set location (x-y coordinate)
- Attack Type
- Middle Quick Attacks
- All other In System Pin Attacks
- High ball Out of System Attacks
- Rally Phase
- Attack After Reception (or First Ball Side Out/FBSO)
- Attack After Dig Transition
- Attack After Free Ball Transition
- Quick Attack Availability
- Middle Quick Attack is a Viable Option
- Middle Quick Attack is not a Viable Option
To keep things simple, I will limit the prediction to Kill/No Kill and not be concerned with predicting errors. My main interest here is assessing the quality of the situation prior to the attack attempt.
First, let’s tidy up the data we’ll use to fit the model. I am personally more comfortable with the coordinate system Data Volley uses, so I will convert start_coordinate_x and start_coordinate_y into the integer values that Data Volley provides and store them in x1 and y1, respectively.
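If you’d rather keep this conversion reusable, here’s a minimal sketch of the same transform written as standalone helper functions (the names are my own, not part of {datavolley}), along with its inverse in case you want to map the integer grid back to the coordinates {datavolley} uses:
# a minimal sketch: the same linear transforms as standalone helpers
# (function names are illustrative, not from {datavolley})
dv_to_index_x <- function(x) round(26.66666 * x - 2.83333333, 0)
dv_to_index_y <- function(y) round(13.5 * y + 2.75, 0)
# and the (approximate, because of the rounding) inverse transforms
index_to_dv_x <- function(x1) (x1 + 2.83333333) / 26.66666
index_to_dv_y <- function(y1) (y1 - 2.75) / 13.5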
Next, we want to link the set location coordinates to the attack. Assuming the set code directly precedes the attack, we use the shift function from {binhf} to store a version of the coordinate columns shifted one row back (a “right” shift in the function’s terms) in prev1x1 and prev1y1. We’ll create two more columns in a similar manner so that we can filter for attacks that follow a set using prev1skill, and determine the type of first contact that precedes the set with prev2skill. Finally, prev1set_code will help us determine whether or not the Middle was a viable attack option, based on the Setter Call code from Data Volley.
# create columns to identify features to be used in the model
data0 <- data0 %>%
dplyr::mutate(x1 = round(26.66666*start_coordinate_x - 2.83333333,0),
y1 = round(13.5*start_coordinate_y + 2.75,0),
prev1x1 = binhf::shift(x1,1,"right"),
prev1y1 = binhf::shift(y1,1,"right"),
prev1skill = binhf::shift(skill,1,"right"),
prev2skill = binhf::shift(skill,2,"right"),
prev1set_code = binhf::shift(set_code,1,"right"))
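As a quick sanity check (just a sketch), we can tabulate which skill precedes each attack to confirm the shifted columns line up the way we expect - the vast majority of attacks should be preceded by a set:
data0 %>%
  dplyr::filter(skill == "Attack") %>%
  dplyr::count(prev1skill, sort = TRUE)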
Now, we’ll clean up the variable names and values, turn the non-numeric variables into factors, and filter the data down into a data frame with just the variables we want for fitting the model.
attack <- data0 %>%
dplyr::filter(skill == "Attack" & !is.na(prev1x1) & !is.na(prev1y1) & # attacks with set locations
skill_type != "Other attack" & # remove over pass attacks
prev1skill == "Set" &
prev2skill %in% c("Reception","Dig","Freeball")) %>%
dplyr::mutate(result = ifelse(evaluation_code == "#","kill","other"),
result = factor(result),
attacktype = case_when(skill_type == "High ball attack" ~ "OS",
skill_type == "Quick ball attack" ~ "Q",
TRUE ~ "IS"),
attacktype = factor(attacktype),
attack_subtype = case_when(skill_subtype == "Hard spike" ~ "H",
skill_subtype == "Soft spike/topspin" ~ "P",
skill_subtype == "Tip" ~ "T",
TRUE ~ "T"),
attack_subtype = factor(attack_subtype),
phase = case_when(prev2skill == "Reception" ~ "Reception",
prev2skill == "Dig" ~ "DigTransition",
prev2skill == "Freeball" ~ "FreeBall",
TRUE ~ "other"),
phase = factor(phase),
q_available = case_when(prev1set_code %in% c("KF","KG","KP","KB","KS","KC") ~ "Yes",
prev1set_code == "KO" ~ "No",
is.na(prev1set_code) ~ "No", # ^ my codes for Setter Calls..
TRUE ~ "No"), # change these out for your own
q_available = factor(q_available)) %>%
dplyr::select(result,prev1x1,prev1y1,attacktype,q_available,phase)
head(attack)
## result prev1x1 prev1y1 attacktype q_available phase
## 1 kill 56 48 IS Yes Reception
## 2 other 52 47 IS Yes Reception
## 3 kill 55 44 Q Yes DigTransition
## 4 kill 40 35 IS Yes Reception
## 5 other 67 38 IS Yes Reception
## 6 other 72 38 OS No DigTransition
Our data is prepped! Time to build a model! For the following section, I followed along with Julia Silge’s post on getting started with Tidymodels to build a Random Forest model. In her post, Julia Silge builds both a Logistic Regression model and a Random Forest model. I chose to focus solely on the Random Forest model as there are fewer restrictions on the distribution of our data. Check out StatQuest by Josh Starmer for a great resource on statistical concepts and modelling.
First, we’ll split our data set into a training set and a testing set. We’ll use the {rsample} package (loaded as part of {tidymodels}) to achieve this.
# set a seed for R's random number generator for reproducibility
set.seed(123)
# split the data - initial_split defaults to 3/4 to training data, 1/4 to testing data
attack_split <- rsample::initial_split(attack, strata = result)
attack_train <- rsample::training(attack_split)
attack_test <- rsample::testing(attack_split)
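Because we split with strata = result, the kill rate should be roughly the same in both sets; a quick check (a sketch) to verify:
attack_train %>% dplyr::count(result) %>% dplyr::mutate(prop = n / sum(n))
attack_test %>% dplyr::count(result) %>% dplyr::mutate(prop = n / sum(n))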
Next we’ll create bootstrap samples of the training data using the bootstraps function, also from the {rsample} package. For a deeper dive into bootstrapping, check out this StatQuest video. In essence, bootstrapping will help us evaluate the model we fit.
attack_boot <- rsample::bootstraps(attack_train)
Now we specify the (random forest) model using the functions below from the {parsnip} package.
rf_spec <- parsnip::rand_forest() %>% # specify random forest model
parsnip::set_mode("classification") %>% # classify observations as kill/no kill
parsnip::set_engine("ranger") # use {ranger} package to fit model
rf_spec
## Random Forest Model Specification (classification)
##
## Computational engine: ranger
We’ll construct a tidymodels workflow to put the pieces of the model together.
# initiate a workflow and specify the model formula - we are asking the model
# to predict `result` from all other variables in the data we provide
# (the `attack` data frame here)
attack_wf <- workflows::workflow() %>%
  workflows::add_formula(result ~ .)
attack_wf
## == Workflow ========================================================================================================
## Preprocessor: Formula
## Model: None
##
## -- Preprocessor ----------------------------------------------------------------------------------------------------
## result ~ .
We are finally ready to fit our model to the training dataset! We’ll pipe the model we specified in the rf_spec object into the attack_wf workflow object above, then use the fit_resamples function from the {tune} package to fit the model to each of the bootstrapped resamples we created in attack_boot.
rf_rs <- attack_wf %>%
workflows::add_model(rf_spec) %>%
tune::fit_resamples(
resamples = attack_boot,
control = tune::control_resamples(save_pred = TRUE) # save predictions from each resample
)
Our random forest model has now been fit to each of the bootstrap resamples of our training dataset! Let’s evaluate how well the models performed.
tune::collect_metrics(rf_rs)
## # A tibble: 2 x 5
## .metric .estimator mean n std_err
## <chr> <chr> <dbl> <int> <dbl>
## 1 accuracy binary 0.589 25 0.000369
## 2 roc_auc binary 0.618 25 0.000412
The average accuracy of the 25 models was 58.9% for correctly predicting whether an attack from the attack_train dataset was a kill or not. The roc_auc metric is another way to compare model performance; more information on it can again be found on StatQuest. We can also take a look at the confusion matrix to see the average number of prediction results across the resamples.
rf_rs %>%
tune::conf_mat_resampled()
## # A tibble: 4 x 3
## Prediction Truth Freq
## <fct> <fct> <dbl>
## 1 kill kill 10434.
## 2 kill other 7957.
## 3 other kill 4111.
## 4 other other 6863
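Since we saved the predictions from each resample, we can also visualize the roc_auc metric as ROC curves, one per bootstrap resample. Here’s a minimal sketch using {yardstick} and its autoplot method (both come along with {tidymodels}); note that "kill" is the first level of the result factor, so it is treated as the event of interest:
rf_rs %>%
  tune::collect_predictions() %>%
  dplyr::group_by(id) %>%                      # one ROC curve per bootstrap resample
  yardstick::roc_curve(result, .pred_kill) %>%
  ggplot2::autoplot()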
All in all, our model is not the most accurate, but provides a good starting point for building an Expected Kills model.
Let’s fit the model again, this time fitting it once to the full training dataset, attack_train, and evaluating it on the held-out testing dataset to estimate the performance of the model on new data.
attack_final <- attack_wf %>%
workflows::add_model(rf_spec) %>%
tune::last_fit(attack_split) # fits the final model to the full training set
                             # and evaluates it on the held-out test set
Results of the final model:
collect_metrics(attack_final)
## # A tibble: 2 x 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.591
## 2 roc_auc binary 0.618
collect_predictions(attack_final) %>%
conf_mat(result, .pred_class)
## Truth
## Prediction kill other
## kill 9621 7312
## other 3552 6097
Pretty consistent with the models we fit on the training dataset.
We’ll fit the model one last time, but now using the entirety of the data from attack, which we will then use to add predictions and probabilities to new match data.
attack_model <- fit(attack_wf %>% add_model(rf_spec),
data = attack)
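Since refitting on the full dataset takes a little while, it can be handy to save the fitted workflow and reload it when scoring new matches. A quick sketch (the file path is just an example):
# save the fitted workflow for reuse (path is illustrative)
saveRDS(attack_model, "~/attack_model_xK.RDS")
# ...and in a later session, reload it instead of refitting
attack_model <- readRDS("~/attack_model_xK.RDS")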
Expected Kills Use Case
Now that we did all that work fitting the model, it’s time to actually use it! We’ll load in the Netherlands vs. Belgium Olympic Qualifier match from 2019 and show off a couple examples of how our model can be used.
# load data for prediction
data_pred <- datavolley::read_dv("./&19 ioq b03 ned v bel.dvw",
                                 insert_technical_timeouts = FALSE) %>%
  purrr::pluck(., "plays") # extract the `plays` data frame from the list returned by read_dv
Let’s prep the data like we did as we were building the model above.
data_pred <- data_pred %>%
dplyr::mutate(x1 = round(26.66666*start_coordinate_x - 2.83333333,0),
y1 = round(13.5*start_coordinate_y + 2.75,0),
prev1x1 = binhf::shift(x1,1,"right"),
prev1y1 = binhf::shift(y1,1,"right"),
prev1skill = binhf::shift(skill,1,"right"),
prev2skill = binhf::shift(skill,2,"right"),
prev1set_code = binhf::shift(set_code,1,"right"))
Then we’ll filter down to attacks that are preceded by a set (and, before that, by a reception, dig, or free ball), just as we did for the training data.
data_attack <- data_pred %>%
dplyr::filter(skill == "Attack" & !is.na(prev1x1) & !is.na(prev1y1) &
skill_type != "Other attack" &
prev1skill == "Set" &
prev2skill %in% c("Reception","Dig","Freeball")) %>%
dplyr::mutate(result = ifelse(evaluation_code == "#","kill","other"),
result = factor(result),
attacktype = case_when(skill_type == "High ball attack" ~ "OS",
skill_type == "Quick ball attack" ~ "Q",
TRUE ~ "IS"),
attacktype = factor(attacktype),
phase = case_when(prev2skill == "Reception" ~ "Reception",
prev2skill == "Dig" ~ "DigTransition",
prev2skill == "Freeball" ~ "FreeBall",
TRUE ~ "other"),
phase = factor(phase),
q_available = case_when(prev1set_code %in% c("KF","KG","KP","KB","KS","KC") ~ "Yes",
is.na(prev1set_code) ~ "No",
TRUE ~ "No"),
q_available = factor(q_available))
When we were building the model, we selected only the relevant columns that we wanted to feed into the model. Here, we are leaving all variables as they are, and will append the predictions and probabilities from the model.
attack_pred <- stats::predict(attack_model,data_attack) # predict kill/no kill with model
attack_prob <- stats::predict(attack_model,data_attack,type = "prob") # probability of kill with model
data_attack <- bind_cols(data_attack, attack_pred, attack_prob) %>%
dplyr::mutate(pred_correct = ifelse(result == .pred_class,"Correct","Incorrect"))
I added another variable, pred_correct, to show whether or not the model predicted the result correctly.
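A quick tally (just a sketch) shows how often the model’s class prediction matched the actual result in this match:
data_attack %>%
  dplyr::count(pred_correct) %>%
  dplyr::mutate(prop = n / sum(n))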
Here’s where the rubber meets the road. Let’s sum up all the probabilities for each attacker, which we’ll call their Expected Kills (xK), and compare that to the actual number of kills they had. Again, we’ll only consider attacks that were preceded by a set (i.e. no overpass kills) and that have a predicted value.
xKill <- data_attack %>%
dplyr::group_by(player_number,player_name,team_id) %>%
dplyr::summarise(xK = sum(.pred_kill),
Kills = sum(skill == "Attack" & evaluation_code == "#"),
n = n(),
.groups = "drop") %>%
dplyr::arrange(team_id,desc(n))
| player_number | player_name          | team_id | xK         | Kills | n  |
|---------------|----------------------|---------|------------|-------|----|
| 3             | Sam Deroo            | BEL     | 8.6343667  | 8     | 19 |
| 17            | Tomas Rousseaux      | BEL     | 7.5160499  | 8     | 16 |
| 2             | Hendrik Tuerlinckx   | BEL     | 6.0866751  | 7     | 13 |
| 1             | Bram Van Den Dries   | BEL     | 3.5391425  | 2     | 8  |
| 9             | Pieter Verhees       | BEL     | 2.9228624  | 3     | 5  |
| 10            | Simon Van De Voorde  | BEL     | 1.8086275  | 1     | 3  |
| 18            | Seppe Baetens        | BEL     | 1.5674565  | 1     | 3  |
| 4             | Stijn D’Hulst        | BEL     | 0.5704203  | 0     | 1  |
| 8             | Arno Van De Velde    | BEL     | 0.5838593  | 1     | 1  |
| 14            | Nimir Abdel-Aziz     | NED     | 12.5802707 | 19    | 28 |
| 4             | Thijs Ter Horst      | NED     | 9.7495112  | 7     | 20 |
| 3             | Maarten Van Garderen | NED     | 6.9730102  | 11    | 16 |
| 8             | Fabian Plak          | NED     | 3.8207249  | 3     | 7  |
| 5             | Luuc Van Der Ent     | NED     | 1.8366475  | 3     | 3  |
| 16            | Wouter Ter Maat      | NED     | 1.6274755  | 2     | 3  |
| 7             | Gijs Jorna           | NED     | 0.3541823  | 1     | 1  |
It looks like Belgium’s attackers’ xK and actual kills in the match were pretty similar, whereas Netherlands had two attackers that well outperformed their xK in Nimir Abdel-Aziz and Maarten Van Garderen. Let’s take a deeper look at these two attackers’ attempts, as well as Thijs Ter Horst, who seemed to have underperformed in this match, relative to his xK.
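To see that team-level picture directly, we can roll the same summary up by team (a quick sketch; data_attack is already limited to the attacks the model scored):
data_attack %>%
  dplyr::group_by(team_id) %>%
  dplyr::summarise(xK = sum(.pred_kill),
                   Kills = sum(evaluation_code == "#"),
                   n = dplyr::n(),
                   .groups = "drop")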
First we’ll look at Nimir Abdel-Aziz. Let’s partition his attacks by whether or not the model’s prediction was correct, and also by the type of attack. The Opposite finished the match with 19 kills on 12.6 xK.
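The views described here and below were built from data_attack; as a minimal sketch of one way to recreate that kind of plot with {ggplot2}, we can place each attempt at its set location (in Data Volley’s integer coordinates) and colour it by the model’s kill probability - the player filter and aesthetics are just illustrative choices:
data_attack %>%
  filter(player_name == "Nimir Abdel-Aziz") %>%
  ggplot(aes(x = prev1x1, y = prev1y1, colour = .pred_kill, shape = attacktype)) +
  geom_point(size = 3) +
  facet_wrap(~pred_correct) +
  labs(x = "set location (x)", y = "set location (y)",
       colour = "P(kill)", shape = "attack type")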
This seems to show a lower propensity for the model to predict kills for attacks beyond the attack line, particularly Out of System attacks. As such, Abdel-Aziz outperforms his xK value likely due to the high rate of kills he put away in Out of System situations (a really good performance on long way high balls to A2).
Next we’ll look at Maarten Van Garderen. The Outside Hitter had 11 kills on 6.97 xK for the match.
We see the same trends in the model’s predictions for attempts inside the attack line versus beyond it. Between scoring on the attempts he “should have” scored on and putting away a good number of kills on high balls to z4, it was a good performance for the Outside Hitter against Belgium.
Finally, we’ll look at Thijs Ter Horst. The Outside Hitter finished the match with 7 kills on 9.75 xK.
The model shows missed opportunities on the Go, the Bic, and no kills Out of System for Ter Horst in this match.
For a first look, this seems to pass the eye test in this case study. The model confirms the obvious - In System attacks, especially from inside of 3 meters, should be kills. But beyond the binary classification of kill/no kill, the ability to assign kill probabilities for each individual attempt is really where a model like this shines, in my opinion. There’s certainly plenty more to learn and explore, but I have to admit that I’m reeeally fired up to have a working Expected Kills model!