RcppRoll package for swift moving averages

Bob Rudis always keeps his pulse on the best packages and recently bemoaned the fact that Kevin Ushey’s RcppRoll package which

Provides Routines for the efficient computation of windowed mean, median,
sum, product, minimum, maximum, standard deviation and variance are provided.

has not had enough praise

One reason is probably the lack of examples available - even in the vignette

I have therefore done a swift illustration using just one of the functions, roll_sum()

Best Runs in the Premier league

Tottenham Hotspur recently went on a tear of nine consecutive victories, ended recently by West Ham and thereby pretty well scuppering their remote chances of the league title

Let’s do a quick check to see if the past 10 games remains their most productive in terms of points gained (which, for the uninitated is 3 for a win, 1 for a draw/tie and zero for a loss) since 1992, when the Premier League was formed

N.B Initial analysis based on data to 4th May 2017 but underlying data is updated periodically

Here are libraries used

library(RcppRoll)
library(plotly)
library(crosstalk)
library(purrr)
library(htmltools)
library(tidyverse)

I maintain a ‘standings’ file which contains the raw data required

standings <- readRDS("data/standings.rds")

glimpse(standings)
## Observations: 19,692
## Variables: 20
## $ season        <chr> "2004/05", "2004/05", "2004/05", "2004/05", "200...
## $ final_Pos     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ team          <chr> "Chelsea", "Chelsea", "Chelsea", "Chelsea", "Che...
## $ GF            <int> 1, 3, 1, 2, 3, 0, 1, 3, 4, 1, 3, 1, 0, 1, 3, 2, ...
## $ GA            <int> 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, ...
## $ gameDate      <date> 2005-05-15, 2005-05-10, 2005-05-07, 2005-04-30,...
## $ tmGameOrder   <int> 506, 505, 504, 503, 502, 501, 500, 499, 498, 497...
## $ tmYrGameOrder <int> 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, ...
## $ venue         <chr> "A", "A", "H", "A", "H", "H", "H", "A", "H", "H"...
## $ MATCHID       <int> 5285, 5277, 5265, 5253, 5243, 5235, 5217, 5212, ...
## $ OppTeam       <chr> "Newcastle U", "Man. Utd.", "Charlton", "Bolton"...
## $ points        <dbl> 1, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 1, 3, 3, 3, ...
## $ cumGF         <int> 72, 71, 68, 67, 65, 62, 62, 61, 58, 54, 53, 50, ...
## $ cumGA         <int> 15, 14, 13, 13, 13, 12, 12, 11, 10, 9, 9, 8, 8, ...
## $ cumPts        <dbl> 95, 94, 91, 88, 85, 82, 81, 80, 77, 74, 71, 68, ...
## $ cumGD         <int> 57, 57, 55, 54, 52, 50, 50, 50, 48, 45, 44, 42, ...
## $ allGames      <dbl> 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, ...
## $ position      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ res           <chr> "Draw", "Win", "Win", "Win", "Win", "Draw", "Dra...
## $ tt            <chr> "<table cellpadding='4' style='line-height:1'><t...

team, points and tmGameOrder are the fields required

We can now select Tottenham’s games, use the roll_sum() function to get a rolling total of points secured in ten consecutive games and plot the results in an infoactive chart

# select team and ensure data is in correct order
tm <- standings %>% 
    filter(team=="Tottenham H") %>% 
    arrange(tmGameOrder)
 
   # construct data.frame of results
   run <- roll_sum(tm$points,n=10)
   df <- data.frame(points=run,seq=1:length(run))
  
   # produce chart
    df %>% 
    plot_ly(x=~seq,y=~points) %>% 
    add_lines()

As usual with a plotly chart, you can zoom and hover. I have not bothered with producing the best aesthetic!

Although this is the first time they have had a sequence of nine consecutive wins, the fall of 2011 included a run of 11 games with 10 wins and a draw


Let’s extend this to all teams - the code only takes a couple of seconds to create a 20,000 row data.frame

# get vector of all 47 teams that have appeared in Premier League history
teams <- unique(standings$team)



get_runs <- function(x) {
    tm <-standings %>% 
    filter(team==x) %>% 
    arrange(tmGameOrder)
 
   # construct data.frame of results
   run <- roll_sum(tm$points,n=10)
   data.frame(points=run,seq=1:length(run),team=x)


}

# apply the above function to all teams
data <-map_df(teams, get_runs)

glimpse(data)
## Observations: 19,251
## Variables: 3
## $ points <dbl> 13, 12, 15, 17, 16, 16, 16, 19, 22, 20, 20, 21, 19, 16,...
## $ seq    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
## $ team   <chr> "Chelsea", "Chelsea", "Chelsea", "Chelsea", "Chelsea", ...

We can now use this data in a couple of ways. Lets use the filter functions from the crosstalk package to select any team. I covered this in a bit more depth here

sd <- SharedData$new(data)

fs <- filter_select(
id = "team1",
label = "Select Team",
sharedData = sd,
group =  ~ team,
allLevels = FALSE,
multiple = FALSE
)


## this is needed as crosstalk does not work nicely with bootstrap, apparently
fs_nobootstrap <- fs

attr(fs_nobootstrap, "html_dependencies") <- Filter(
  function(dep) {dep$name != "bootstrap"},
  attr(fs_nobootstrap, "html_dependencies")
)



  
   # produce chart
myChart  <-  sd %>% 
    plot_ly(x=~seq,y=~points) %>% 
    add_lines()


  tagList(
  fs_nobootstrap,
  br(),
   myChart
)