The Extra Point

2020 Big Data Bowl Recap

The second annual Big Data Bowl, powered by Amazon Web Services (AWS), focused on predicting the outcomes of rushing plays during the 2019 season. Participants were provided with the NFL's Next Gen Stats, including speed, direction, and location information for all 22 players on the field at the moment a ball carrier receives the ball, and were tasked with predicting where the ball carrier would end up.

This year, six collegiate finalists presented their work to NFL club analytics staff at the NFL Combine in Indianapolis. Three honorable mention papers were also named. Below is a summary of each presentation and a link to the complete entry.

Grand Finalist

Matt Ploenzke (Harvard) 

Matt Ploenzke (AJ Mast/AP Images for NFL)

Ploenzke used Next Gen Stats data to build interpretable model inputs based upon football-specific domain knowledge, ultimately highlighting the importance of ball carrier downfield acceleration and unblocked tackler distance and spacing. 

Key stat: Among roughly 40 input variables, a ball carrier’s “effective acceleration” was the most important for estimating yards gained on a handoff play.

Download Matt Ploenzke’s paper.


Kellin Rumsey, Brandon DeFlon (University of New Mexico)

The battle between blocker and defender is often decided by leverage. In this paper, Rumsey and DeFlon define offensive and defensive leverage, and study the statistical properties of these metrics. 

Key stat: In the first six weeks of the 2017 season, Blake Martinez (Green Bay Packers) was among the league’s best at generating defensive leverage. Martinez finished the season with the third-most solo tackles (96).

Download Kellin Rumsey and Brandon DeFlon’s paper.

Graham Pash, Walker Powell (NC State)

Pash and Powell used kinematic data such as player positions and velocities to determine zones of control for both the offensive and defensive teams at the time of the handoff. These zones of control predict the probabilities of yards lost or gained and quantify the risk involved with plays.
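To make the idea of a zone of control concrete, here is a minimal sketch in Python. It is not Pash and Powell's model: it assumes, purely for illustration, that each spot on the field belongs to whichever team has the nearest player once every player has moved in a straight line at his current velocity for a short time. The function names, the half-second look-ahead, and the one-yard grid are all hypothetical choices.

```python
import math

def projected(player, dt):
    """Move a player (x, y, vx, vy) in a straight line for dt seconds."""
    x, y, vx, vy = player
    return (x + vx * dt, y + vy * dt)

def offense_control_share(offense, defense, x_min, x_max, y_min, y_max, dt=0.5):
    """Fraction of one-yard grid cells whose nearest projected player is on offense.

    Illustrative assumption only: control is decided by nearest projected
    player, and ties are not counted for the offense.
    """
    off = [projected(p, dt) for p in offense]
    dfn = [projected(p, dt) for p in defense]
    cells = [(x, y)
             for x in range(x_min, x_max + 1)
             for y in range(y_min, y_max + 1)]
    owned = sum(
        min(math.dist(c, q) for q in off) < min(math.dist(c, q) for q in dfn)
        for c in cells
    )
    return owned / len(cells)

# Toy example: one blocker moving upfield at 4 yards/second, one stationary defender.
share = offense_control_share([(0, 0, 4, 0)], [(10, 0, 0, 0)], 0, 10, 0, 0)
print(f"Offense controls {share:.0%} of the strip")
```

A share like this could then serve as one input to a model of yards gained; how the finalists actually turned control into probabilities is described in their paper.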

Key stat: Robert Woods (Los Angeles Rams) and Raheem Mostert (San Francisco 49ers) outperformed the model predictions the most, averaging nearly three more yards than predicted over the 2017 and 2018 seasons.

Download Graham Pash and Walker Powell’s paper.

Namrata Ray, Jugal Marfatia (Washington State University)

Ray and Marfatia measured the open space around the rusher at three points in time (at the handoff, a half-second later, and one second later) to understand the association between open space and yards gained. Results indicated that the change in open space from the handoff to a half-second or a full second later was a strong predictor of the number of yards gained.
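The comparison across the three time points can be sketched as follows. The summary above does not say how Ray and Marfatia define open space, so this example assumes a stand-in: the area of the circle around the rusher that reaches his nearest defender. The function names and the frame layout are likewise hypothetical.

```python
import math

def open_area(rusher, defenders):
    """Assumed proxy for open space: area of the circle centered on the
    rusher whose radius is the distance to the nearest defender."""
    radius = min(math.dist(rusher, d) for d in defenders)
    return math.pi * radius ** 2

def open_area_changes(frames):
    """Change in open area relative to the handoff.

    frames maps seconds after the handoff (0.0, 0.5, 1.0) to a pair
    (rusher_xy, list_of_defender_xy).
    """
    base = open_area(*frames[0.0])
    return {t: open_area(*frames[t]) - base for t in (0.5, 1.0)}

# Toy frames: space opens up after half a second, then closes again.
frames = {
    0.0: ((0, 0), [(3, 0)]),
    0.5: ((1, 0), [(5, 0)]),
    1.0: ((2, 0), [(4, 0)]),
}
print(open_area_changes(frames))
```

Under this assumed definition, a positive change means the rusher has more room than he had at the handoff, which is the kind of quantity the authors relate to yards gained.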

Key stat: Yards gained by the rusher increase by four on average for every one percent increase in the additional open area created within a half-second of the handoff.

Download Namrata Ray and Jugal Marfatia’s paper.

Alex Stern (University of Virginia)

Using an advanced machine learning algorithm, Stern assessed the value of initial space created for the ball carrier by the offensive line. That space was then linked to linemen grades, and standardized by accounting for the number of defensive backs, linebackers and defensive linemen on the play, the defensive strength of the opposing team, and the running direction of the running back. 

Key stat: In 2018, New Orleans Saints center Max Unger received a top-five grade among centers in Stern’s space grade rankings, despite being the 31st-graded run blocker by Pro Football Focus.

Download Alex Stern’s paper.

Caio Brighenti (Colgate University)

Brighenti computed each team's control of the field at the moment of the handoff to predict the outcome of rushes. Brighenti found that offensive control at the running back's expected point of intersection with the line of scrimmage was the most important predictor of run yardage.

Key stat: The critical factor separating successful and unsuccessful plays is ownership of the run gap at the line of scrimmage. Even on plays gaining more than 10 yards, the difference in field control past the line of scrimmage was almost negligible.

Download Caio Brighenti’s paper.

Honorable mention papers

Bryant Davis (University of Florida)
Predicting Rushing Yards Using a Convolutional Autoencoder for Space Ownership

Aaron Kruchten (Carnegie Mellon) 
Estimating the Causal Effect of Defensive Formation on Yards Gained in Run Plays

Lucas Wu, Dani Chu, Matthew Reyers (Simon Fraser)
Breaking Through the Line: Evaluating Running Back Contributions to Running Plays