CricketML

Statistics and Machine Learning in Cricket

About

We are researchers working on problems in Cricket:

  • using Statistical and Machine Learning lenses
  • to further the applications of sophisticated Data/Sports Analytics
  • to contribute to academic literature on Sports Analytics
  • to propose novel and practical solutions to (real) problems in the sport

Sample Problems

While our interests span a wide array of problems in Cricket, we list two representative problems that we have worked on recently.

Score Trajectory Forecasting

Problem Definition

Given a partially observed innings, forecast the entire future trajectory of the innings.

Data

We have available historical ball-by-ball data from thousands of international cricket games.

Algorithm

We propose a novel algorithm which decomposes the partially observed innings of interest into several innings across history. Forecasts are made based on the estimated decomposition.
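The decomposition itself is not specified here. As a rough illustration only, one possible reading is a regularized least-squares fit: express the observed prefix as a weighted combination of the same-length prefixes of historical innings, then apply those weights to the historical continuations. The function names, the ridge penalty and the toy data below are all assumptions, not the authors' algorithm.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def forecast_from_donors(observed, donors, ridge=1e-6):
    """Hypothetical helper: fit weights on the observed prefix, forecast the rest.

    observed: cumulative scores seen so far (length k).
    donors:   complete historical innings (cumulative scores), all of equal
              length greater than k.
    ridge:    small penalty (an assumption) that keeps the system solvable.
    """
    k, n = len(observed), len(donors)
    # Normal equations (A^T A + ridge * I) w = A^T b, where the columns of A
    # are the donor prefixes and b is the observed prefix.
    gram = [
        [
            sum(donors[i][t] * donors[j][t] for t in range(k))
            + (ridge if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    rhs = [sum(donors[i][t] * observed[t] for t in range(k)) for i in range(n)]
    weights = solve_linear_system(gram, rhs)
    total_length = len(donors[0])
    forecast = [
        sum(w * d[t] for w, d in zip(weights, donors))
        for t in range(k, total_length)
    ]
    return weights, forecast


# Toy data (made up): the observed prefix is the average of the two donors.
toy_donors = [[4, 9, 15, 22, 30, 40], [6, 10, 13, 20, 24, 30]]
toy_observed = [5.0, 9.5, 14.0, 21.0]
toy_weights, toy_forecast = forecast_from_donors(toy_observed, toy_donors)
```

In this toy case the fitted weights come out close to one half each, so the forecast is close to the average of the two donor continuations. A real version would need many more donors, noise handling and a principled choice of regularization.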

Some Visual Results

Scroll through the plots below to see the forecasts produced by our algorithm in comparison with reality. For reference: the black line is the actual innings (cumulative scores), the red line is the forecast and the vertical green line is the last point in the innings that the forecast algorithm is allowed to observe. Everything to the right of the green line is a prediction with no additional knowledge of the innings.



Next, we show the MAPE (Mean Absolute Percentage Error) distributions of the forecasts for the 750 most recent 50-over Limited Overs International (LOI) innings. For each innings under consideration, we end the training at the 30th over (180 balls) mark. Our algorithm then forecasts the remainder of the innings (all 20 overs). The figure below shows the MAPE distributions for different forecast horizons, i.e. at the 35th, 40th, 45th and 50th over marks. We expect the errors to be smaller for the shorter horizons. Notice that the median error for the longest forecast horizon is only about 5%.

[Figure: MAPE distributions at the 35th, 40th, 45th and 50th over marks]
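As a rough sketch of how the per-horizon errors described above could be computed for a single innings, the snippet below takes the MAPE over all forecast balls between the training cutoff and each horizon. The averaging window, the function names and the toy data are assumptions; the exact evaluation protocol is not spelled out here.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, skipping points where actual is zero."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def mape_by_horizon(actual, forecast, cutoff_ball, horizon_overs, balls_per_over=6):
    """Hypothetical helper: MAPE from the training cutoff up to each over mark.

    actual, forecast: cumulative scores indexed by ball (index 0 is ball 1).
    cutoff_ball:      number of balls observed during training (e.g. 180).
    horizon_overs:    over marks at which to evaluate (e.g. 35, 40, 45, 50).
    """
    errors = {}
    for over in horizon_overs:
        end = over * balls_per_over
        errors[over] = mape(actual[cutoff_ball:end], forecast[cutoff_ball:end])
    return errors


# Toy data (made up): the forecast overshoots the actual score by 10%.
toy_actual = [float(ball) for ball in range(1, 301)]
toy_forecast = [1.1 * score for score in toy_actual]
toy_errors = mape_by_horizon(toy_actual, toy_forecast, 180, [35, 40, 45, 50])
```

Repeating this for every innings and collecting the values per horizon would give distributions of the kind shown in the figure; the median of each could then be taken with `statistics.median`.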

Finally, for the same set of innings as above, we also produce the R^2 values for the forecasts at the 35th, 40th, 45th and 50th over points. The table below shows the R^2 values which suggest that our algorithm is able to capture much of the variation in the data, including for a long (20 over) forecast horizon.

Forecast evaluation at:   35th over   40th over   45th over   50th over
R^2:                      0.90        0.81        0.75        0.73
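For reference, the R^2 statistic is the standard coefficient of determination, 1 - SS_res / SS_tot. The sketch below assumes it is computed across innings at a given over mark, comparing the actual and forecast cumulative scores; that pooling choice, the function name and the toy numbers are assumptions.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy data (made up): scores of three innings at one over mark.
toy_actual_scores = [200.0, 250.0, 300.0]
toy_forecast_scores = [210.0, 250.0, 290.0]
toy_r2 = r_squared(toy_actual_scores, toy_forecast_scores)
```

A value of 1 means the forecasts match the actual scores exactly; values near 0 mean the forecasts do no better than predicting the average score for every innings.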

More Details

Please check back soon for more specific details of the algorithm and the underlying theory. We will also make available a web-app which shows forecasts for thousands of innings in the past. Stay tuned!

Target Revision à la DLS

Problem Definition

Assume that a team is set a target to chase in its maximum allocated overs, e.g. 50 overs (300 balls). At some point during the chase there is an interruption, e.g. rain, and the innings must be shortened. For the shortened innings, produce two quantities: a par score, used if the innings never resumes (for the loss of the same number of wickets as in the actual chase), and a revised target (10-wicket equivalent), used if the innings resumes with a reduced maximum duration.

Is DLS biased?

We note that in Limited Overs International (LOI) cricket games, the International Cricket Council (ICC) uses the Duckworth-Lewis-Stern (DLS) method to solve the target revision problem described above. We study the application of the DLS method in 50-over LOIs over a period of about 15 years (2003 to 2017, both inclusive) across 1953 LOI games. The table below shows that the use of DLS appears to introduce a statistically significant bias in favor of the team chasing the revised target. In statistical terms, using a Chi-Squared test of independence, we find that we can reject the null hypothesis (that the distribution of games won by teams batting first vs second remains the same with or without the use of the DLS method) at the 5% significance level (p-value = 0.048).

          Won by Team Batting First   Won by Team Batting Second   Total
No DLS    865 (48.6%)                 916 (51.4%)                  1781
DLS       70 (40.7%)                  102 (59.3%)                  172
Total     935                         1018                         1953
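The test above can be reproduced from the four counts in the table. The sketch below uses only the Python standard library and assumes a Pearson Chi-Squared test without continuity correction (the text does not say whether one was applied). For a 2x2 table there is one degree of freedom, in which case the p-value equals erfc(sqrt(statistic / 2)). The function name is hypothetical.

```python
import math


def chi_squared_independence_2x2(table):
    """Pearson Chi-Squared test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied, and
    the p-value formula is valid only for one degree of freedom (2x2 tables).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # Survival function of the Chi-Squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: No DLS, DLS. Columns: won batting first, won batting second.
counts = [[865, 916], [70, 102]]
statistic, p_value = chi_squared_independence_2x2(counts)
print(f"chi-squared = {statistic:.2f}, p-value = {p_value:.3f}")
```

On these counts the statistic is roughly 3.89 and the p-value falls just under 0.05, in line with the reported result. Applying a continuity correction would give a different p-value, so the choice matters when the result sits this close to the threshold.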

We note that the authors of the DLS method argue that their method is fair because it appears to leave the distribution of games won by teams batting first vs second unchanged. However, over the past several years of the 50-over LOI game, the evidence appears to be stacked against the DLS method.

Our Goal

For us, the research question of interest is to suggest an alternative to, or an adaptation/enhancement of, the DLS method which promises to reduce the bias in favor of the chasing team. We are also interested in developing a framework for the statistical evaluation of alternative candidate methods. We believe this is a critical step in advocating for more transparency, and that it can allow researchers to benchmark their methods in a systematic manner instead of relying on ad hoc and subjective analysis.

Algorithm

We propose a novel algorithm which takes into account the chasing mindset at each point in a typical chase. We learn from the manner in which teams typically make the target number of runs in their maximum allocated overs to estimate an average path to victory that can be prescribed for any team chasing a target. Using this average path to victory and some non-linear transformations learned from data (similar to the DLS resources-remaining table), we can produce par scores and 10-wicket revised targets for the chasing team.
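To make the idea of an average path to victory concrete, here is a deliberately naive sketch: for each ball, average the fraction of the target that successful chasing teams had reached, then scale that fraction by a new target. It ignores wickets and the non-linear transformations mentioned above, so it is not the proposed algorithm. The function names, the padding convention and the toy data are assumptions.

```python
def average_path_to_victory(successful_chases):
    """Average fraction of the target reached after each ball.

    successful_chases: list of (target, cumulative_scores) pairs, where all
    score lists have the same length. Chases that finished early are assumed
    to be padded with their final score.
    """
    n_balls = len(successful_chases[0][1])
    path = []
    for ball in range(n_balls):
        fractions = [
            min(scores[ball] / target, 1.0)
            for target, scores in successful_chases
        ]
        path.append(sum(fractions) / len(fractions))
    return path


def naive_par_score(target, path, balls_bowled):
    """Runs a team 'on the average path' would have after balls_bowled balls."""
    return target * path[balls_bowled - 1]


# Toy data (made up): two successful four-ball "chases".
toy_chases = [(100, [20, 50, 80, 100]), (200, [60, 100, 140, 200])]
toy_path = average_path_to_victory(toy_chases)
toy_par = naive_par_score(120, toy_path, 2)
```

In this toy case both teams were halfway to their targets after two balls, so a team chasing 120 would be on par with 60 at that point. Accounting for wickets lost would require an additional adjustment of the kind the text attributes to transformations learned from data.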

Some Visual Results

Scroll through the plots below to see the average path to victory for some LOI games. The lines in red represent the actual innings, the dotted lines in green represent the average path to victory and the dotted-dashed gray lines show the par score for the equivalent number of wickets lost (as in the actual innings). Note that we do not show the wickets lost for visual simplicity.

We also show the profile of revised 10-wicket targets for the ICC World Cup Final 2011 between India and Sri Lanka. We show the revised targets produced by the DLS method and by our algorithm. Notice the bias correction that our algorithm achieves by producing targets that are generally larger than those produced by the DLS method.



More Details

Please check back soon for details of our proposed algorithm(s), rigorous statistical results and the underlying theory. Stay tuned!

Talks and Publications

Below, we highlight some of our recent work. We intend to periodically update this section to share our latest work. If your work is relevant to this domain, please let us know and we will include it in the Related Works section.

Publications

Talks

Related Works

Contact us

Your comments, suggestions and reviews are welcome and much appreciated. Please get in touch with us at cricketml@mit.edu