Phil Oliver (L) and Patrick Noone from the CricViz team. 

Swarajya Speaks To CricViz, The Team Behind The Data Intelligence Models For Cricket World Cup 2019

By Tushar Gupta

Victory, defeat, predictions, performances, and data: the Cricket World Cup 2019 was one of the best championships in recent times.

Phil Oliver and Patrick Noone are the men behind the cricket intelligence that enhanced viewing experience with elaborate data insights. Now, they are promising to take it to the next level.

The viewing experience of the Cricket World Cup 2019 was enhanced by elaborate data insights. From win and loss predictors at varying stages of the game to player performances, viewers had a lot of data to play around with while enjoying one of the best World Cups in recent times.

The data intelligence portal was accessible through its main website and apps for Android and iOS platforms.

Swarajya spoke to CricViz, the team behind these data intelligence models. In conversation with Phil Oliver (managing editor and co-founder) and CricViz analyst Patrick Noone, we dived into the intricacies that drive cricket intelligence in the age of data.

To begin with, tell us something about your organisation and the team, and where did it all start?

CricViz was formed in 2015 with the aim of presenting a new analytical narrative in cricket. We have grown to a team of 12 full-time staff, that includes developers and data scientists who manage the database and tools at the back end, as well as an editorial team who produce content for broadcast and media clients using numbers from the database.

Nathan Leamon, the England team analyst, is a consultant and was responsible for the WinViz model that we still use on a variety of platforms.

You were contracted with the ICC for this World Cup. How has the overall experience been watching and evaluating the biggest cricket tournament of the world this closely?

While we have experience of working on ICC events (this was our fifth, including men’s and women’s), nothing before has been on the scale of this tournament. Freddie Wilde, Ben Jones, Srinivas Vijaykumar and the two of us travelled around the country, working with the four different TV crews, and were tasked with delivering statistical insights that would enhance the coverage.

We had plenty of success in getting a lot of our material to air, with some of our new innovations — notably the fielder leaderboard — gaining a lot of traction online and in various sections of the media.

Unlike any of the previous ICC tournaments, the last one being the Champions Trophy of 2017, data intelligence was critical in this series, even for the viewers who got a new perspective on the game. How did you go about your evaluation and what is the size of your data sets?

We have three main datasets that we use regularly: a ball-by-ball database that contains information from every international match and the majority of domestic cricket since 1999; Opta data that records more detailed ball-by-ball information, such as shot type and foot movement, for every international and most T20 leagues since 2006; and ball tracking data that allows us to measure how much the ball is swinging, seaming and spinning for most internationals since 2005.

Our query tool — designed by Travis Basevi, the brain behind Cricinfo’s Statsguru — allows us to map the different datasets onto each other and draw conclusions based on that data. For example, we can use Opta data to see how often a batsman plays a sweep shot and we can then use ball tracking data to see how well he plays it when the ball is turning a large amount. That ability to map the different datasets onto one another is what sets us apart in terms of what we can deliver.
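The kind of cross-dataset query described above can be illustrated with a minimal sketch. It assumes that both datasets can be keyed by the same ball identifier; the field names, the key layout and the sample records are hypothetical and do not reflect CricViz's or Opta's actual schema.

```python
# Hypothetical shot-type records, keyed by (match_id, innings, over, ball).
shot_data = {
    ("m1", 1, 10, 1): {"batsman": "A", "shot": "sweep", "runs": 4, "dismissed": False},
    ("m1", 1, 10, 2): {"batsman": "A", "shot": "drive", "runs": 1, "dismissed": False},
    ("m1", 1, 10, 3): {"batsman": "A", "shot": "sweep", "runs": 0, "dismissed": True},
    ("m1", 1, 10, 4): {"batsman": "A", "shot": "sweep", "runs": 2, "dismissed": False},
    ("m1", 1, 10, 5): {"batsman": "A", "shot": "sweep", "runs": 1, "dismissed": False},
}

# Hypothetical ball tracking records using the same key.
# The last ball above has no tracking entry, so it is skipped below.
tracking_data = {
    ("m1", 1, 10, 1): {"turn_degrees": 4.5},
    ("m1", 1, 10, 2): {"turn_degrees": 5.0},
    ("m1", 1, 10, 3): {"turn_degrees": 6.0},
    ("m1", 1, 10, 4): {"turn_degrees": 1.0},
}


def sweep_record(shots, tracking, batsman, min_turn):
    """Summarise a batsman's sweep shots against balls turning at least min_turn."""
    balls = runs = dismissals = 0
    for key, shot in shots.items():
        track = tracking.get(key)
        if track is None:
            continue  # ball not covered by tracking data
        if (
            shot["batsman"] == batsman
            and shot["shot"] == "sweep"
            and track["turn_degrees"] >= min_turn
        ):
            balls += 1
            runs += shot["runs"]
            dismissals += int(shot["dismissed"])
    return {"balls": balls, "runs": runs, "dismissals": dismissals}


summary = sweep_record(shot_data, tracking_data, "A", min_turn=3.0)
```

Here the shot type comes from one dataset and the amount of turn from the other; the shared ball key is what lets the two be combined in a single query.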

On a sunny day in England, 300+ can be chased down, but in overcast conditions, even a 250 can’t be chased down by the best of teams. Given the complex nature of the game, numbers can be deceiving, as they vary across conditions and continents. What role do the pitch, ground conditions and weather have to play in your overall analysis?

Historical venue data is built into our predictor models, so PredictViz and WinViz have an idea of what a par score is at each ground, based on recent data there.

Additionally, with ball tracking data, we can see how a pitch is behaving — if it’s turning a lot, if there is a lot of seam movement, if it’s bouncing more than normal etc. We can then go back into our database and find examples of pitches where conditions were similar and draw conclusions based on the performance of the players in the current game.

At one point in the England-Australia semi-final, your predictor gave 222 to Australia and they ended up with a final score of 223, the score being a testament to the accuracy of your work. As the World Cup progressed, did the English conditions have a role to play in your data analysis and its increasing accuracy, given we witnessed a very low-scoring tournament?

Yes, I’m sure you could find some examples earlier in the tournament where we overshot the predicted score, as recent history suggested this would be a higher scoring tournament than it turned out to be.

Luckily, we are able to adjust the models to take into account the unexpected conditions, and also factors such as the trend of teams batting first coming out on top, in order to produce more accurate results. All of the models are continually being adjusted and improved to reflect what happens out in the middle.

How do you go about your analysis from a macro and micro point of view? For instance, how critical is a team’s performance or a player’s performance to your overall data model?

The player’s record is the more important variable when generating the predictor models. Each team’s players are the ‘resources’ the team has at their disposal and the simulator works by predicting how they will perform, based on their career numbers, while also taking into account variables such as the venue.

A team’s previous record against a particular opposition is something that isn’t considered; a recent example of this would be the Australia versus India Test series. Even though India had never won a Test series in Australia, that fact was not counted against the current team. While a record such as that could have intangible effects on the players’ frame of mind — weight of pressure etc — it is not something that we work into our models; the emphasis is on the here and now.

Walk us a bit through a sample simulation of an ODI game. Let’s say India is taking on Pakistan with both teams retaining their current squads, and India is 70/3 in 15, playing at the Oval, London, having lost Kohli, Rohit and Dhawan. What all does your data model take into account from here?

India would have taken a heavy hit on their percentage with their three most valuable resources already out of the game. We would therefore likely have seen a big swing to Pakistan, who would not have started the game as favourites.

However, it would not be a lost cause for India just yet, with so many overs left in the match, the quality of their bowling attack and players such as Hardik Pandya still able to contribute with the bat. Another factor to consider would be the allocation of Pakistan’s overs. If, say, Mohammad Amir had bowled eight of the first 15 overs, India’s percentage would be higher than if he had bowled just five, because Pakistan would have used up more of that available resource.
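The idea of players as resources can be sketched as a small Monte Carlo simulation. This is an illustrative toy rather than the WinViz or PredictViz method: the per-ball scoring rates and dismissal probabilities are invented, only one batter is tracked at a time, and venue and bowling resources are ignored.

```python
import random


def simulate_innings(batters, balls_left, rng):
    """Simulate the rest of one innings and return the runs added.

    batters: list of (runs_per_ball, dismissal_prob_per_ball) tuples for the
    players still to bat, in order. Both numbers are hypothetical stand-ins
    for career figures.
    """
    runs = 0.0
    at_crease = 0  # index of the batter currently facing
    for _ in range(balls_left):
        if at_crease >= len(batters):
            break  # no batting resources left
        runs_per_ball, p_out = batters[at_crease]
        if rng.random() < p_out:
            at_crease += 1  # wicket falls, next batter comes in
        else:
            runs += runs_per_ball  # credit the batter's average scoring rate
    return runs


def projected_score(current_score, batters, balls_left, n_sims=2000, seed=0):
    """Average many simulated innings to project a final total."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    total = sum(simulate_innings(batters, balls_left, rng) for _ in range(n_sims))
    return current_score + total / n_sims


# 70/3 after 15 overs: seven wickets in hand, 35 overs (210 balls) remaining.
strong_remaining = [(1.0, 0.02)] * 7  # hypothetical stronger batting resources
weak_remaining = [(0.7, 0.04)] * 7    # hypothetical weaker batting resources

strong_projection = projected_score(70, strong_remaining, 210)
weak_projection = projected_score(70, weak_remaining, 210)
```

Losing the most valuable batters corresponds to swapping strong entries for weaker ones in the list, which lowers the projection. A fuller model would treat the opposition's remaining bowling overs as a resource in the same way.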

Fielding, for the greater part of cricket’s existence, has merely been an add-on. However, things have changed for the better in recent years, as teams now see poor fielders as a liability. How does your data model evaluate fielding, and how critical do you think it’s going to be for ODI and T20 cricket going forward?

We have a team who manually record every fielding action in international matches. Events such as run saves and misfields are logged alongside catches and runouts. With catches, a percentage to evaluate the difficulty is recorded — for example Ben Stokes’ catch in the opening match of the World Cup was a 5 per cent chance, while a regulation wicket-keeper’s catch would be a 90 per cent chance.

That data is all fed into an algorithm that produces a number to determine how many runs each player has saved in the field. While there is a subjective element to the collection, there are guidelines as to what constitutes each event and we see this kind of information becoming more and more valuable in the future, as teams look to quantify what has previously been unquantifiable. Similar to the predictor models, the fielding collection is continually evolving and we envisage that it will only become more accurate as we explore ways to record it.
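One plausible way to turn logged fielding events into a runs-saved figure is to credit each catch against its difficulty. The sketch below is an assumption for illustration, not the algorithm CricViz uses: the event names, the expected-value weighting and the 25-run value of a wicket are all hypothetical.

```python
WICKET_VALUE_RUNS = 25.0  # hypothetical run value of a wicket


def fielding_impact(events):
    """Return net runs saved (positive) or cost (negative) for one fielder.

    Each event is a dict with a "kind" and either a "chance" (probability
    that an average fielder takes the catch) or a "runs" value.
    """
    total = 0.0
    for event in events:
        kind = event["kind"]
        if kind == "catch_taken":
            # A harder catch (lower chance) earns more credit.
            total += (1.0 - event["chance"]) * WICKET_VALUE_RUNS
        elif kind == "catch_dropped":
            # Dropping an easier catch (higher chance) costs more.
            total -= event["chance"] * WICKET_VALUE_RUNS
        elif kind == "run_save":
            total += event["runs"]
        elif kind == "misfield":
            total -= event["runs"]
        else:
            raise ValueError(f"unknown fielding event: {kind}")
    return total


spectacular = fielding_impact([{"kind": "catch_taken", "chance": 0.05}])
regulation = fielding_impact([{"kind": "catch_taken", "chance": 0.90}])
```

Under these assumptions a 5 per cent catch is worth far more than a regulation 90 per cent catch, which matches the intuition behind recording a difficulty percentage for every chance.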

With a data set this big, and a data model this complex, what are the differences in evaluation between a T20, ODI, and a Test match?

For the predictors, the overs-remaining variable is something that obviously only applies to limited-overs matches, so the idea of having ‘resources in hand’ for the bowling team is much less volatile in Test cricket, because a guy can bowl all day if the captain wants him to.

Also, the added variable of a drawn match in Tests means there are more outcomes to be weighed up, and the length of time it takes for a Test to be played presents different challenges when trying to predict what will happen over several days.

In terms of our analysis outside of the predictor models, limited overs cricket lends itself more easily to being broken down and taken apart. Things like phase analysis and bowler-batsman match ups are critical in the shorter forms, while in Tests, there is a greater focus on conditions — has the ball started to reverse? Is it turning more today than yesterday? That kind of thing.

What is the biggest challenge when it comes to the implementation of the data model in a T20, ODI, or a Test match?

Probably reacting to unexpected conditions. If the score predictor is working to an assumption that 300 is par, but then the ball starts swinging and seaming around corners from ball one, we need to be able to adapt to that quickly and make sure that the model accurately reflects the state of play.

In that scenario, if the team batting first managed to make 150, they would be massively unfancied under the original parameters when it could actually be a good score on that particular pitch. Being able to recognise when, and by how much, the model needs to be adjusted is a constant challenge for us in that situation.
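A simple way to picture this kind of adjustment is to blend the pre-match par score with what the match itself is showing, trusting the live evidence more as more balls are bowled. The weighting scheme and the `prior_balls` parameter below are hypothetical, and the sketch ignores wickets; it is not how CricViz recalibrates its models.

```python
def adjusted_par(prior_par, runs_so_far, balls_bowled, total_balls=300, prior_balls=120):
    """Blend a pre-match par score with the scoring rate seen so far.

    prior_balls is a hypothetical tuning knob: the number of balls of live
    evidence needed before it counts as much as the historical par score.
    """
    if balls_bowled == 0:
        return float(prior_par)  # nothing observed yet, keep the prior
    # Project the current run rate over the full innings.
    observed_projection = runs_so_far / balls_bowled * total_balls
    # Weight on live evidence grows from 0 towards 1 as the innings unfolds.
    weight = balls_bowled / (balls_bowled + prior_balls)
    return (1.0 - weight) * prior_par + weight * observed_projection


# 300 was expected, but only 40 runs have come from the first 20 overs.
revised = adjusted_par(300, runs_so_far=40, balls_bowled=120)
```

With these numbers the revised par falls to about 200, so a first-innings total well below 300 would no longer be rated as badly as under the original assumption.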

The English summer is now moving towards the Ashes. What is your team intending to do about it and what are the biggest challenges in a series this long and varying? Are you going to have evaluations and predictions for each session as well?

We intend to have two analysts at each match, one providing content for Sky Sports in the commentary box and one in the press box to assist our media clients and deliver analysis on our own platforms. The predictor models we use on TV will be the same as before and, additionally, it will be our job to offer other areas of insight that help the commentators to tell a story.

One of the biggest challenges in a series of this length is to keep finding new angles that tell interesting stories, while staying across developing narratives in the series as a whole. It’s about finding that balance between something that’s fresh and relevant now and something that has progressed over a number of matches.

Tell us about your work outside the ICC. Have you been working with any cricket boards, T20 leagues or domestic franchises? How has the overall experience and learning been?

Besides having a range of broadcast clients around the world, we provide data services to West Indies cricket, having recently relaunched their website with innovative match centres and player profiles. We are also increasingly active in performance analysis — clients include Melbourne Renegades, who receive opposition analysis, match packs and squad building advice — it was a great thrill to be part of their winning team last season.

Data can go a long way in helping players who are playing domestic leagues, county cricket in England or Ranji cricket in India. Are you, for the future, planning to implement your data intelligence model for domestic cricket as well?

It’s unlikely that you’ll see WinViz being used regularly in domestic first-class cricket, but it is already used for domestic T20 leagues around the world, most notably the PSL where we have had an onsite presence with the TV crews for each of the last two tournaments.

Another of our models, Match Impact (a tool that we use to assess a player’s positive or negative contribution to their team’s chances of winning the match), is something that we already use a lot, and we expect it will become even more prominent as T20 franchises look to recruit players with limited datasets.

A word on the Indian cricket team, and what does your data model say about their consistent failures in the past few years in the knockout games of the ICC tournaments?

This is a good example of where intangible and harder to measure variables have an impact. It is hard to make too much of a pattern from a small sample size that includes Fakhar Zaman’s stunning Champions Trophy innings and the conditions at Old Trafford in the recent World Cup semi-final.

What are your plans for the future as an organisation, and how evolved will your data model be by the 2023 World Cup in India, compared with the one you used in 2019?

In the immediate future, we have developed a new model called PitchViz that tries to determine how difficult it is to bat at any given point of the match. We are able to calculate a number using ball tracking data that includes the amount of seam, swing, spin, bounce, consistency of bounce and pitch pace. If PitchViz is 10, the pitch is extremely hard to bat on; if it’s 0, it’s very easy to bat on. We are able to view this figure over time, to see how a pitch’s characteristics have changed throughout the course of a match.
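To make the 0 to 10 scale concrete, here is a minimal sketch of how several ball tracking measurements could be combined into a single difficulty figure. The actual PitchViz formula is not described in the interview; the factor names, value ranges and equal weighting below are all assumptions.

```python
# Hypothetical (low, high) ranges for each factor; values outside are clamped.
FACTOR_RANGES = {
    "swing_deg": (0.0, 3.0),
    "seam_deg": (0.0, 1.5),
    "spin_deg": (0.0, 6.0),
    "bounce_variation_cm": (0.0, 20.0),
}


def difficulty_score(measurements):
    """Map raw measurements to a 0 (easy) to 10 (hard) batting difficulty."""
    scaled = []
    for name, (low, high) in FACTOR_RANGES.items():
        fraction = (measurements[name] - low) / (high - low)
        scaled.append(min(1.0, max(0.0, fraction)))  # clamp to [0, 1]
    # Equal weighting of factors is an assumption made for simplicity.
    return 10.0 * sum(scaled) / len(scaled)


midway = difficulty_score(
    {"swing_deg": 1.5, "seam_deg": 0.75, "spin_deg": 3.0, "bounce_variation_cm": 10.0}
)
```

Computing the same figure for successive windows of deliveries would give the over-time view mentioned above, showing how a pitch changes during a match.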

We expect that the models we have will keep developing and we are already looking at an updated version of WinViz that will incorporate factors such as player rankings.

More long term, we hope to continue growing, and expect to be more visible across broadcast and media outlets as we establish ourselves as the premier provider of cricket analytics.

Patrick Noone (L) and Phil Oliver at the final of the 2019 Cricket World Cup at Lord’s, London.