What were the chances of that happening?

[Author’s note: From here on in, I’m going into more methodological detail than I normally would. I’m aware that this won’t be of interest to everyone, but I want to be clear about where the numbers in other posts come from. I think transparency is important in prediction, and I want people to know what method I’ve employed when they’re interpreting any predictions I make. Budding analysts might find the approach useful in their development, and equally readers might well share superior approaches.]

If we look at the differences between our match predictions and the actual points differences for all matches, not just the upsets, we can start to do some pretty cool stuff. And by “cool” I mean, of course, “deeply uncool to most people, but potentially interesting to some.”

Plotting all of these prediction errors in a histogram (below), we can see that the data is more or less normally distributed around zero (which means there’s no systematic bias in the model – this is good), and has a certain amount of spread. This is our prediction uncertainty. I’ve used the same colour coding as before – all results with a negative prediction error represent matches where the higher ranked team performed less well than expected. In some cases (but not all) this results in an upset.

If we design a population of random numbers that replicates the characteristics of this data set, we can create a data set of likely outcomes for future matches. All we do here is take numbers from our artificial population (mean = 0, SD = 17) and add them to the points difference predicted by our trend line for that match-up. Repeating this process many times for a match, we get a realistic distribution of possible match results around our predicted scoreline. The proportion of predicted wins and losses for each team will then vary according to the shape of this distribution and the distance of its mid-point from zero. We can calculate the chance of victory for either side simply by counting all the occurrences on either side of zero and dividing those counts by the number of iterations. For reference, I used 1,000,000 iterations (below that, my results weren’t that repeatable from run to run).

Applying this approach to Japan v South Africa (RWC 2015), we get a distribution of possible results around a likely scoreline of South Africa by 29. Approximately 5% of the simulated results have negative points differences, in other words giving Japan a 5% (or 1 in 20) chance of victory. While this is definitely heavily weighted in South Africa’s favour, it’s considerably better than the “this is never going to happen” chance I was still giving Japan at half-time.
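If you want to experiment with the idea yourself, a minimal sketch in Python looks something like this. It isn’t the exact code behind my numbers; the function name, the fixed SD of 17 and the seed are illustrative assumptions.

```python
import numpy as np

def upset_probability(predicted_margin, sd=17.0, iterations=1_000_000, seed=1):
    """Estimate the lower ranked team's chance of victory by simulation.

    predicted_margin: points margin predicted for the higher ranked team.
    sd: spread of the historical prediction errors (roughly 17 points here).
    """
    rng = np.random.default_rng(seed)
    # Add a normally distributed prediction error (mean 0, SD ~17) to the
    # predicted points difference for each simulated match.
    simulated = predicted_margin + rng.normal(0.0, sd, iterations)
    # The upset chance is the proportion of simulated results that fall on
    # the underdog's side of zero.
    return (simulated < 0).mean()

# Japan v South Africa, RWC 2015: predicted margin of 29 to South Africa.
print(upset_probability(29))   # a little under 0.05 with these assumptions,
                               # i.e. about the 1 in 20 chance quoted above
```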
Prediction Time

We can, of course, use exactly the same method for prediction. If Japan were to meet South Africa in a re-run of this game, in the quarter-finals of the 2019 RWC (which is not that fantastical a proposition), today’s World Rugby ranking points give Japan an improved chance of staging an upset. Comparing the two visualisations, we can see that South Africa’s most likely winning margin has reduced from 29 to 23, but also that the distribution has shifted in favour of Japan. This means a higher proportion of simulated matches result in a Japanese victory (11%), and the chance of an upset (assuming this match were played) has improved from a 20-1 shot to about 9-1.

Now, none of this accounts for Japan’s (presumed) home advantage, nor does it factor in the Bokke being far less likely to underestimate the Brave Blossoms a second time around. What it does mean, though, is that the gap between the two nations has narrowed considerably since that memorable day in Brighton.

Anyway, this is all very good, and hopefully of interest to some. Lots of people expand it to construct simulations of entire tournaments, using the fixture list and draw to run tens or hundreds of thousands of iterations of the tournament. Adding up the outcome stats gives the relative likelihood of various outcomes (champion, semi-finalist, pool winner, etc.). While this is entertaining, for me the power is in its provision of a baseline from which we can explore other effects.
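Sticking with the single-match case, the hypothetical 2019 re-run is just a change of input to the same illustrative sketch:

```python
# Hypothetical 2019 quarter-final re-run: predicted margin of 23 to South Africa.
print(upset_probability(23))   # roughly 0.09 with the SD assumed in the sketch;
                               # my full simulation puts Japan's chances at about
                               # 11%, or roughly 9-1
```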
A note of caution

To round this out, I want to highlight the assumptions inherent in my approach, and what they mean for interpreting the results.
- That the underlying model accurately represents each competitor nation. We have previously established that the rankings provide a reasonably representative generic model of outcome versus relative team strength, with a certain level of variance. However, France have a reputation for being unpredictable, while Argentina might reasonably claim to be consistent over-performers. If these and other team-specific effects are present, they are not factored into this approach, and the predictions are weaker for it.
- That home advantage isn’t a factor. Is it a benefit? Is it a disadvantage? Likely different for any given team. Either way, it isn’t accounted for here. Maybe one to explore further in another post.
- That what happened before predicts what happens next. The data set we have used here is at a minimum four years old. A quarter of the data-set is sixteen years old. In that time, squads, players, coaches and even the laws themselves have changed, many times over. It is entirely possible that we’ve developed a model on a set of data that no longer reflects the nature of the game.
“All models are wrong but some models are useful.” – George E. P. Box, in Statistical Control: By Monitoring and Feedback Adjustment

Contrary to Pichot’s opinions, when compared to actual World Cup performance, the pre-tournament rankings have proven a reliable barometer of the general world order since they were introduced in 2003. The 2003, 2011 and 2015 tournaments were all won by the team ranked number 1 at the outset (England, New Zealand and New Zealand respectively). Only the 2007 edition differed, with South Africa beginning that tournament ranked 4th.

At the individual match level, the rankings are impressively powerful indicators of likely match outcomes. Even without accounting for home advantage, the higher ranked team won 86% of World Cup matches: across 192 games there were 165 victories for the higher ranked side, two draws (Japan v Canada on both occasions, in 2007 and 2011) and 25 upsets (13%). Breaking the data down further and looking at individual tournaments, the rankings’ prediction rate is also consistently good (there’s a quick check of the arithmetic after this list):
- 2003 – 89.6% (43/48 matches correctly predicted; 5 upsets)
- 2007 – 75.0% (36/48 matches correctly predicted; 11 upsets; one draw)
- 2011 – 89.6% (43/48 matches correctly predicted; 4 upsets; one draw)
- 2015 – 89.6% (43/48 matches correctly predicted; 5 upsets)
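For anyone who wants to sanity-check those percentages, the arithmetic is trivial. A hypothetical snippet using only the counts quoted above:

```python
# Per-tournament record quoted above: (matches correctly predicted, total matches)
records = {2003: (43, 48), 2007: (36, 48), 2011: (43, 48), 2015: (43, 48)}
for year, (correct, total) in records.items():
    print(f"{year}: {correct / total:.1%}")   # 89.6%, 75.0%, 89.6%, 89.6%

# Overall record: 165 wins for the higher ranked team, 2 draws, 25 upsets.
wins, draws, upsets = 165, 2, 25
print(f"Overall: {wins / (wins + draws + upsets):.1%}")   # 85.9%, i.e. the 86% quoted above
```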
So can we trust them or not?

Yes. We can trust them. Just not blindly. While they’re definitely informative, we need to consider how we look at them. The rank positions are in themselves nice accolades, but the ranking points (and in particular the relative difference between two teams’ points) are far more useful in understanding the relative strength of different teams. It’s all the same model, and while it’s not perfect, it’s certainly not ridiculous. In my next post, I’ll be looking at how we can take this a bit further to try and understand the probabilities of match outcomes, and apply a bit of context to the likelihood of some famous upsets.
If I had the equipment and the discipline, this would be a podcast. But also I hate my voice.

— Dave.