Predicting The Men’s World Championship Field…With SCIENCE! (Guest Post by Todd Kozlowski)

Disclaimer, I’m not a statistician. Not even close. I’m so, so sorry to any of you who are for my lack of rigor.

Todd made these predictions using the Elo method. Whatever the hell that means.
The predictions here are based on the widely used Elo system, which rates competitors in other sports such as baseball, soccer, and chess. I threw together a program to calculate the odds of each competitor reaching a given round. The percentages are shown with a color code: green (>50%) is likely, yellow (15–50%) is a challenge, and red (<15%) is unlikely.
Problems with the model: because I haven’t been tracking Elo over the entire season, these are very crude, provisional scores. I hope to track the specific Elo of each major international fencer over the coming 2014–15 season, and will have a (remarkably better) model by World Champs ’15. Another problem: this model uses a pre-seeded table of 32 only, and fails to account for victories by the lower half of the table of 64 advancing into the 32. A safe assumption is that those seeded 33–64 will have even lower numbers than those posted here.
Additionally, because the Elo ratings are based very heavily on FIE points standings, any inherent problems with that system (i.e. competitors missing events, in-season injuries, and unexpected successes or flops) will be equally notable here.
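To give a feel for the kind of calculation involved, here is a rough sketch of an Elo-based bracket probability in Python. This is not Todd's actual program; the ratings below are made-up placeholders, and the bracket logic assumes a standard pre-seeded table where the #1 seed meets #32 first, then the winner of #16 vs #17.

```python
# Standard Elo expected score: the probability that A beats B,
# a logistic curve on the rating difference with a 400-point scale.
def win_prob(r_a, r_b):
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Probability the #1 seed reaches the round of 8 in a pre-seeded
# table of 32: win the first bout against #32, then beat whichever
# of #16 / #17 advances to the next round.
def reach_quarters(r1, r16, r17, r32):
    p_first = win_prob(r1, r32)
    p16_advances = win_prob(r16, r17)
    p_second = (p16_advances * win_prob(r1, r16)
                + (1 - p16_advances) * win_prob(r1, r17))
    return p_first * p_second

# A 100-point rating edge corresponds to roughly a 64% win chance.
print(round(win_prob(1900, 1800), 2))  # 0.64
```

Chaining the per-bout probabilities round by round like this is how each competitor's chance of reaching a given stage can be filled in across the whole table.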
So here you have it, Todd’s 2014 World Championships Prediction, powered by science!


5 thoughts on “Predicting The Men’s World Championship Field…With SCIENCE! (Guest Post by Todd Kozlowski)”

  1. Fantastic! Will be incredibly interested to watch the success in predicting men’s sabre tomorrow.
    A true Elo model for 2014/2015 based on A-grade comp results would be even more interesting. It’d be great to compare how it stacks up against the traditional points-based ranking the FIE is using. We’ve been using Elo at an internal club level for two years now and it’s incredibly powerful. In my experience, 5 matches is enough to get a reasonable fix, so a fencer who does the full world cup/grand prix circuit should be able to be ranked pretty accurately.

  2. Disregarding some criticism of the Elo rating system itself (how can you use it in an individual sport, setting aside something like chess, when competing while injured will greatly hurt your rating?), how were these Elo ratings actually determined? Were they based purely on the results of DEs, or did the results of poules factor in? If the latter, why? It seems highly counter-intuitive to me that the rankings (based on FIE points) map directly onto the Elo ratings. How could this possibly happen, when there are numerous ways to gain the same number of FIE points? It’s noted that the specific Elo ratings weren’t tracked for every fencer, but these look like they were made up completely on the spot!

  3. In response to the anonymous commenter above: I’m pretty sure the tables above are simply based on FIE point scores, treated as if they were Elo ratings, then back-calculated to the victory probabilities. OP can correct me, but I’m guessing that’s what’s going on, as it’s definitely directly based on current FIE rank without additional data.
    Elo works fine for individual sports: it should be much more robust in dealing with things like injury absences than a simple cumulative point tally. The main issue is getting enough data points. In our experience at club level, 5-10 match results are enough to get a good fix on a fencer’s level, which means a pro going to every A-grade comp in a season and fencing at least one DE at each will have a reliable ranking. There’ll be a tendency to undervalue the guys who show up once and lose, but they’re not the ones the model is about anyway, to be honest. There’ll also be a tendency to underrate players who are highly erratic in their results, but hey, you can argue that’s perfectly fair.
    The value of building a true Elo model based on DE results over a season would be to actually consider the difficulty of fights which (at least from my observation of men’s sabre) can be absurdly high even in the L64 and L32. To take a random example, someone like Bolade Apithy should get a lot more credit for taking out Kim Junghwan in the L32 in Warsaw, even if he then loses the next round.

  4. Same Anonymous as Anonymous above, looked harder at the data (thanks for the tip, Frances!):
    Looking harder, the Elo ratings are just silly. Here’s how they were calculated:
    Elo rating= 1800+FIE points*3
    That’s it. Provisional ratings simply do not mean “made up with a random formula”. They mean a best guess based on available data.
    From what I can tell, the Elo ratings were first arbitrarily assigned based on that formula. So, using the link here: , we can see that in the case of Men’s Epee, we get the quoted 86.6% for Nikolai Novosjolov in the first round. Hence, in the second round, Novosjolov has 0.866*(0.509*P(16)+0.491*P(17)) = 0.866*(0.509*0.794+0.491*0.799) = 0.69, where P(16) and P(17) are the probabilities of the #1 seed winning against the #16 and #17 seeds respectively.
    So an Elo rating obtained from a random formula has been arbitrarily assigned to each fencer, and then a DE table has been built from the top 32 fencers. Ignoring the fact that the world championship does not use such a format, the above tables do not tell us much, if anything, at all. They’re even wrong, because when one person beats another, their Elo ratings change! It’s a nice proof of concept, but it really shouldn’t be advertised as anything close to a real-world model of the results, which is very confusing to the reader. This shouldn’t be advertised as science.
    As a side note on Elo ratings, there are a few published analyses discussing how a normal distribution can’t readily be assumed, and there have been reports of people sitting on their rating and, in the case of something like chess, “Grandmaster Draws” (to conserve their energy), as well as issues such as rating inflation/deflation. In order to fix even some of the simpler problems, you would have to have a specific Elo rating for each fencer in both poules and DEs, resulting in a lack of information for DE prediction while poule bouts would be far more easily predicted, although in the case of poules it becomes confusing because the margin of victory is suddenly important!
    As noted in the post above (Frances’), a true Elo model would consider the difficulty in bouts much more than something like how far someone advanced, but it’s fraught with problems, because we would simply not have enough results over the course of a season (or even over many seasons) for the vast majority of fencers. Also, on a whim, I would guess that the results of individuals in any sport are far more variable than the results of any team.

    1. I don’t doubt anything you’re saying. I just don’t have enough knowledge to debate you on this. Todd is trying to refine his model…
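For what it’s worth, the back-calculation described in the comments above can be reproduced in a few lines. This is a sketch of the commenters’ reverse-engineered formula, not Todd’s actual code, and the probabilities are the ones quoted in the Novosjolov example.

```python
# The rating formula the commenter reverse-engineered:
# Elo rating = 1800 + FIE points * 3
def fie_to_elo(fie_points):
    return 1800 + 3 * fie_points

# The worked example for the Men's Epee #1 seed, using the
# single-bout probabilities quoted in the comment.
p_first   = 0.866  # P(#1 beats #32) in the round of 32
p_16_v_17 = 0.509  # P(#16 beats #17)
p_vs_16   = 0.794  # P(#1 beats #16)
p_vs_17   = 0.799  # P(#1 beats #17)

p_round_of_16 = p_first * (p_16_v_17 * p_vs_16 + (1 - p_16_v_17) * p_vs_17)
print(round(p_round_of_16, 2))  # 0.69, matching the comment
```

The arithmetic in the comment does check out; the dispute is only over whether rescaled FIE points deserve to be called Elo ratings in the first place.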
