It's long been known that certain rating systems, namely Glicko-2 and Topcoder, are not monotonic. In other words, there are cases where losing can eventually result in a higher rating. We wanted to know just how severe the issue can be. In joint work with inutard at WWW 2021, we computed how tourist's rating would evolve according to both Topcoder and our custom rating system. The dataset consists of Codeforces rounds up to Looksery Cup 2015, accessed via the Codeforces API. Here, we see that tourist's Topcoder rating is 3284, but could have been as high as 3807 if he were willing to lose on purpose!
More details on the adversarial strategy: for his first 45 rounds, we simulate tourist playing normally, following historical data. In the next 45 rounds, he deliberately places last whenever his Topcoder rating is above 2975, but plays normally otherwise. Finally, he returns to playing normally for an additional 15 rounds.
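For anyone who wants to replay the schedule, here is a minimal Python sketch of the decision rule described above. It is not the code from our repo: the names `should_throw`, `simulate`, and `update_rating` are hypothetical, rounds are assumed to be indexed from 0, and `update_rating` stands in for whatever rating system you plug in (Topcoder's formula is not reproduced here).

```python
def should_throw(round_index, rating, honest_rounds=45,
                 adversarial_rounds=45, threshold=2975):
    """Return True if the player deliberately places last this round.

    Rounds are 0-indexed. Throwing only happens inside the adversarial
    window, and only while the current rating is strictly above threshold.
    """
    start = honest_rounds
    end = honest_rounds + adversarial_rounds
    return start <= round_index < end and rating > threshold


def simulate(historical_ranks, field_sizes, update_rating, initial_rating,
             **strategy_params):
    """Replay a contest history under the adversarial strategy.

    historical_ranks[i] is the rank actually achieved in round i,
    field_sizes[i] is the number of participants (so last place == size),
    update_rating(rating, rank, size) is a caller-supplied stand-in
    for the rating system being attacked.
    """
    rating = initial_rating
    trajectory = []
    for i, (rank, size) in enumerate(zip(historical_ranks, field_sizes)):
        # Either take last place on purpose or keep the historical result.
        played_rank = size if should_throw(i, rating, **strategy_params) else rank
        rating = update_rating(rating, played_rank, size)
        trajectory.append(rating)
    return trajectory
```

Rounds past the adversarial window are played normally, so the final 15 rounds need no special handling: just pass 105 rounds of history.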
A similar strategy recently broke the Pokemon Go Battle League rankings, which seem to be based on Glicko-2: https://www.reddit.com/r/TheSilphRoad/comments/hwff2d/farming_volatility_how_a_major_flaw_in_a/.
Does it work on lichess?
If it uses Glicko-2 then I suspect the same exploit will work. The trick is to massively inflate your volatility by alternating between losing and winning.
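To experiment with this, here is a rough Python sketch of a single Glicko-2 rating-period update, following the steps in Glickman's public description of the system. It is an illustration only, not the code used by Lichess or Pokemon Go; the function name `glicko2_update` and the default `tau=0.5` are my own choices.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def _g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def glicko2_update(rating, rd, sigma, results, tau=0.5, eps=1e-6):
    """One rating period of Glicko-2.

    results is a list of (opponent_rating, opponent_rd, score) with
    score 1 for a win, 0.5 for a draw, 0 for a loss.
    Returns (new_rating, new_rd, new_volatility).
    """
    mu = (rating - 1500.0) / SCALE
    phi = rd / SCALE
    if not results:
        # No games played: only the deviation grows.
        return rating, SCALE * math.sqrt(phi * phi + sigma * sigma), sigma

    inv_v = 0.0
    total = 0.0
    for opp_rating, opp_rd, score in results:
        g_j = _g(opp_rd / SCALE)
        mu_j = (opp_rating - 1500.0) / SCALE
        e_j = 1.0 / (1.0 + math.exp(-g_j * (mu - mu_j)))
        inv_v += g_j * g_j * e_j * (1.0 - e_j)
        total += g_j * (score - e_j)
    v = 1.0 / inv_v
    delta = v * total

    # Solve for the new volatility with the Illinois variant of regula falsi.
    a = math.log(sigma * sigma)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta * delta - phi * phi - v - ex)
        den = 2.0 * (phi * phi + v + ex) ** 2
        return num / den - (x - a) / (tau * tau)

    A = a
    if delta * delta > phi * phi + v:
        B = math.log(delta * delta - phi * phi - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_sigma = math.exp(A / 2.0)

    phi_star = math.sqrt(phi * phi + new_sigma * new_sigma)
    new_phi = 1.0 / math.sqrt(1.0 / (phi_star * phi_star) + 1.0 / v)
    new_mu = mu + new_phi * new_phi * total
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_sigma
```

To probe the exploit, one could call `glicko2_update` repeatedly, feeding it alternating periods of losses and wins, and track the returned volatility over time. How large the effect is will depend on `tau`, the period length, and the opponents, so treat any numbers from this sketch as exploratory.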
Does Codeforces use Glicko-2 (or something similar) too?
The Codeforces system is monotonic: https://codeforces.net/blog/entry/20762
Although I think Elo-MMR produces nicer ratings, both systems are free from such exploits! Glicko-1 should also be monotonic I think, though I haven't verified it.
I raised the issue on the GitHub repository for Lila (the implementation of Lichess), but the maintainers dismissed it: https://github.com/ornicar/lila/issues/7862
I don't think so. When you reach your_normal_rating + 100, your opponents will crush you and will not let you climb higher. And weak opponents (your_normal_rating - 300) will not accept your challenges.
It's easier to reach your_normal_rating + 100 through ordinary fluctuations.
I'll just try it myself.
https://lichess.org/@/VolatilePlayer
Most players who have played a lot of games have a rating deviation of 45-46. I'll stay at a 2100 rating until my deviation reaches 45, then I'll start playing at full strength. Let's see whether I can exceed my typical 2350-2400. I think not.
I would be very curious too! Lichess uses Glicko-2: https://i.imgur.com/bOjm17e.png
But given how many players are on the platform, it would be weird if they hadn't hacked together a fix for this attack.
Please feel free to email / message us your results! We can include it in our repo and credit you :)
Lichess has an easy-to-use API and open source code, so you can actually experiment with other players' results.
Also, to get maximum profit, I should return to this account after some time (maybe 1 year), once the rating deviation has grown substantially and I'll be getting +100 for my first games. But that's outside the scope of your study.
rainboy for topcoder 2021.
You theoretically broke Topcoder ratings. It's still a long way to actually see this happen.
This blog seems really informative. I just have a small question and hope you won't be offended. Why did you put so much effort into this? I mean, was this some sort of academic research or just something you're passionate about? I ask because your work seems genuine and tough, and I've never had the drive to do something similar. Once again, I don't mean any offense.
It started as a fun project a few years ago, out of a curiosity to see whether good theoretical foundations would solve some of the issues with programming contest rating systems. The more recent work was undertaken to turn that project into an academic publication. Hope this helps!
Well, it is a fairly good measure of the reliability of a rating system.