In this post I'll present the results of my research on the rating formula. More specifically, I'll show a graph that gives you a rating according to your performance, and I'll explain the basic aspects of how ratings are updated. This post is quite long; if you are only interested in the results you can easily skim through it :)
Disclaimer: I'll try to support my assumptions and assertions with data but, although I have done the analysis carefully, I may have made a mistake :) Moreover, take into account that some of my formulas are approximations.
Motivation
Almost two years ago I started to participate in contests more actively, aiming to be red one day. To increase my motivation I tried to find a formula that evaluated how well I did during a competition. At that time there was no API nor [user:DmtryH]'s statistics, so I manually looked at a lot of profiles trying to find 'stable' points: users whose rating didn't change after a competition. Supposing that, in that case, they performed exactly as expected, I obtained two approximate linear formulas for Div-1 contests:
The first formula was a linear approximation of a non-linear function, and it started to fail roughly at red performances. The second one works pretty badly, but it can give you an idea of how much your rating will increase. With this I made an Excel sheet like this one:
As you can see, the colours on the left look much more interesting than the colours on the right. For instance, in one contest my performance rating was >2500, which (obviously) gave me a boost in motivation; the next contest I performed 1000 points lower, which surprisingly also motivated me to do a Virtual Contest and get back on track.
As the months went by I saw that this formula was helping me and could help others as well, so, as extra motivation, I promised myself that when I first turned red I would write a serious post about it (and also write a CF contest for both divisions (separately), if they allow me :) ). I may not be red when you read this post; but, anyway, here it goes!
Theoretical analysis
ELO system and expected rank
The basis of my work is MikeMirzayanov's blog, along with some others. In CF, the probability that A beats B, in terms of their ratings, is:
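This is the standard Elo win probability with a scale factor of 400 (as far as I can tell, this is what CF uses):

$$P(A \text{ beats } B) = \frac{1}{1 + 10^{\,(R_B - R_A)/400}}$$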
The theory of the ELO system suggests that, after a match, ratings of A and B should be updated by a constant multiplied by the difference between the expected result and the actual result.
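In the usual Elo notation this update reads as follows, where $K$ is the constant, $S_A$ is the actual result of A and $E_A$ the expected one (this is generic Elo, not necessarily the exact CF update):

$$R_A' = R_A + K\,(S_A - E_A)$$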
Now, if we consider all possible matches between the contestants and add up the probabilities of each competitor winning against us, we get the expected rank (+0.5 because ranks start at 1 and the probability of winning against yourself is 0.5):
$$\mathrm{ExpectedRank}(A) = 0.5 + \sum_{B} P(B \text{ beats } A) = 0.5 + \sum_{B} \frac{1}{1 + 10^{\,(R_A - R_B)/400}}$$

where the sum runs over all participants B, including A itself (whose term equals 0.5).
This formula also shows that (obviously) your expected rank does not depend only on your rating, but also on your competitors'. However, as I'll show later, for most participants this will not matter much. Another thing that follows from it is that the relative order of your competitors does not affect your rating: in other words, it doesn't matter whether you beat a red and a blue beats you, or you beat a blue and the red beats you.
Notice that the expected percentile is given by a very similar formula: you just divide by the number of participants, subtracting 1 or not depending on whether you count yourself.
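To make the formulas concrete, here is a small Python sketch (the function names are mine) that computes the expected rank and expected percentile of a contestant given the ratings of all participants:

```python
def win_probability(ra, rb):
    """Elo formula: probability that a contestant rated ra beats one rated rb."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))


def expected_rank(rating, all_ratings):
    """Expected rank = 0.5 + sum over every participant of P(they beat you).

    all_ratings contains the ratings of every participant, including our
    contestant, whose own term is 0.5."""
    return 0.5 + sum(win_probability(r, rating) for r in all_ratings)


def expected_percentile(rating, all_ratings):
    """Expected percentile, counting yourself among the participants."""
    return expected_rank(rating, all_ratings) / len(all_ratings)


# Tiny example: a 1900-rated user in a field of five participants.
field = [2400, 2100, 1900, 1700, 1500]
print(expected_rank(1900, field))        # ~3.0, i.e. around the middle
print(expected_percentile(1900, field))  # ~0.61
```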
Volatility and other assumptions
Topcoder uses similar ideas, but it also assigns a volatility to each user. Looking at worse's or dreamoon_love_AA's profiles made me think there is no such thing in CF (because their rating movements show no acceleration). I'll try to show more evidence of this below.
In my analysis I'll also assume that the rating formulas haven't changed (they probably have, but not much).
Results
Codeforces API
To obtain the exact formulas I needed much more data; thus, I turned to the Codeforces API (thanks a lot, Codeforces admins, for creating it!). I had no idea about APIs and almost no idea about Python, but I came across soon's blog, which was incredibly useful; you should check it out!
With this I wrote a script to obtain the Rank, OldRating and NewRating (and other data) for every user competing in a contest, and then analysed the results in Matlab.
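For the curious, this is roughly what that script boils down to; a minimal sketch using the contest.ratingChanges method of the API (error handling kept to a minimum, and the contest id is just an example):

```python
import requests


def fetch_rating_changes(contest_id):
    """Return (handle, rank, oldRating, newRating) for every rated participant
    of a contest, using the official contest.ratingChanges API method."""
    url = "https://codeforces.com/api/contest.ratingChanges"
    data = requests.get(url, params={"contestId": contest_id}).json()
    if data["status"] != "OK":
        raise RuntimeError(data.get("comment", "API request failed"))
    return [(row["handle"], row["rank"], row["oldRating"], row["newRating"])
            for row in data["result"]]


# Example: peek at the first few rows of one contest.
for handle, rank, old, new in fetch_rating_changes(550)[:5]:
    print(handle, rank, old, new)
```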
One of my objectives was to find the formula used to update ratings. In data analysis and machine learning one must set aside part of the data to test whether the model obtained (using the 'training data') can predict new data. Thus, the data was obtained from slightly older contests (rounds 350 to 450; we are at 550 now).
Rating distribution of users in a contest
Here are the rating distributions for the three types of contests. Rating 0 corresponds to new participants. I have averaged several contests, but the distributions don't change much from contest to contest. Finally, note that they are not equivalent to the rating distribution of all Codeforces users, since users with certain ratings participate more often than others.
Rating Inflation
I also accidentally found what I believe to be evidence of rating inflation. Notice this is slightly old data; it's probably better now :) This is the mean rating of participants in Div-1 contests:
Evaluating the performance in a contest
Now that we have the rating distributions, and we know they don't change much, we can relate a percentile to an equivalent rating using the formulas from the theory section. For a given percentile, its equivalent rating is the rating that (according to the formulas) would give that percentile as the expected one.
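In practice this inversion can be done with a simple binary search over the rating, since the expected percentile decreases as the rating grows. A rough Python sketch, reusing the expected_percentile function from the theory section above (field_ratings are the ratings of the contest participants):

```python
def equivalent_rating(target_percentile, field_ratings, lo=-1000.0, hi=5000.0):
    """Rating whose expected percentile (against the given field) equals the
    observed one. expected_percentile decreases as the rating grows, so a
    plain bisection is enough."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if expected_percentile(mid, field_ratings) > target_percentile:
            lo = mid  # doing worse than the target percentile: rating too low
        else:
            hi = mid
    return (lo + hi) / 2.0
```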
The following graphs are very accurate in the linear section in the middle (where the law of large numbers applies and every contest is roughly the same). Variations are larger at the extremes: for instance, the equivalent rating of the first places is greatly affected by tourist's or Petr's presence.