Codeforces: Entertaining Statistics

Revision en1, by orz, 2024-09-05 06:25:47

Hello!

Recently I published two posts (one, two) featuring the users who were first, second, etc. to reach certain rating values.

The reason I was able to create these posts is that I had first downloaded every user's rating change history via the Codeforces API. It took some time, but now that the job is done, I have more than 700000 files, each storing a JSON document that represents one user's full rating change history (an array in which each change holds the old rating, the new rating, the contest name, link and date, the place taken, and probably something else). Still, I have a slight feeling that something has been left unsaid or undone, probably because, even though I built those tables, I did not use 99.9% of the information at all. That simply means that more can be done now!
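A minimal Python sketch of reading this data back, under stated assumptions: each file is assumed to hold exactly the `result` array returned by the API's `user.rating` method (a list of RatingChange objects with fields such as `contestId`, `rank`, `oldRating`, `newRating` and `ratingUpdateTimeSeconds`), and the `<handle>.json` file naming is hypothetical. The helper names are made up for illustration; `rank_by_max_rating` corresponds to the first suggestion in the list below.

```python
import json
from pathlib import Path

def load_history(path):
    """Read one stored file; assumed to contain the JSON array of RatingChange objects."""
    with open(path, encoding="utf-8") as f:
        changes = json.load(f)
    # Sort chronologically so window/streak statistics see contests in order.
    return sorted(changes, key=lambda c: c["ratingUpdateTimeSeconds"])

def load_all(directory):
    """Hypothetical layout: one file per user, named <handle>.json."""
    return {p.stem: load_history(p) for p in Path(directory).glob("*.json")}

def max_rating(history):
    """Highest rating ever reached, or None for a user with no rated contests."""
    return max((c["newRating"] for c in history), default=None)

def rank_by_max_rating(histories):
    """Rating list sorted by maximum rating (ties broken by handle)."""
    rows = [(h, max_rating(hist)) for h, hist in histories.items()]
    return sorted((r for r in rows if r[1] is not None), key=lambda r: (-r[1], r[0]))
```

Users whose history is empty are simply left out of the resulting list.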

Could you please suggest some ways to aggregate this data and get some entertaining results? For instance,

  • get the rating list sorted by maximum rating rather than the actual rating;
  • get the maximum amount of rating gained by someone in two, three, four, five, etc. consecutive contests, or in their first two, three, four, five contests (though I dislike the variant with the word first, as it encourages people to register twin accounts);
  • rank all users by the number of times they defeated tourist;
  • as a development of the previous bullet, each user's tourist number can be calculated (it is defined similarly to the Erdős number: tourist is the only person with tourist number zero, and for every positive integer $$$n$$$, people with tourist number $$$n$$$ are defined as those who defeated a person with tourist number $$$n-1$$$ but never defeated anyone with a smaller tourist number);
  • rank all users by the number of rated contests (or rated Div. 1 contests, or rated Div. 1 + Div. 2 contests) they participated in;
  • rank all users by the number of contests they won / finished second in / finished third in;
  • rank all users by the number of contests they participated in (ruban?) / by minus the number of contests they skipped since their first participation;
  • rank all users by the difference between their maximum and minimum ratings / by the largest difference between some reached rating and some previously reached rating / by the number of colors they have had since their fifth contest (although this encourages stalactites and stalagmites);
  • rank all users by their volatility (e.g. how many times the sign of their rating change differed from the previous one, how many times they changed their title or color, the standard deviation of their rating, the sum of absolute values of their rating changes, etc.), or find the most stable users who participate a lot but still have a nearly horizontal graph (or a nearly horizontal segment);
  • find the highest place at which a tie has ever occurred in a contest;
  • rank all users by the smallest place they reached;
  • rank all users by the smallest place at which they still got a negative delta;
  • rank all grandmasters by the number (or minus the number, to discourage twin account creation and encourage hard work) of contests it took them to reach their title;
  • rank all users with the substring orz in their handles;
  • and much more!
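Several of the suggestions above are short computations over these histories. The following Python sketch assumes each history is a chronologically sorted list of the API's RatingChange objects (`contestId`, `rank`, `oldRating`, `newRating`); all function names are hypothetical. It shows three of the ideas: the best gain over k consecutive contests via a sliding window, one volatility measure (sign changes of the rating delta, skipping zero deltas), and the tourist number as a breadth-first search. Here "defeated" is assumed to mean a strictly better `rank` in the same contest; the post does not pin this down, and unrated participations do not appear in this data at all.

```python
from collections import defaultdict

def max_gain_in_window(history, k):
    """Largest total rating change over k consecutive rated contests, or None if fewer than k."""
    deltas = [c["newRating"] - c["oldRating"] for c in history]
    if k <= 0 or len(deltas) < k:
        return None
    window = best = sum(deltas[:k])
    for i in range(k, len(deltas)):
        # Slide the window one contest to the right.
        window += deltas[i] - deltas[i - k]
        best = max(best, window)
    return best

def sign_changes(history):
    """How often a nonzero rating change has the opposite sign of the previous nonzero one."""
    signs = [1 if c["newRating"] > c["oldRating"] else -1
             for c in history if c["newRating"] != c["oldRating"]]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def tourist_numbers(histories, root="tourist"):
    """BFS layering: a user gets number n if they beat (strictly better rank in the
    same contest) someone numbered n-1 and were not numbered at an earlier level."""
    contests = defaultdict(list)  # contestId -> list of (rank, handle)
    for handle, history in histories.items():
        for c in history:
            contests[c["contestId"]].append((c["rank"], handle))
    number = {root: 0}
    frontier = {root}
    level = 0
    while frontier:
        level += 1
        next_frontier = set()
        for entries in contests.values():
            # Anyone ranked strictly above the worst frontier member beat a frontier member.
            worst = max((r for r, h in entries if h in frontier), default=None)
            if worst is None:
                continue
            for r, h in entries:
                if r < worst and h not in number:
                    number[h] = level
                    next_frontier.add(h)
        frontier = next_frontier
    return number
```

Processing one layer at a time avoids materialising the quadratic "who beat whom" graph: in each contest, everyone ranked above the worst-placed member of the current layer beat someone in it, so each BFS level costs one pass over all participations. Users missing from the result have no tourist number.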

Please share your ideas in the comments and upvote/downvote others' ideas, and I'll try to implement the nicest ones!

History

Revisions

Rev.  Lang.    By   When                 Δ     Comment
en3   English  orz  2024-09-05 06:35:20  5     Tiny change: 'e defined such as ones w' -> 'e defined as ones w'
en2   English  orz  2024-09-05 06:34:25  46
en1   English  orz  2024-09-05 06:25:47  3702  Initial revision (published)