Codeforces submission dataset and source code reverse search engine

Revision en1, by cfsearch, 2021-09-10 19:38:33

I recently scraped almost all of the public submissions from Codeforces. The source code and metadata (problem ID, submitter, language, verdict, etc.) are shared here: https://mega.nz/folder/Sypi0BrS#iNbQXf3EwcjZbpwXRKHOnQ. The dataset contains at least 99.8% of the public submissions with ID <= 128M, about 98M submissions in total.

In addition, I created a source code reverse search engine based on this dataset, which you can access at https://cfsearch.top/.

Disclaimer: The scraping process violates Codeforces' robots.txt, and using this dataset may also violate Codeforces' terms. Use it at your own risk.

Btw, MikeMirzayanov, is it possible to share the official dataset?

Tags #codeforces, #webscrapping
