cfsearch's blog

By cfsearch, 3 years ago, In English

I recently scraped almost all of the submissions from Codeforces. Here I share all the source code and metadata (problem ID, submitter, language, verdict, etc.): https://mega.nz/folder/Sypi0BrS#iNbQXf3EwcjZbpwXRKHOnQ. The dataset contains at least 99.8% of the public submissions with ID <= 128M. In total, there are ~98M submissions.

In addition, I created a source code reverse search engine based on this dataset, which you can access at https://cfsearch.top/.

Disclaimer: The scraping process violates Codeforces' robots.txt. Use of this dataset may even violate Codeforces' terms. Use it at your own risk.

Btw, MikeMirzayanov, is it possible to share the official dataset?

  • Vote: I like it
  • +95
  • Vote: I do not like it

»
3 years ago, # |
  Vote: I like it +20 Vote: I do not like it

Wow. Amazing.

How much time did it take you to scrape this?

»
3 years ago, # |
  Vote: I like it +19 Vote: I do not like it

Thanks for this dataset!

The code search could be extremely useful if polished up. For example, if I want to practice, say, link cut tree problems, I can search for every submission with the words "link cut tree" or "LCT" to find relevant problems with reference submissions/implementations. These are otherwise really hard to find because those problems often have alternative solutions that don't use advanced data structures (but require more insight to find), so you can't just sort by execution time.
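A rough sketch of what that keyword lookup could look like over the downloaded dataset. The record layout here (dicts with `id`, `problem`, `verdict`, `source` keys) and the `"OK"` verdict string are assumptions for illustration, not the dataset's actual format, and `problems_mentioning` is a hypothetical helper name:

```python
import re
from collections import defaultdict

# Match "link cut tree" (with optional space, underscore or hyphen between
# the words) or "lct", but only when not glued to other letters, so that
# "lct_node" matches while "calctotal" does not.
KEYWORDS = re.compile(
    r"(?<![A-Za-z])(?:link[\s_-]?cut[\s_-]?tree|lct)(?![A-Za-z])",
    re.IGNORECASE,
)


def problems_mentioning(submissions, pattern=KEYWORDS):
    """Group accepted submission IDs by problem when the source matches.

    `submissions` is an iterable of dicts with the assumed keys
    'id', 'problem', 'verdict' and 'source'; adapt to the real layout.
    """
    hits = defaultdict(list)
    for sub in submissions:
        if sub["verdict"] == "OK" and pattern.search(sub["source"]):
            hits[sub["problem"]].append(sub["id"])
    return dict(hits)
```

This is a plain regex scan, so it only finds submissions that actually name the technique in a comment or identifier; it is not the reverse search the site implements.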

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +5 Vote: I do not like it

    Good idea. But the current search engine cannot handle such requirements :(

»
3 years ago, # |
  Vote: I like it +8 Vote: I do not like it

Great tool. It can be used to find alt accounts of users based on the templates they use.

»
2 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Hey, it looks like the website is down... Would you like to host the website again? Or if not, would you like to share the source code of the reverse search so that we can host it? Thanks a lot!

»
20 months ago, # |
  Vote: I like it 0 Vote: I do not like it

I wanted this data. Is there any other way to find it?

»
4 hours ago, # |
  Vote: I like it 0 Vote: I do not like it

https://cfsearch.top/ doesn't work anymore.