cfsearch's blog

By cfsearch, history, 3 years ago, in English

I recently scraped almost all of the submissions from Codeforces. Here I share all the source code and metadata (problem ID, submitter, language, verdict, etc.): https://mega.nz/folder/Sypi0BrS#iNbQXf3EwcjZbpwXRKHOnQ. The dataset contains at least 99.8% of the public submissions with ID <= 128M. In total, there are ~98M submissions.

In addition, I created a source code reverse search engine based on this dataset, which you can access at https://cfsearch.top/.

Disclaimer: The scraping process violates Codeforces' robots.txt. Use of this dataset may even violate Codeforces' terms. Use it at your own risk.

Btw, MikeMirzayanov, is it possible to share the official dataset?

  • Rating: +95

»
3 years ago, rating: +20

Wow. Amazing.

How much time did it take you to scrape this?

»
3 years ago, rating: +19

Thanks for this dataset!

The code search could be extremely useful if polished up. For example, if I want to practice, say, link cut tree problems, I can search for every submission containing the words "link cut tree" or "LCT" to find relevant problems with reference submissions/implementations. These are otherwise really hard to find because those problems often have alternative solutions that don't use advanced data structures (but require more insight to find), so you can't just sort by execution time.
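A minimal sketch of the kind of keyword search described above, in Python. The post does not describe how the dataset is laid out, so the record shape `(submission_id, problem_id, source_code)` and the function name `find_problems_by_keyword` are assumptions made purely for illustration.

```python
import re
from collections import defaultdict

# Hypothetical record layout: (submission_id, problem_id, source_code).
# The real dataset's layout is not described in the post.
def find_problems_by_keyword(records, keywords):
    """Group the IDs of matching submissions by problem ID."""
    # Word boundaries keep "LCT" from matching inside longer identifiers.
    pattern = re.compile(
        "|".join(r"\b" + re.escape(k) + r"\b" for k in keywords),
        re.IGNORECASE,
    )
    hits = defaultdict(list)
    for submission_id, problem_id, source in records:
        if pattern.search(source):
            hits[problem_id].append(submission_id)
    return dict(hits)

# Tiny in-memory sample standing in for the real data.
sample = [
    (1, "1000A", "// link cut tree implementation\nstruct LCT { };"),
    (2, "1000A", "int main() { return 0; }"),
    (3, "2000B", "int lctx = 0; // unrelated variable name"),
]
result = find_problems_by_keyword(sample, ["link cut tree", "LCT"])
print(result)
```

Grouping by problem rather than listing raw submissions matches the stated goal: finding problems to practice, each with a few reference implementations attached.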

»
3 years ago, rating: +8

Great tool. It can be used to find alt accounts of users based on the templates they use.
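One rough way to compare templates, sketched in Python: treat each submission as a set of whitespace-trimmed lines and measure the Jaccard overlap between two submissions. This is an assumption about how such a comparison might be done, not a description of how cfsearch works, and the helper names `normalized_lines` and `jaccard` are hypothetical. A high score is only a weak signal, since many users copy the same popular templates.

```python
def normalized_lines(source):
    """Return the set of non-empty lines with surrounding whitespace removed."""
    return {line.strip() for line in source.splitlines() if line.strip()}

def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two submissions sharing a three-line template, plus an unrelated one.
src_a = (
    "#include <bits/stdc++.h>\n"
    "#define ll long long\n"
    "using namespace std;\n"
    "int main() { solve_a(); }"
)
src_b = (
    "#include <bits/stdc++.h>\n"
    "#define ll long long\n"
    "using namespace std;\n"
    "int main() { solve_b(); }"
)
src_c = "import sys\nprint(sys.stdin.read())"

sim_ab = jaccard(normalized_lines(src_a), normalized_lines(src_b))
sim_ac = jaccard(normalized_lines(src_a), normalized_lines(src_c))
print(sim_ab, sim_ac)
```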

»
2 years ago, rating: 0

Hey, it looks like the website is down... Would you like to host the website again? Or if not, would you like to share the source code of the reverse search so that we can host it? Thanks a lot!

»
20 months ago, rating: 0

I want this data. Is there any other way to get it?

»
8 hours ago, rating: 0

https://cfsearch.top/ doesn't work anymore

  • »
    »
    116 minutes ago, rating: 0

    yes