Codeforces submission dataset and source code reverse search engine

By cfsearch, history, 3 years ago, In English

I recently scraped almost all of the submissions from Codeforces. Here I share all the source code and metadata (problem ID, submitter, language, verdict, etc.): https://mega.nz/folder/Sypi0BrS#iNbQXf3EwcjZbpwXRKHOnQ. The dataset contains at least 99.8% of the public submissions with ID <= 128M. In total, there are ~98M submissions.

In addition, I created a source code reverse search engine based on this dataset, which you can access at https://cfsearch.top/.
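The post does not describe how the engine works internally. Purely as an illustration of what "reverse search" over source code can mean, here is a minimal sketch that assumes token shingling with Jaccard similarity, which is one common technique and not necessarily what cfsearch.top uses. The names `shingles`, `jaccard`, and `reverse_search`, and the in-memory `corpus` dictionary, are hypothetical.

```python
import re


def shingles(source: str, k: int = 5) -> set:
    """Return the set of k-token shingles of a source string.

    Whitespace is ignored, so reformatted copies of the same code
    produce the same shingles.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def reverse_search(query: str, corpus: dict, top: int = 3) -> list:
    """Rank submissions in `corpus` (hypothetical: id -> source text)
    by similarity to `query`, best match first."""
    q = shingles(query)
    scored = [(jaccard(q, shingles(src)), sid) for sid, src in corpus.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(sid, score) for score, sid in scored[:top]]
```

A linear scan like this would not scale to ~98M submissions; a real engine would need an index (for example, an inverted index over shingle hashes), which is beyond this sketch.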

Disclaimer: The scraping process violates Codeforces' robots.txt. Use of this dataset may also violate Codeforces' terms. Use it at your own risk.

Btw, MikeMirzayanov, is it possible to share the official dataset?

#codeforces, #webscrapping

Comments (8)

purplesyringa

3 years ago, # |

+20

Wow. Amazing.

How much time did it take you to scrape this?

cfsearch

3 years ago, # ^ |

~ 2 weeks

Kyou_mo_kawaii

3 years ago, # |

+19

Thanks for this dataset!

The code search could be extremely useful if polished up. For example, if I want to practice, say, link-cut tree problems, I can search for every submission containing the words "link cut tree" or "LCT" to find relevant problems with reference submissions/implementations. These are otherwise really hard to find, because those problems often have alternative solutions that don't use advanced data structures (but require more insight to find), so you can't just sort by execution time.
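A rough sketch of that idea, run offline against the downloaded dataset rather than the search engine: the function below counts, per problem, how many submissions mention any of the given keywords. It assumes the submissions have already been loaded as `(problem_id, source)` pairs; that shape and the name `problems_mentioning` are hypothetical, since the dataset's actual file layout is not described here.

```python
import re
from collections import Counter


def problems_mentioning(submissions, keywords):
    """Count submissions per problem whose source mentions any keyword.

    `submissions` is an iterable of (problem_id, source) pairs
    (a hypothetical shape). Matching is case-insensitive, and spaces
    in a keyword also match "_", "-", or nothing, so "link cut tree"
    matches link_cut_tree and LinkCutTree as well.
    """
    alternatives = []
    for keyword in keywords:
        words = [re.escape(w) for w in keyword.split()]
        alternatives.append(r"\b" + r"[\s_\-]*".join(words) + r"\b")
    pattern = re.compile("|".join(alternatives), re.IGNORECASE)

    hits = Counter()
    for problem_id, source in submissions:
        if pattern.search(source):
            hits[problem_id] += 1
    # Problems with the most matching submissions come first.
    return hits.most_common()
```

Calling `problems_mentioning(pairs, ["link cut tree", "LCT"])` returns a list of `(problem_id, count)` tuples. Short abbreviations such as "LCT" can still produce false positives, so the counts are only a starting point for finding problems.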

cfsearch

3 years ago, # ^ |

Good idea. But the current search engine cannot handle such requirements :(

YPK

3 years ago, # |

Great tool. It can be used to find alt accounts of users based on the templates they use.

Nea1

2 years ago, # |

Hey, it looks like the website is down... Would you like to host the website again? Or if not, would you like to share the source code of the reverse search so that we can host it? Thanks a lot!

Dio707

20 months ago, # |

I wanted this data. Is there any other way to find it?

cum00

4 hours ago, # |

https://cfsearch.top/ doesn't work anymore