Scraping Codeforces Problems

By TheeLooser, 3 years ago, in English

Is there a way to scrape problem statements automatically?

Comments (7)

pawarashish564

3 years ago

Use the Codeforces API; check out the Problem section.

TheeLooser

3 years ago

Unfortunately, the Problem object does not come with the statement text.
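
To illustrate the point, here is a minimal sketch of listing problems through the public `problemset.problems` API method using only the standard library. The helper names (`build_problems_url`, `fetch_problems`, `summarize`) are made up for this example; the fields read in `summarize` are metadata fields of the Problem object, and none of them carries the statement text.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://codeforces.com/api"  # root of the public Codeforces API


def build_problems_url(tags=None):
    """Return the problemset.problems URL, optionally filtered by tags."""
    url = f"{API_BASE}/problemset.problems"
    if tags:
        # The API takes a semicolon-separated list of tags.
        url += "?" + urllib.parse.urlencode({"tags": ";".join(tags)})
    return url


def fetch_problems(tags=None, timeout=10):
    """Download and decode the problem list (performs a network request)."""
    with urllib.request.urlopen(build_problems_url(tags), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "OK":
        raise RuntimeError(payload.get("comment", "API request failed"))
    return payload["result"]["problems"]


def summarize(problem):
    """Keep a few metadata fields; there is no statement field to copy."""
    return {
        "id": f'{problem.get("contestId", "")}{problem["index"]}',
        "name": problem["name"],
        "rating": problem.get("rating"),  # may be absent for some problems
        "tags": problem.get("tags", []),
    }
```

Calling `[summarize(p) for p in fetch_problems(["dp"])]` would give a list of IDs, names, ratings and tags, which is useful for deciding which pages to scrape but not for getting the statements themselves.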

pawarashish564

3 years ago

I am not sure what you are trying to achieve, but when I worked on a similar problem I used BeautifulSoup in Python to read the HTML and parse its content. You could do something similar for your purpose.
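
A rough sketch of that kind of parsing is below. To keep it free of third-party dependencies it uses the standard library's `html.parser` instead of BeautifulSoup, so it is a stand-in for the approach rather than the code that was actually used. It assumes the statement sits inside a `<div class="problem-statement">` container; that class name is an assumption about the page markup and may need adjusting.

```python
from html.parser import HTMLParser


class StatementExtractor(HTMLParser):
    """Collect the text found inside <div class="problem-statement"> blocks."""

    def __init__(self, marker="problem-statement"):
        super().__init__()
        self.marker = marker   # assumed class name of the statement container
        self.depth = 0         # div nesting depth inside the current statement
        self.statements = []   # one list of text chunks per statement found

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1              # nested div inside a statement
        elif self.marker in classes:
            self.depth = 1               # entering a new statement
            self.statements.append([])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1              # reaches 0 when the statement closes

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.statements[-1].append(text)


def extract_statements(html):
    """Return one plain-text string per statement container in the HTML."""
    parser = StatementExtractor()
    parser.feed(html)
    parser.close()
    return [" ".join(chunks) for chunks in parser.statements]
```

If the downloaded page contains no such container, `extract_statements` returns an empty list, which is a quick way to notice that the raw HTML does not include the statement.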

TheeLooser

3 years ago

I tried using BeautifulSoup, but it doesn't work anymore. I think Codeforces upgraded their systems (perhaps the page now uses some sort of script to load statements on demand? I know very little about this stuff). Previously you could just use wget to download a problem page, such as https://codeforces.net/problemset/problem/1673/F, and get the raw HTML. This doesn't work anymore. In case I am missing something trivial, could you please try BeautifulSoup again, right now? I think a simple wget command would have worked back when you did your parsing.

Xellos

3 years ago

Download the page using the problem ID?

TheeLooser

3 years ago

So, a friend of mine looked into it and found that wget/curl on https://codeforces.net/contest/contestId/problems still works, whereas wget/curl on https://codeforces.net/problemset/problem/contestId/index returns only the preload HTML. So scraping whole contest problem sets instead of individual problems is an alternative. Thanks for your comments.
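
For anyone who wants to script this, a minimal sketch of the two URL patterns plus a check for whether the response is a real page or just the preload HTML might look as follows. The function names are hypothetical, and the `class="problem-statement"` marker is an assumption about the page markup. Note that a later reply in this thread reports that the contest-wide page stopped working too, so treat this as an illustration of the idea rather than something guaranteed to work today.

```python
import urllib.request

BASE = "https://codeforces.net"  # host used in the URLs above


def contest_problems_url(contest_id):
    """All problems of one contest on a single page."""
    return f"{BASE}/contest/{contest_id}/problems"


def single_problem_url(contest_id, index):
    """Page of an individual problem, e.g. contest 1673, index F."""
    return f"{BASE}/problemset/problem/{contest_id}/{index}"


def has_statement(html, marker='class="problem-statement"'):
    """Heuristic: assume a real page contains the marker and preload HTML does not."""
    return marker in html


def download(url, timeout=10):
    """Fetch a page as text (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_contest_html(contest_id):
    """Download the contest-wide page and fail loudly if no statement is present."""
    html = download(contest_problems_url(contest_id))
    if not has_statement(html):
        raise RuntimeError("no statement markup found; probably preload HTML")
    return html
```

Fetching one page per contest instead of one per problem also means fewer requests to the site.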

Avanta

3 years ago

Unfortunately, it seems this no longer works :/ Did you manage to find another alternative?