An iterative AI-solving framework may boost o1's cheating ability on Codeforces

I must say that I have no idea how OpenAI actually tested the o1 model on IOI and Codeforces contests. This framework may not work, or they may have already tried it.

Here are some facts:

1. o1 performs relatively poorly on IOI with 50 tries per problem.
2. o1 achieves an IOI gold medal with 10000 tries per problem.
3. o1 only achieves a 1600+ rating (far from IOI gold medal level) on Codeforces.
4. According to a community survey (https://codeforces.net/blog/entry/133887), o1 can solve very hard problems (2700) but also fails some very easy ones (800).
5. Codeforces's rules prohibit o1 from making too many tries.

Facts 4 and 5 may be the reason why o1 only achieves 1600 on Codeforces. The difference between IOI gold and 1600 is that IOI rules provide no-cost validation, so the final score is the maximum over all tries.
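To see why the number of free tries matters so much, here is a rough sketch. It assumes every try is independent and passes with the same probability p; the value p = 0.001 is purely hypothetical and not a measured figure for o1.

```python
def p_at_least_one(p, k):
    """Chance that at least one of k independent tries passes,
    when each try passes with probability p (an assumption)."""
    return 1 - (1 - p) ** k

# Hypothetical per-try success rate, for illustration only.
p = 0.001
few_tries = p_at_least_one(p, 50)       # roughly 0.05
many_tries = p_at_least_one(p, 10000)   # very close to 1
```

Under these assumptions, a solution that almost never works on a single try becomes nearly certain to succeed once validation is free and the best try counts.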

I believe OpenAI didn't pay much attention to getting around Codeforces's submission limits. They may also have generated the 50 or 10000 programs independently of each other. So the potential of AI cheating is currently suppressed, and it could soon become a threat to higher-rated players.

The point is: is there a way to validate each piece of code without submitting it? YES.

Any well-trained CPer / OIer will easily recall their practice in contests where participants can only submit once. They write a test generator, a correct but slow brute-force solution, and their final solution. Then they keep comparing the outputs of the two solutions over a batch of tests until either a difference appears or they are confident there is none.
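The practice described above can be sketched as follows. The toy problem (maximum subarray sum) and all function names are my own choices for illustration, not part of any real contest setup.

```python
import random

def gen_test(rng, max_n=8, max_abs=10):
    """Random small array; small sizes keep the brute force cheap."""
    n = rng.randint(1, max_n)
    return [rng.randint(-max_abs, max_abs) for _ in range(n)]

def brute(a):
    """O(n^2) reference: try every non-empty subarray."""
    best = a[0]
    for i in range(len(a)):
        s = 0
        for j in range(i, len(a)):
            s += a[j]
            best = max(best, s)
    return best

def fast(a):
    """Kadane's algorithm, the candidate under test."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def stress(candidate, reference, gen, rounds=1000, seed=0):
    """Return the first input where the two disagree, or None."""
    rng = random.Random(seed)
    for _ in range(rounds):
        case = gen(rng)
        if candidate(case) != reference(case):
            return case
    return None
```

A call like stress(fast, brute, gen_test) returns None when no difference is found. A returned array is a counterexample that can be fed back into the next attempt.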

Brute force is always easier to write, and some extremely slow brute-force solutions, such as exponential algorithms, can hardly be wrong. Solving problems iteratively is an experience we all share.

So the simple framework works like this:

1. Generate an exponential solution and validate that it passes all given pretests.
2. Generate larger pretests and use the exponential solution to validate a newly generated n^2 solution.

...

Finally, generate full-scale pretests and use the previous fast solution to validate the final solution.
Submit.
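Here is a minimal sketch of the staged validation above, assuming a toy problem (length of the longest strictly increasing subsequence) with three hand-written solutions standing in for the generated ones. The names validate_chain, SAMPLES, and STAGES, and the size limits, are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import combinations

def lis_exponential(a):
    """Try every subsequence, longest first; only usable for tiny n."""
    for k in range(len(a), 0, -1):
        for sub in combinations(a, k):
            if all(x < y for x, y in zip(sub, sub[1:])):
                return k
    return 0

def lis_quadratic(a):
    """Classic O(n^2) DP."""
    dp = [1] * len(a)
    for i in range(len(a)):
        for j in range(i):
            if a[j] < a[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp, default=0)

def lis_fast(a):
    """O(n log n) with binary search over subsequence tails."""
    tails = []
    for x in a:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

def validate_chain(samples, stages, rounds=200, seed=0):
    """stages: list of (name, solver, max_n), slowest first.
    Returns the name of the first stage that fails, or None if all pass."""
    rng = random.Random(seed)
    name, trusted, trusted_n = stages[0]
    # Step 1: the slowest solution is checked only against the given samples.
    if any(trusted(inp) != out for inp, out in samples):
        return name
    # Later steps: random tests sized so the trusted solution can still run.
    for name, solver, max_n in stages[1:]:
        for _ in range(rounds):
            n = rng.randint(1, trusted_n)
            case = [rng.randint(1, 20) for _ in range(n)]
            if solver(case) != trusted(case):
                return name
        # The newly validated solution becomes the reference for the next step.
        trusted, trusted_n = solver, max_n
    return None

SAMPLES = [([1, 3, 2, 4], 3), ([5, 4, 3], 1)]
STAGES = [
    ("exponential", lis_exponential, 10),
    ("quadratic", lis_quadratic, 60),
    ("final", lis_fast, 10**5),
]
```

Each validated solution becomes the trusted reference for the next, larger test scale, so a bug that only shows up on bigger inputs can still be caught before submitting.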

If the process is stuck at step 2 for a long time, the exponential solution is probably wrong: generate a new one and ask for more human-made pretests. The validation process may take a long time and should be accelerated with multithreading. Next-stage solutions can also be generated and validated in parallel.
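A rough sketch of validating several candidates in parallel, using Python's standard concurrent.futures module. The toy task (sum of squares up to n) and the names first_mismatch and validate_parallel are assumptions for illustration. Note that threads only give a real speedup when each check waits on something external, such as a compiled solution run as a subprocess; for pure-Python checks like the ones here they mainly show the structure.

```python
from concurrent.futures import ThreadPoolExecutor

def first_mismatch(candidate, reference, cases):
    """Return the first case where candidate and reference differ, else None."""
    for case in cases:
        if candidate(case) != reference(case):
            return case
    return None

def validate_parallel(candidates, reference, cases, workers=4):
    """Check several candidates at once.
    Maps each candidate name to its counterexample, or None if it passed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(first_mismatch, fn, reference, cases)
            for name, fn in candidates.items()
        }
        return {name: fut.result() for name, fut in futures.items()}

def slow_sum_squares(n):
    """Trusted slow reference: add the squares one by one."""
    return sum(i * i for i in range(n + 1))

CANDIDATES = {
    "closed_form": lambda n: n * (n + 1) * (2 * n + 1) // 6,
    "wrong_formula": lambda n: n * n * (n + 1) // 2,
}

results = validate_parallel(CANDIDATES, slow_sum_squares, list(range(200)))
```

Candidates that map to None survive this round; the others come back with a counterexample that can be used when generating a replacement.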

Rev.	Author	When	Δ	Comment
en4	piaoyun	2024-09-16 19:36:25	4	Tiny change: 'Gold and 1807 is, that ' -> 'Gold and 1600 is, that '
en3	piaoyun	2024-09-16 19:36:09	9
en2	piaoyun	2024-09-16 19:07:17	22
en1	piaoyun	2024-09-16 19:06:47	2364	Initial revision (published)
