An iterative AI-solving framework may boost o1's cheating ability on Codeforces.

I must say that I have no idea how exactly OpenAI tested the o1 model on IOI and Codeforces contests. This framework may not work, or they may have already tried it.

Here are some facts:

1. o1 performs relatively poorly at IOI with 50 tries per problem.
2. o1 achieves an IOI gold medal with 10000 tries per problem.
3. o1 only achieves a rating of 1807 (far from IOI gold medal level) on Codeforces.
4. According to a community survey (https://codeforces.net/blog/entry/133887), o1 can solve some very hard problems (2700) but also fails some very easy problems (800).
5. Codeforces's rules prohibit o1 from making too many tries.

Facts 4 and 5 may be the reason why o1 only achieves 1807 on Codeforces. The difference between IOI gold and 1807 is that IOI rules provide cost-free validation, so the final score is the maximum over all tries.
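To make the "maximum over all tries" point concrete, here is a rough way to write it down. The symbols are my own shorthand and not taken from any OpenAI report: let $s_i$ be the score that the $i$-th of $N$ tries would get on a task.

$$S_{\text{task}} = \max_{1 \le i \le N} s_i$$

Under this reading, adding more tries can never lower $S_{\text{task}}$, which is why going from 50 to 10000 tries helps so much when validation is free.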

I believe OpenAI didn't pay much attention to how to get around the submission limit on Codeforces. They may simply have generated 50 or 10000 solutions independently. So the potential of AI cheating is suppressed for now, but it could soon threaten higher-rated players.

The point is: is there a way to validate each piece of code without submitting it? YES.

Any well-trained CPer / OIer will easily recall what they do in contests where participants can submit only once. They write a pretest generator, a correct but slow brute-force solution, and their final solution. Then they keep comparing the outputs of the two solutions over a batch of tests until either a difference shows up or none does.
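The practice above can be sketched as follows. This is only a minimal illustration: the toy problem (maximum subarray sum) and all function names are my own assumptions, chosen just to show the generator / brute force / fast solution trio.

```python
import random

def brute_force(a):
    """Slow but obviously correct: try every subarray, O(n^2)."""
    best = a[0]
    for i in range(len(a)):
        total = 0
        for j in range(i, len(a)):
            total += a[j]
            best = max(best, total)
    return best

def fast_solution(a):
    """Candidate under test: Kadane's algorithm, O(n)."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def generate_test(rng, max_n=8, max_abs=10):
    """Pretest generator: small random arrays so the brute force stays cheap."""
    n = rng.randint(1, max_n)
    return [rng.randint(-max_abs, max_abs) for _ in range(n)]

def stress(candidate, reference, rounds=1000, seed=0):
    """Return the first input on which the two solutions disagree, or None."""
    rng = random.Random(seed)
    for _ in range(rounds):
        a = generate_test(rng)
        if candidate(a) != reference(a):
            return a
    return None

if __name__ == "__main__":
    print(stress(fast_solution, brute_force))
```

A returned array is a counterexample that can be fed back to the model; `None` only means no difference was found within the given number of rounds, not that the candidate is proven correct.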

Brute force is always easier to write, and some extremely slow brute-force solutions, such as exponential algorithms, can hardly be wrong. Solving problems iteratively is an experience we all share.

So the simple framework works like this:

1. Generate an exponential solution and validate that it passes all given pretests.
2. Generate larger pretests and use the exponential solution to validate a newly generated n^2 solution.

...

Finally, generate full-scale pretests and use the previous fast solution to validate the final solution.
Submit.
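The staged idea can be sketched like this. Everything here is an assumption for illustration: the toy problem (longest strictly increasing subsequence), the three hand-written solutions standing in for model-generated ones, and the hypothetical `run_pipeline` helper. Each stage is checked only on input sizes that the previous, slower stage can still handle.

```python
import random
from bisect import bisect_left
from itertools import combinations

def lis_exponential(a):
    """Try every subsequence, longest first; only feasible for tiny n."""
    for k in range(len(a), 0, -1):
        for sub in combinations(a, k):
            if all(x < y for x, y in zip(sub, sub[1:])):
                return k
    return 0

def lis_quadratic(a):
    """Classic O(n^2) dynamic programming."""
    dp = [1] * len(a)
    for i in range(len(a)):
        for j in range(i):
            if a[j] < a[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp, default=0)

def lis_fast(a):
    """O(n log n) solution with binary search over subsequence tails."""
    tails = []
    for x in a:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

def run_pipeline(pretests, stages, rounds=50, seed=0):
    """stages: (solve, limit) pairs, slowest first; limit is the largest n
    at which that solution is cheap enough to act as the reference.
    Returns (final_solution, None) or (None, index_of_failing_stage)."""
    rng = random.Random(seed)
    trusted, limit = stages[0]
    # Step 1: the slowest solution must pass the given pretests.
    if any(trusted(x) != expected for x, expected in pretests):
        return None, 0
    # Later steps: validate each faster solution against the trusted one.
    for idx, (candidate, next_limit) in enumerate(stages[1:], start=1):
        for _ in range(rounds):
            a = [rng.randint(1, 20) for _ in range(rng.randint(1, limit))]
            if candidate(a) != trusted(a):
                return None, idx
        trusted, limit = candidate, next_limit
    return trusted, None

if __name__ == "__main__":
    pretests = [([1, 3, 2, 4], 3), ([5, 4, 3], 1)]
    stages = [(lis_exponential, 10), (lis_quadratic, 100), (lis_fast, 0)]
    final, failed_at = run_pipeline(pretests, stages)
    print(final.__name__ if final else f"stage {failed_at} failed")
```

Only the solution that survives every stage would be submitted, so a single submission is spent per problem.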

If the process is stuck at step 2 for a long time, the exponential solution is probably wrong: generate a new one and ask for more human-made pretests. The validation process may take a lot of time and should be accelerated with multithreading. Next-stage solutions can also be generated and validated in parallel.
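A minimal sketch of the parallel validation, assuming the tests can be split into independent batches with different seeds. The helper names are hypothetical. Note that threads in pure Python do not speed up CPU-bound work much; in a real setup each check would more likely run compiled solutions as external processes.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def check_batch(candidate, reference, seed, rounds, max_n):
    """Run one independent batch of random tests; return a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(rounds):
        a = [rng.randint(-10, 10) for _ in range(rng.randint(1, max_n))]
        if candidate(a) != reference(a):
            return a
    return None

def parallel_validate(candidate, reference, workers=4, rounds=250, max_n=8):
    """Validate candidate against reference using several worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(check_batch, candidate, reference, seed, rounds, max_n)
            for seed in range(workers)
        ]
        results = [f.result() for f in futures]
    # Report the first counterexample found by any batch, if there is one.
    return next((r for r in results if r is not None), None)

if __name__ == "__main__":
    # Toy pair: the built-in max versus a slower sort-based reference.
    print(parallel_validate(max, lambda a: sorted(a)[-1]))
```

The same pool could also be used to validate several next-stage candidates at once, one batch per candidate.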

Rev.	By	When	Δ	Comment
en4	piaoyun	2024-09-16 19:36:25	4	Tiny change: 'Gold and 1807 is, that ' -> 'Gold and 1600 is, that '
en3	piaoyun	2024-09-16 19:36:09	9
en2	piaoyun	2024-09-16 19:07:17	22
en1	piaoyun	2024-09-16 19:06:47	2364	Initial revision (published)
