Hello dear users and MikeMirzayanov!
Lately I've often come across discussions about the use of AI in contests. I'd like to share my thoughts and personal experience on the issue, as well as make a few suggestions that could help fight foul play more effectively.
As an experiment, I tested several paid ChatGPT models to see how reliably they solve contest problems across different algorithmic topics, both in terms of correctness and efficiency.
- o3-mini copes well with div2 A-C level problems, but starting from div2 D, without an explicitly stated solution idea, it often generates suboptimal code and misjudges the required asymptotic complexity. The problems are especially noticeable in dynamic programming tasks (deriving the recurrence) and in interactive problems.
- o3-mini-high shows a better understanding of the problems and deeper reasoning, but it still makes mistakes on harder div2 E-F problems, especially those involving dynamic programming. To get a correct and optimal solution, you often have to spell out the intended algorithm by hand, including the exact recurrence and target complexity.
Purely for experimental purposes, I noticed one interesting thing: without a detailed explanation of the algorithm, ChatGPT usually generates the same code pattern, with roughly the same function structure and names, a similar variable style, repeated comments, and frequent reuse of the same boilerplate and templates, especially for standard algorithms. An ordinary user can, of course, write a similar algorithm, but the layout and structure of their code will still differ noticeably from the AI's.
This, in my opinion, could serve as one layer of protection for automatic moderation of solutions on the platform. Codeforces moderation already sometimes detects suspiciously similar solutions on exactly this basis. But it is easy to circumvent by obfuscating the code and adding "garbage" and unnecessary functions, which makes the code hard to read. Several more layers of protection could work in parallel with this main one.
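To illustrate the similarity-detection idea, here is a minimal sketch: it normalizes code by erasing identifier and number names, so that renaming variables alone does not hide structural similarity, then compares token n-grams with a Jaccard score. The keyword set and n-gram size are illustrative choices, not any real Codeforces heuristic.

```python
import re

# Illustrative keyword set; a real tool would use the full language grammar.
KEYWORDS = {"for", "while", "if", "else", "return", "int", "long", "break", "continue"}

def normalize(code: str) -> list[str]:
    """Tokenize code, mapping identifiers to ID and numbers to NUM."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)
    out = []
    for t in tokens:
        if t in KEYWORDS:
            out.append(t)
        elif t[0].isdigit():
            out.append("NUM")
        elif t[0].isalpha() or t[0] == "_":
            out.append("ID")
        else:
            out.append(t)           # punctuation/operators kept as-is
    return out

def ngrams(tokens: list[str], n: int = 4) -> set[tuple]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(a: str, b: str, n: int = 4) -> float:
    """Jaccard similarity of normalized token n-grams, in [0, 1]."""
    ga, gb = ngrams(normalize(a), n), ngrams(normalize(b), n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)
```

Two solutions that differ only in variable names score 1.0 here, which is exactly the weakness mentioned above: inserting junk statements or reordering functions changes the n-gram set and drives the score down.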
The basic idea is behavioral analysis of user actions in real time. The following functionality could be implemented:
- Measuring, in real time, the interval between the moment the user opens the problem statement and their first submission
- Tracking sudden changes in solving speed, for example a hard problem solved too quickly after a long period without activity
- Logging all user actions on the contest page, with batched, encrypted data sent to the server for the anti-fraud module (without which further actions on the site would be blocked), paying particular attention to events where the problem statement is copied from the page
A separate and key measure would be a client application working in a client <-> server scheme. Its main task would be monitoring suspicious actions; in some ways its logic would resemble a proctor's during, say, an online screening interview:
- Analyzing processes running during the contest
- Analyzing incoming/outgoing traffic to control requests to AI services
- Taking and sending random screenshots of the user's screen to the server, but only when suspicious activity is detected
Of course, from a privacy standpoint this is problematic, so it is worth thinking more about how to implement it properly.
Write your thoughts and ideas in the comments; it will be interesting to hear them!
AI cheating is extremely difficult to catch with conventional algorithms. Maybe we should fight AI with AI and use AI-based plagiarism detectors. They already exist and work for writing (many schools and colleges use them), and there is no reason they would not work for code.
Example: ChatGPT AI code detector
This sounds like an idea; I didn't think of it right away. But again, only the most obvious cases will be detected: the code can be modified to fit the cheater's own unique template, so additional security measures would still be needed.
Forcing people to install a lockdown app that tracks everything they do violates way too much privacy, especially since the EU exists
If I was forced to do that, I would never join an official contest again
You are right, there are a few legal issues to resolve before this could work. This is how most anti-cheats in games operate, but on the other hand it raises the reliability and level of protection. By the way, the detector you linked could in theory be turned against itself and used to polish code until it evades detection.
If you rewrite AI code from scratch it will be 100% undetectable
So all we can catch are the lazy cheaters