How to save Codeforces from AI-assisted cheating as AI models evolve so rapidly?

№	Пользователь	Рейтинг
1	tourist	3993
2	jiangly	3743
3	orzdevinwang	3707
4	Radewoosh	3627
5	jqdai0815	3620
6	Benq	3564
7	Kevin114514	3443
8	ksun48	3434
9	Rewinding	3397
10	Um_nik	3396

№	Пользователь	Вклад
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	Dominater069	153
9	nor	153

Inspired by the comment of hxu10 at https://codeforces.net/blog/entry/133874?#comment-1197073

I think this is an important topic as it impacts the very existence of the whole online competitive programming communities (like Codeforces). So, I open a new thread.

Back in my university days around 2019, when I was actively competing in Codeforces contests, I never imagined that AI would advance so quickly that it could solve difficult competitive programming problems.

Open AI's new model claims to achieve 1800+ rating. I would assume in the near future, AI could achieve 4000+ rating and beat tourist. Although I'll mark this day as the day when AGI comes, it will pose an existential threat to Codeforces!

Also using Go as example. After AI performed better than every human, online Go competition effectively collapsed. Everyone can use AI to cheat. An unknown contestant who suddenly performs really well will be challenged on whether they are cheating using AI.

But the situation of competitive programming will be more dire after AI keeps improving it competitive programming capability. Reasons:

Cheating in a two-player game like Go only affects one opponent while cheating in a Codeforces contest, however, undermines the entire leaderboard and harms every participant.
In-person Go contests are still alive. However, due to the nature of competitive programming, with its smaller and dispersed community, there are almost no in-person Codeforces equivalents. OI and ICPC are only for students.

Actually I have no ideas that can solve this issue. Here're some bad ideas with profound limitations:

Signing Term of Agreement when registering contests which commits not to use AI. Limitation: will not be effective.
Mandate screen-sharing (and even camera-on) during contests to prevent cheating. Drawback: privacy concerns and high costs (to both Codeforces itself and users).

UPD: Thanks all for your replies! After reading the replies, I finally get a useful idea.

We all think that even if AI gets smarter than us, we can still have fun doing Codeforces. It's a great way to get better at solving programming problems, or just to feel good about tackling tough challenges.

But the Codeforces rating system could become broken. So, I propose that we can have two separate rating systems.

Virtual rating: applies to all users.
Verified-human rating: only those who participate in onsite contest and perform on Codeforces online contests at a similar level as they perform onsite will get the verified rating on Codeforces.

For example, if a user performs at rating 2000 in a onsite contest (that forbids digital devices and Internet), and 1 week later he performs rating 2200 in a Codeforces rating round, then the rating 2200 can be considered valid and both the two ratings (virtual rating & verified-human rating) can be upgraded.

However, if the user performs at rating 3000 just 1 week after the onsite contest, then only virtual rating is upgraded, while the verified-human rating will not be upgraded (just similar to the current out-of-competition mechanism to prevent double account rating abusers). Only after the user performs at least 2800 in the next onsite contest, the 3000-point Codeforces performance can be trusted and the verified-human rating can be upgraded.

That means, if a user never takes part in onsite contest, they will only get the virtual rating. This is enough if the user only cares about their personal growth and not the public recognition. However, many of us still want to climb the verified-human rating leaderboard. This requires more onsite contests to get more users verified.

OI and ICPC are age-limited. And Google Code Jam is dead. Even if it were still alive, the onsite round only covers 25 participants a year. We need much larger scopes than that.

So, Codeforces might need to partner with OpenAI and let OpenAI sponsors onsite contests.

Given that OpenAI has already used the Codeforces platform and CP problem sets and submissions to train models, there's an ethical argument to be made that the company has a responsibility to support the continued growth and vitality of the CP community. Competitive programmers including the problem setters and participants, driven by their unwavering passion and years of tireless effort, have built up this community with high-quality data.

If OpenAI trains their models by the data provided by Codeforces and then their super-intelligent AI kills Codeforces, it will be unacceptable, right?

That will also be a win-win for OpenAI. OpenAI can use the onsite contests to advertise their new models on problem solving skills by competing with humans.

For the ethnics, there is a similar case in journalism. Many journalists feared that the traffic to their media's websites will be impacted by ChatGPT. Because OpenAI uses high-quality text written by professional journalists to train GPTs, it is under pressure to pay back to the journalism industry. And indeed OpenAI has already partnered with some media companies.

Комментарии (73)

Написать комментарий?

di_z

2 месяца назад, # |

Auto comment: topic has been updated by di_z (previous revision, new revision, compare).

→ Ответить

mark

2 месяца назад, # ^ |

+25

Tagging onto this comment for more visibility. Maybe it's possible for Codeforces to partner with OpenAI? Admins would disclose problem statements to them in advance, and closely related queries would be censored for the duration of the contest. For all the useful data CF has provided and will continue to provide to the benefit of their projects, it doesn't seem like too unreasonable an ask.

Assuming they agree... This wouldn't be a perfect solution, especially considering it can't apply to locally hosted LLMs and the like, which will catch up eventually. However, I think it would be fairly uncontroversial (not a major privacy violation) and hopefully buy enough time for CP rules & culture to adapt to this sudden upheaval.

34z12000

-12

Dude, we have the same comments but yours is upvoted and mine is downvoted. How's that?

← Rev. 2 →

I agree with you on the partnership between Codeforces and OpenAI as OpenAI has been heavily relying on Codeforces platform and data to train their newest models.

I also came up with a solution and updated the article. It is to let OpenAI and other AI companies sponsor more onsite contests to keep CP community thrive. With more onsite contests, we can verify the Codeforces ratings by onsite performance.

nibbanibba

← Rev. 4 →

-8

Monikanna

+48

Why is this blog being downvoted? Obviously the proposed solutions are not acceptable but it doesn't hide the fact that this is a real problem, especially considering a 1650 rated model is publicly available.

I, for one, would like there to be more in person competitive programming competitions. That would be a lot more fun and a global rating system can still be maintained in that case. Codeforces then would be a place for discussion and for problems from more irl contests than just informatic olympiads and ICPC stuff.

zfnu

-114

Downvoted the blog because restricting technology just because you are afraid of it is retarded. It should be possible to use models in competitions.

However, the main problem of using chatgpt\claude specifically is the zero-effort copy-paste approach and how easy the models are to access. But if someone has a model running locally on his own hardware then it's fine, because this person actually put in some effort before the contest, it's like preparing your own algos library basically.

What can be done to avoid zero-effort usage:

prevent easy copy-pasting of the problem statement using html\css\js\image overlay\whatever
prepare better problems that are gpt-resistant
randomly inject an invisible prompt inside of the problem statement that looks like a paragraph break, so when the participant just copy-pastes it without thinking it's reflected in the generated code and can be used to trace and ban participants for cheating

FeiWuLiuZiao

+43

You use Codeforces to proof and improve your own competitive programming ability, not your computers' and not your ability to make an AI.

Invisible prompt is good btw

Yeah by this logic should also ban usage of autocomplete\syntax highlighting, local testing and compilation since all of those are done by a computer. Just type the code in the submit window without syntax highlighting and execute it in your head on example tests, otherwise it's not competitive enough.

Except autocomplete, all IDE can do all things above, and these are allowed to use for years, and the history of CP have proofed that it won't affect the competitive of competitive programming

Doesn't matter what is allowed or not, rules only work if they are enforceable. Basic prompt engineering makes AI generated code undetectable. And as for the competitive side, well, meatbags can cope all they want but they are competing against machine intelligence now and with how fast models are progressing it doesn't look so good for the organics.

piaolianggg

If you have good reasons to be afraid of something, restricting it makes sense.

There is a high chance AI will bring about the end of the world. Especially if we're not very very careful.

monaxia

+20

The fact that this comment was written by an user with Kurisu pfp is so funny to me

Hey, why do you have a picture of my girlfriend on your cf profile, thats creepy.

WORTH

This might not be a feasible solution, but there can be an application like safe exam browser, and on that we can restrict the websites we can visit.

.-__-.

+47

You could just use another device.

Maybe ask some suspicious users how they came up with the idea in the code, or ask them to participate a round with camera opened and screen shared.

Crmf

+46

How about doing nothing? I think it's vanity of acquiring colors that is being damaged by any kind of cheaters. Inflation of rating makes you feel like something is taken from you, as if it wasn't just numbers to begin with (no one can ever steal your intelligence though)

CF is a platform with the greatest collection of problems to get better -- a lot of people won't ever use it properly and instead cheat for ego -- isn't it just how life is anyway?

NOTHING_2_LOSE

+19

Then ranking in a contest will be useless. Codeforces calculates your rating by your ranking, so if, rating will be useless. That will be a big problem.

LooeyDooey866

I agree. Cheating using telegram groups/AI would simply just be you passing on the task of thinking for someone else to do, which won't benefit you or your problem-solving capability at all. You're just wasting your precious time on this website being dishonest and irritating to others. There is no secret route to mastery, you just have to work really hard.

jianhe

I totally agree.We use codeforces to improve our coding skills,not to practice using AI.

However,I think that the codes of AI are similar.Maybe we can have some ways to find the cheaters.

As the same time,I don't clearly know the policy of Atcoder.Could codeforces immitate it?

whopassby

the problem is that,if so your rating of CF will be useless,then you cant use it to find out whether you are progressing

revper

-9

I feel just disabling copy & paste during the contest will help us a lot.

Mindeveloped

For Go or chess online contests they did something called a accuracy check, basically means they assume no one will ever be able to play as well as AI, and if you play as nearly well as AI (a.k.a. your accuracy is too high) you get banned for that. IMO this is barely a real solution.

Weiweiweii

Is that the end for CP?

shinzenko

This solution for chess is viable because engines have surpassed humans there, so ofcourse playing at such level is suspicious but in codeforces it can solve till 1800-2000, there are ppl above that rating so implementing accuracy check might hamper ppl with higher rating since they are better than AI.

CodeFurces

+153

Competitive programming (especially online contests) is not even a competition between two people like Go. It is just a competition between the problem setter and the participant. Maybe "competition" here is also inaccurate, because the goal of the problem setter is not to prevent the participant from solving the problem, but to accurately measure the ability of the participant by carefully constructing the problems.

If AI can beat tourist, I would image online competitive programming platforms to be in another form.

For example, if AI can achieve a rating of 4000+, then it is very likely that AI can propose unique problems. At that time we can have infinite contests to solve. Have a free afternoon and want to do a Div. 1 contest? No problem. AI can generate one contest specially for you and update your rating according to your performance. Want to cheat for a higher rating? I would assume AI at that level can recognize who is cheating according to their history performance and such. There is no one to "compete with". It is just between you and AI.

"Online" "competitive" programming may die, but onsite contests and competitive programming as a hobby will thrive.

Very interesting to see how the Competitive Programming scene will be in the future, but I doubt using AI to generate new and unique problems will not be sustainable, as LLMs can't create new ideas and such.

if LLM can create new problems,they may have ability of reasoning and innovation,which means of new era of human science

compared with this,the end of online competetive programing contest is acceptable

I have an idea, though I have no idea how feasible it is in practice. Maybe there's a chance it's possible to cross reference the solutions submitted by a person to an AI-generated solution, and see if there's a resemblance of their implementation styles/techniques? I guess this would only work if the AI generates similar solutions to the same problem.

Citypop

I never thought cheating below div2 could cause any big harm except requiring codeforces team spending more time on those tedious plagiarism check. The real problem I believe is when AI can solve div1/2 problems, reading AI generated code can give experienced contestants direct hints on how to approach them. I can easily turn this into my own solution without any cheating evidence left. It's equivalent to implementing a solution after reading editorial.

cockatooo

+90

I mean if AI can reach even GM, that probably already means the advent of AGI. At that point almost all non-physical jobs in the world could be replaced by AI, the problem of cf cheating would no longer be important.

0npata

I don't see any solutions either. I think making contests unrated for a while before a consensus is reached how to deal with this might be a good move.

tickbird

← Rev. 3 →

I never could have predicted this would happen two years ago. I'm seriously thinking there will be AI bot beating tourist in the near future. as exactly same thing happened to Chess and Go.

muminurfahim

That already happened at LeetCode: Weekly-406

TwentyOneHundredOrBust

-10

Using AI is not cheating according to the rules. I used AI recently.

ujjwalcomputerpro1

Taking a little help in bug places might be ok sometimes. But today AI is powerful in giving you the solution to your question. If everyone can ask the AI the question, what is the meaning of a live contest there?

Even we should not take any help from AI during the contest. If any bug comes we should tackle it by ourselves because the ultimate goal of giving contest is to be a better version of ourselves in problem solving.

It should be sort of like correspondence chess, where computers were used for a while and humans still added enough human brainpower to gain wins against pure computer play. (Now this is no longer the case, because the computers are too powerful, which would be analogous to having AI being red, at which point it's probably AGI and we have no idea what that world will look like.)

I used AI to solve entirely the easy problems on a contest after barely reading them. But when it gets to the hard problems, it rots your brain when you try to understand its incorrect solution and sort of pollutes the thought process, so it may actually be a net negative.

I think that in the real world you will use AI a lot to write code and develop things. We shouldn't be artificially restricting the tool-space of competition. Otherwise we might as well force people to use the same editor and ban prewritten code.

123gjweq2

+18

Spoiler

entropy07

+13

So $$$AI=0$$$. :)

mrxaid

My opinion on this is that, the contest ratings will become complete obsolete and recruiters stop using them as a way to judge person's capability, as on-site interviews are surely monitored so a person who will cheat will gain nothing, originally most cpers here were to enhance their problem solving skills rather than boasting rating (many were for that as well), but i guess we gotta say good bye to rating system, as this AI will improve by many folds in upcoming years, Also one of my opinion is on the compute, this AI kinds of brute force the solution, and i am sure it needs a lot compute for that, and hence the pricing for this AI will be huge, even gpt-4 pro sometimes doesn't respond because of busy servers, so don't your think that this thinking ai, that generate thousands of intermediate solutions will take a lot of computation resources. Maybe atleast for upcoming some contests they should be unrated, till some people figure out a solution for this.

So how should a recruiter filter out 100 candidates from a bunch of lists?

+15

import random

alpha_beta20

lollll(*insert crying emojis)

Monogon

Current AI systems are still somewhat limited and I'm not sure how much they will disrupt contests in the short term.

But thinking about long term, let's say that in the future, ChatGPT or another tool can solve GM level problems on the first attempt consistently, quickly, and cheaply. In this case, the nature of programming itself is just dramatically different and I think we have no option but to adapt to it in the way that we set problems.

If we instead change the rules to forbid AI usage in contests, we will completely fail to detect cheating. It's fundamentally an impossible problem to solve, as you can always create some workaround by prompting it differently or rewriting its output. It's not like the situation in chess where there's a small, finite number of good moves. The space of possible correct codes is much larger, and in fact infinite if it weren't for the character limit.

So I think going forward, we will have to analyze what are the current limitations of the state of the art models, so that solving the problems we set requires some element of human ingenuity. Maybe we won't be writing the code ourselves, but we'll at least need to come up with some key ideas/observations to prompt the AI.

This is where we have an advantage over chess. Chess is a rigid game with an unchanging set of rules. But in CP every contest is unique. So if the game gets easier because of new tools, we can make the game harder. Only time will tell what the future of problem setting will look like.

VLamarca

I think at the point that AI's can solve "GM level problems on the first attempt consistently, quickly, and cheaply" I find it hard to not expect it to also solve LGM level problems too as well as advancing math and physics research by centuries.

My personal belief is that this does not happen within 5 years with 80% of probability.

It can already solve IOI type problems with a little bit more time. What's the large difference between that and GM problems? I think unless strong regulation is imposed on AI companies, it can easily happen within the next two years.

I think the current state does not fit the description of “solving GM level problems consistently”

Who is saying that? My point is that if it's able to achieve gold at the IOI now with a bit more compute, why do you expect it will take more than 5 years to reach GM performance with less compute? Feels like a made up number to me.

It certainly is a made up number. But I said we are still not in the level that it solves GM problems consistently. If that is reached, I expect many changes even beyond CP. Then you implied that it is already at this level because of the performance at IOI, which I dont think this is the level I described. I think it takes more than 5 years because of the architecture of LLMs being limited. This is also based on opinions of many people I follow. But Im certainly not an expert.

I wasn't trying to imply it. You misunderstood what I was saying. I was just questioning your prediction logic given that it is already capable of achieving such a good performance.

I think it's quite likely there won't be any limitations in future models that we humans can exploit for contests.

iamfreeezing

The advent and growth of AI makes me immensely sad, but this is the future people are building for themselves. It's depressing to me.

The advent and growth of AI makes me immensely sad

why?

I'm seeing negligence and over-reliance on AI all around me. University and school students in my country have started to heavily rely on LLMs to cheat. There is cheating in coding competition, hiring tests everywhere. I liked when people wrote code and built things by themselves, but no one is doing that anymore.

Psych_x7

Competitive programming is a hobby that we people do for fun and to increase our mental ability to think of solutions. As far as I can guess, with the advent of AI, people who were after the ratings will go away, as ratings might be useless if everyone starts using GPT. On the other hand, CP for increasing your problem-solving ability and beating the problem solver will evolve, and actually competitive programming will gain it's rightful place. CP was and will never be a job-seeking credential that these rating seekers and influencers made it so far.

sudotherapist

this

amsen

← Rev. 5 →

+32

If there is an AI that can solve Div1 AB problems consistently, The whole need for human intelligence is under question and millions of jobs will be replaced, so codeforces should be the least of our concerns. I believe 99 out of 100 tasks are done during a day by a programmer/researcher/scientist/... is easier than a Div1 B.

So the question now has two important sides:

How to have online contests with people who use AI and people who do not use AI: Because AI is not a form of communication between people we should look at it the same as we adopted with internet search during contests. It should allowed and adopted in online contests, and banned in offline contests (or another form of it should be allowed).
We need to filter problems that AI solves under pure easy prompts. It may mean no more Div3/4 contests no more Div2 AB. and also no more classical problems.

I believe that the next ICPC world finals can be a pivot moment for finding current AI capabilities.

The implementation part of the CP never was the hard part and never was the important part, maybe it is OK to give it up (or make it easier by using AI code helpers) at least for online contests.

I think this is somewhat naive in the sense that millions of jobs are already unnecessary :). But that is a long discussion

MaltaDreamer

Switching to offline tournaments is the only option. Chess players had to do it too. )

MOHD_FAIZ

Most of the people are cheating because of company hiring process, as they consider codeforces ratings, so now on companies should ignore the ratings of different platforms, so it may reduce cheating.

wherends

+16

I don’t believe AI can reach a rating of 4000.

AlphaGo’s principle is learning from professional game records; although it defeated humans, its absolute strength did not exceed human capabilities. It was not until AlphaZero that a real breakthrough occurred. AlphaZero discarded human game records and used self-play to achieve positive feedback training, entering a different path from humans. If AI in algorithm competitions learns from human algorithms, I think its upper limit would be 2000, as there aren’t enough problems above that rating to support its training. Unless it can accomplish a kind of positive feedback loop like problem generation -> verification -> problem solving without relying on human data, which seems very challenging. Algorithm competitions involve not only mathematical reasoning but also variables like runtime limits, stack space, and heap space, making it difficult to find a suitable evaluation function.

If it really reaches 4000, most programmers would be replaced.

It doesn't solely learn from human algorithms. From what I've heard about the state of the art, it reinforces the reasoning steps that simply led it to the correct solution. Which may be completely novel.

And from what I heard AI models aren't magical brains that thinks like a real human, but rather a simulated brain implemented with deterministic algorithms that makes it look intelligent. For example, AlphaGo and AlphaZero engine is actually based cleverly pruned decision trees with tons of hardcoded parameters and formulas.

If it can beat PHD experts at measured benchmarks and 90% of competitive programmers it definitely looks intelligent to me. But is it not intelligent also?

I don't know what's your definition of intelligent so I can't give an answer, although I'm just trying to explain its working principle.

I thought you were trying to make that point with your description. I don't see why something would need to be a "magical brain" or be a non-deterministic algorithm to outcompete humans. Plus the way LLMs work and are trained has some similarities with the way brains work.

AlphaGo also used self-play. Starting from zero is not required to achieve well beyond superhuman strength, AlphaGo was also beyond human capabilities. If the training process and hyperparameters are all equal, a network bootstrapped with human play and a network from scratch will eventually converge to roughly the same strength and style of play. The conclusion that because it didn't start from zero it can't get strong is not true.

Years before, I didn't even believe AI could perform at rating 1800, but OpenAI now claims that they did it.

In fact, I think OpenAI's new o1 models are less promissing to reach higher levels like GM. Because I find they struggle to verify the correctness of each step it generates.

Deepmind's latest model for IMO may be another possible pathway. It translates the Math problem to the Lean language and uses Lean to do verification while generating the idea by language models.

abcsumits

the best thing we can do is hide problems and solutions from AI :)

Ashrayy

I liked the concept of virtual and verified rating but, just to mention rating on codeforces is calculated relatively and it is highly unlikely that anyone who does not take programming seriously is gonna appear for such contests.

Even if only serious competitors appear, the ratings will be deflated and verified rating might be really (I mean really) less then the original rating.

Moderation might also be rigged from area to area (pretty sure) and the resources required will be so much.

Блог пользователя di_z