How LGMs could already cheat using current AI. Or, are these not cheating?

Правка en3, от arvindf232, 2024-09-15 17:40:50

Disclaimer: My natural English ability is questionable. I have passed the blog through chatgpt to correct grammatical mistakes.

I experimented with o1-mini and tried to understand its implications for competitive programming on Codeforces at level 2500 or higher. Obviously, the current AI cannot outright provide you with solutions; however, there are already ways it can help quite a lot.

Brute Force Writing

I believe most competitors regularly use brute-forces. They are kind of an admission of failure to find your mistakes on your own. Previously, brute-forces came with the cost of taking 5 minutes to write. I would not do it unless it is a very hard problem, or I have tried debugging by hand for 10 minutes and seen nothing. If AI usage is assumed, then these 5 minutes would be completely saved.

Five minutes would correspond to approximately a 10-point performance boost on the final problem and increasingly more on earlier problems.

Translations of Languages

This may fall under the Code-Completion tools mentioned in the rules https://codeforces.net/blog/entry/133941, so maybe it is fine.

This is really for the unfortunate people who use Kotlin like me or Python. But if we can use the tools most comfortable for us, with the one downside (speed) completely negated, that doesn't seem particularly fair.

When I am using Kotlin, I need to constantly keep in mind constant factors and risk TLE in some cases. Some time limit issues can be offset by well-optimized prewritten libraries. I personally find this to be worth it (or I would have switched long before), and these are always the trade-offs, making (to me) Python vs. Kotlin vs. C++ a mostly fair and informed decision.

Code Trimming/Rewrite

This may fall under the Code-Completion tools mentioned in the rules https://codeforces.net/blog/entry/133941, so maybe it is fine.

Basically, I ask the AI to "remove all unused functions" or "rewrite the code to be cleaner." This should have been an IDE feature, and IDEs can definitely "remove all uncalled functions." But now, AI can be much smarter and more consistent, such as "refactor this to use X instead," or if you write a simple n^2 code and ask to "optimize this using a segment tree." I don't know where the line is.

Heavy Data Structures

The AI can perform simple (simple relative to level 1900) tasks very, very fast. Current good competitive programmers can handle more complex tasks but at a moderate speed. There is one area where this mismatch is especially strong: implementing well-known data structures with slight augmentations.

There are a few data structures that are extremely flexible, but you almost never use them: red-black trees, treaps, segment tree beats, and link-cut trees. They can support many queries, but previously required (1) a working implementation that matches your coding style, and either (2A) preparing prewritten code of the exact version you need or (2B) going through your code and changing every single related spot, hoping that you don't miss anything.

Previously, option 2B was not very practical for long data structures, and option 2A was possible but limited to only a few common tasks, as there are simply too many situations to prepare for. Instead, we just don't use them, since typically they won't be the intended solution and there is a harder-to-think-of but easier-to-implement way.

With AI, however, it can generate your 100–300-line treap or red-black tree in no time, supporting any of your highly specific requests.

This is particularly relevant for competitors around the 3000 level, but for purple, orange, and red-rated participants, regular segment trees, merge sort trees, and 2D data structures all fall into this category.

Example

In 1976E - Разделяемые перестановки, we need to maintain an array of distinct values that supports two operations: specifying value $$$v$$$ and a new value $$$w$$$, and inserting $$$w$$$ before or after $$$v$$$.

The no-brainer option is a treap, as treaps can famously support any dynamic array with many operations. The second option is to think offline, where the operations correspond to a tree, allowing us to use an idea similar to the Kruskal-reconstruction tree. The third option, which is definitely the best, is to realize this can be handled with a basic linked list implementation. During practice, I ended up with the second option.

Yet, the first option certainly requires the least thinking. I asked ChatGPT for a treap implementation supporting those queries and replaced my linked list. The solution was accepted rather easily 281351200. It was not on the first try, but it just required a quick understanding of the ChatGPT code to realize that it had written some functions with ( O(n) ) complexity and then asking it to correct the implementation.

For reference, this is my exact prompt:

Build a treap in Kotlin that maintains an array that supports the following operations: (1) Given a value in the array, insert a value after it or before it. (2) Output the current array. It can be guaranteed that the array has no duplicate values.

Note that, in particular, the AI didn't tell me I was dumb and could have used a linked list, but instead it enabled me to move forward with a very simple idea.

Additional Example

I don't have another concrete problem in mind, but consider the following tasks that have appeared many times:

  • Have a usual ordered set (i.e., can add and remove elements with values up to $$$10^9$$$.
  • Find the sum of the $$$k$$$-largest elements, or find the sum/max/min of elements with keys within some range $$$[l, r]$$$.

Usually, the best course of action is to (1) process the input offline and perform coordinate compression, (2) do some binary search or segment tree descend to find the two corresponding cuts, and (3) use a normal segment tree to maintain the desired information. This is a multistep process that is complicated but is the simplest using only typical prewritten libraries.

However, this is a trivial task for a custom Red-Black tree. At some point, I decided it was worth learning and implemented for myself a version of a Red-Black tree that can handle the sum variant. It took quite a while (over 4 hours), and I still haven't implemented the max version.

But now, anyone can just get an implementation—well, you should definitely have one before a contest—but my actual concern is generating this during a contest.

Conclusion

The point is that with AI, we can make use of additional data structures to their maximum flexibility, with far less knowledge required. With AI, we only need to know "what is potentially possible," and sometimes their performance in terms of constant factors, and never worry about how painful the implementation is.

I can see there is already some impact to even the high rated people.

Opinions

The rules clearly says passing subproblems is also cheating. I personally think this is definitely considered a subproblem, even if you completely provide the solution ideas, and therefore is cheating. So this was the main point of the blog: LGMs could already cheat -- not by a lot, but by some useful margins. But we should also entertain the idea that what if this is not considered cheating:

The first thought I had was, "This is disgusting; there is no way this should be allowed.".But then I realized I am not qualified to judge and make the verdict, and my current opinion is, "I don't know."

Prewritten libraries are always available, and they always provide serious tactical advantages—the cost is your preparations, and the preparations also require the same skill set, so it never trivializes anything.

This is made complicated by the following obvious logical conclusion: A generated library created before a contest must be legal.

The distinction becomes "pre-generated library" vs. "in-contest generated library." In this case, generating every single variant seems to be a sensible thing to do.

In terms of problem solving, it is absolutely clear that I did not provide every single idea to ChatGPT. I may know that a Red-Black tree can handle a maximum query, but that is not the same as knowing the exact way to maintain those parameters and the exact query functions to achieve this.

Of course, even if the act itself is okay, it may not be possible to separate it from other not-okay uses, such as asking, "What data structures could I use to support these queries?"

Further subdivision of Competitive Programming

I believe currently there are two major rulesets: prewritten code allowed ( all online competitions) and not allowed (all onsite competitions, though possibly with printed materials). For example, if Atcoder ARC and AGC continue to allow all AI tools, then that would be a different playfield compared to CF, and we should get good at yet another style of coding. I don't know what the rules would be in other (smaller) platforms.

Additional Comments

Due to the emergence of AI, we may need to properly distinguish between prewritten library and newly AI generated code. I propose the following simple rule in face of this: All prewritten library must be made public. I don't fully support this but I just add this to get the conversions going.

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en6 Английский arvindf232 2024-09-16 01:30:57 20
en5 Английский arvindf232 2024-09-15 22:45:09 14 Tiny change: ' ended up with the ' -> ' ended up going forward with the '
en4 Английский arvindf232 2024-09-15 17:46:07 18
en3 Английский arvindf232 2024-09-15 17:40:50 1966 Tiny change: 'oncern is implementing this ' -> 'oncern is generating this ' (published)
en2 Английский arvindf232 2024-09-15 17:03:30 1322 Tiny change: 'e elements)\n- Find ' -> 'e elements, with values upto 1e9)\n- Find '
en1 Английский arvindf232 2024-09-15 16:37:55 7424 Initial revision (saved to drafts)