dpaleka's blog

By dpaleka, 3 years ago, In English

In this post, we will use the recent OpenAI Codex product based on GPT-3, first introduced in Evaluating Large Language Models Trained on Code (Chen et al., 2021) to solve some Codeforces problems.

We use the round Codeforces Round 739 (Div. 3) because the current version of the Codex model was released before this round. As the model has been trained on the Github data, it is possible that it memorized some solutions for older problems.

Solving 1560A - Dislike of Threes from the statement only

This is a simple problem, and the statement kind of explains what we should implement. Let's just give the plaintext statement to Codex:

Prompt
Code

Demo of Codex in action

127276629

Disclaimers

It produces correct code on the second try, the first C++ submission was WA 2. The problem text has been cleaned a bit, and "end in the digit 3" was changed to "have the last digit 3" -- because for some reason Codex thinks "end in the digit 3" means "contain the digit 3".

Solving 1560B - Who's Opposite? from the problem statement and the editorial

Here we have no hope for the solution straight from the problem statement, as it requires some thinking. However, if you give it the editorial, it can produce working code from the first try:

127226971

No formatting is required: just copy and paste the statement and the editorial from the Codeforces website. No inductive bias in user input was needed at all!

What you can do if this is cool or frightening to you

It is a net good to the world if everyone has access to models such as Codex or GPT-3.

If you are a machine learning researcher, or maybe just a competent person in general, try to get involved in EleutherAI, Hugging Face, or something similar. If you want to work on important things, in my opinion language models are very critical right now.

Bonus: What you can do if you own a competitive programming site

The advice below is my personal opinion.

  1. Try to get a tester with Codex access for future rounds. I predict it could become a part of the cheating issue as soon as 2022.
  2. Think of what the rules should say about using this kind of tools.
  3. Prepare for the effects of competitive programming becoming less relevant to job interviews. If your financing model relies on job interview preparation, you will need to change it in a few years. (Codeforces and Atcoder seem to be in the best shape here.)

FAQ

Can language models solve harder problems?

Language models are a new and poorly understood object. Here is a gross oversimplification of the current conventional wisdom:

GPT-3 is a language model. Its main purpose is translating between different representations of the same data. It can do English to German, it can be used to describe imagess, and with Codex it can translate textual commands to code. Codex can solve the Div3 A from scratch because the solution just implementation of the text, but it needs the editorial to solve Div3 B.

As of 2021, language models are not intended for solving algorithmic puzzles or producing mathematical proofs. It is likely we will have a much more powerful Codeforces solver by say 2026, when the research manages to combine proof search methods with the GPT-3 architecture. If I manage to read enough papers on this, I might write more on my academic blog.

I can't see the demo.

If you can't access imgur, go to this link.

How to get access to Codex?

Have some academic credentials, and fill in the waitlist form at OpenAI's website with an interesting research proposal. Please do not use Codex for bad purposes such as cheating or spam.

How to integrate Codex with vim? I got access but I can only use it in the Playground.

Use vim_codex with your OpenAI keys.

What are the Codex parameters in the demo?

I used the OpenAI API from Python as follows:

API call
  • Vote: I like it
  • +269
  • Vote: I do not like it

| Write comment?
»
3 years ago, # |
  Vote: I like it +43 Vote: I do not like it

Now cheaters will try to use this to solve D2A :)

»
3 years ago, # |
Rev. 2   Vote: I like it +33 Vote: I do not like it

I wonder, if you ask it to solve $$$A + B$$$ but with high bounds, what will the output be?

For example if you give it $$$A, B \leq 10^{18}$$$ will it use long longs or stick with int?

  • »
    »
    3 years ago, # ^ |
    Rev. 3   Vote: I like it +6 Vote: I do not like it

    It is very hard to use with C++, because I don't know where to put the instructions.

    Prompt+code

    Not only it gets the type wrong, but notice the trailing { too.

    Can you think of an equivalent test for Python? I think it may genuinely work better in Python, for now. Or maybe I just haven't figured out how to make it write nice C++ yet.

    • »
      »
      »
      3 years ago, # ^ |
      Rev. 2   Vote: I like it +28 Vote: I do not like it

      Wow, thank you for trying it out! I guess the AI only focuses on the key details and disregards what it thinks as unnecessary such as the bounds here.

      For python maybe try the following?

      Given an integer $$$N$$$, print out $$$10^N$$$. Bounds: $$$N \leq 10^8$$$.

      The idea here is that if the AI is smart enough, it will print out trailing zeroes instead of actually calculating the value, which may lead to memory overflow?

      Edit: Oops, that only uses about 41 MB. Can anyone think of a similar but better test that can memory overflow, but has a simple solution without brute-force?

»
3 years ago, # |
  Vote: I like it +8 Vote: I do not like it

OpenAI Codex can solve unseen Codeforces problems

Spoiler
»
3 years ago, # |
  Vote: I like it +4 Vote: I do not like it

That's amazing! I think there is no need to fear an AI solving significantly harder problems though. I think that making a robot that consistently solves problems at Div 2 D or above is about as hard as making a general intelligence. And if that happens then there are a lot of more important things to think about than cheating on codeforces :P.

Though for Div 3 (and div 2 if there are improvements), clearly the solution is to use only long and hard to comprehend problem statements. Atcoder will need to become more like cf and have random commentary about Baby Ehad's first words sprinkled throughout the problem statement.

  • »
    »
    3 years ago, # ^ |
    Rev. 2   Vote: I like it +9 Vote: I do not like it

    We'll see about this. The capabilities of machine learning (and computers in general) are not a subset of human capabilities, nor do humans and computers find the same things difficult.

    No one has had a good track record of predicting the order in which fields of human activity get automated away.

  • »
    »
    3 years ago, # ^ |
    Rev. 2   Vote: I like it +24 Vote: I do not like it

    Even solving Div2 B seems implausible to me: assuming this model is equivalent to Copilot, it was trained with the goal of remembering a lot of code and then outputting it with minor changes, so that the researchers don't get angry at it for copying, and can make bold claims. This implies that it isn't useful for problems that can't be solved by copying existing code from Github.
    And I believe all current language models have zero understanding of text (which is why they are often tested on text completion, which is also just copying with minor changes), but small changes in the problem statements can lead to large changes in the solution (and recognizing this obviously requires understanding of the statement).
    I think the for some reason Codex thinks "end in the digit 3" means "contain the digit 3" part proves my point: it has seen the code for "contain the digit 3" more often than for "end in the digit 3", and can't even see that these things are different. And this was an implementation problem, which is the closest it gets to "translate statement from English to C++".

»
3 years ago, # |
  Vote: I like it 0 Vote: I do not like it

nice blog! btw, can you show how me how to submit to codeforces using command line like you did in the imgur link? thanks in advance.

»
3 years ago, # |
  Vote: I like it +44 Vote: I do not like it

Imagine one day it beats tourist like Deep Blue beat Garry Kasparov at chess.

»
3 years ago, # |
  Vote: I like it +20 Vote: I do not like it

As of 2021, language models are not intended for solving algorithmic puzzles or producing mathematical proofs. It is likely we will have a much more powerful Codeforces solver by say 2026, when the research manages to combine proof search methods with the GPT-3 architecture.

Can you elaborate on the 2026 prediction?

As far as I can see, the problem-solving ("proof searching") part of AI is THE hard part of CP, for both humans and AI. Everything else is just kinda whatever.

»
4 months ago, # |
  Vote: I like it +20 Vote: I do not like it

This was 3 years and a few days ago. It feels like 10 years, at least.