Guuber's blog

By Guuber, history, 7 months ago, In English

EDIT: I missed a major point from the paper. Go see shiven's comment below!

Paper Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry claims that combining algebraic methods (Wu's method) with synthetic methods (AlphaGeometry) achieves state of the art results in IMO geometry. They used the same dataset of 30 IMO geometry problems which was used in the AlphaGeometry paper. AlphaGeometry solved 25 / 30 of the problems correctly and the "new model" solves 27 / 30. However the new model is just combination of AlphaGeometry and some algebraic bashing so it couldn't have done any worse than AlphaGeometry.

I don't find this result indicative of any new AI capabilities and for example don't feel like AI solving IMO Number Theory problems is any closer than before reading the paper. The fact that some problems which are hard to solve synthetically are relatively easy to bash algebraically isn't really surprising. Nevertheless I do find it interesting that we have an AI model that can solve IMO geometry problems (at least from this set of problems) better than "IMO gold medalist".

  • Vote: I like it
  • +28
  • Vote: I do not like it

»
7 months ago, # |
  Vote: I like it +48 Vote: I do not like it

first they came for our geometry problems...

»
7 months ago, # |
  Vote: I like it -30 Vote: I do not like it

who cares

AI waifus when?

  • »
    »
    7 months ago, # ^ |
      Vote: I like it +33 Vote: I do not like it

    What good are AI waifus if they can't beat me at IMO number theory

    • »
      »
      »
      7 months ago, # ^ |
        Vote: I like it -34 Vote: I do not like it

      you are just a blue, intelligence should not be very high on your priorities

      • »
        »
        »
        »
        7 months ago, # ^ |
          Vote: I like it +21 Vote: I do not like it

        :D. That logic doesn't make any sense. If AI is worse than me, I can complain that it's too bad. The complaining makes a lot more sense than if I was red.

        If someone is very poor they have much more reason to say "I hope my future partner isn't even poorer than me!" than a billionaire would

      • »
        »
        »
        »
        7 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        you are an inactive purple :)

»
7 months ago, # |
  Vote: I like it +25 Vote: I do not like it

Imagine not tagging :|

This definitely doesn't mean that AI has 'conquered math', or even different domains of IMO. You're correct that the work speaks for the narrow domain of Geometry in math competitions.

The important highlight is the potential of symbolic methods and their huge gains in efficiency. They manage to solve 21/30 problems, close to the performance of the average silver medalist, under a time limit of 5 minutes on a laptop CPU. Most take under 5 seconds. I invite you to contrast that to the amount of compute it takes to train and infer with LLMs in the pipeline (e.g. AlphaGeometry) and you'll be surprised :)

  • »
    »
    7 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    That is very true! I mostly saw people reacting to the 27 / 30 number and maybe wrote the post too much from that context. I should have written the post from a more neutral viewpoint!

    • »
      »
      »
      7 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      (Also didn't realize I could / should check if the authors of the paper were on codeforces...)

  • »
    »
    7 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Also, geometry is much more mechanical than algebra, combinatorics, or NT. In my opinion (from my fairly limited MO experience), the order this will go is: geometry -> algebra -> NT -> combinatorics, because combinatorics is the least mechanical out of the four. So, when AI solves combinatorics is when we should really be worried.

    • »
      »
      »
      7 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Indeed, there are several things that make Geometry a good target for mechanical provers. We highlight some of the reasons for this in our paper.

      As for ranking others, although I see where you're coming from, I don't think I can make a confident guess sadly. There may even exist pecularities that make some parts easier. For example, I can see many combinatorics problem which address a small enough finite set to be brute-forceable by computers, without using any 'reasoning' as such. Consider something about counting the number of ways to do X such that property Y is satisfied. I would assume quite a few LLMs would be able to write brute-force code that iterates over all ways to do X and checks Y. Similar to how ChatGPT can write inefficient solutions for CP problems. Sure, the computer didn't solve it in the way you'd expect. But MO problems just weren't designed to handle this amount of compute. Humans need to resort to elegant reasoning to solve those puzzles, but computers just might not need to :) Of course, that is a very specific example that you can't extend to different problems, but it's just meant to point out that the computational ways to address these problems might be different from how humans think about them.

      • »
        »
        »
        »
        7 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Good points. But the thing I was talking about was problems of the form "prove something general about this infinite set of objects (could be anything, sets, numbers, boards, etc)", for which there is no finite case check approach, and you're forced to actually reason your way to the solution.

»
7 months ago, # |
  Vote: I like it +14 Vote: I do not like it

If I can calculate a billion times a second, I can get IMO gold medal also.

  • »
    »
    7 months ago, # ^ |
      Vote: I like it -8 Vote: I do not like it

    You do, just you make poor calculations.

  • »
    »
    7 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    You can write code to do that, and there's a lot of money waiting for you if that code works for a bunch of problems :D

»
7 months ago, # |
  Vote: I like it +10 Vote: I do not like it

If someone tried to use this to solve IMO 2023/6, how many points would it get?