Guuber's blog

By Guuber, history, 4 months ago, In English

Today DeepMind published a blog post, AI achieves silver-medal standard solving International Mathematical Olympiad problems. DeepMind's model would have scored 28 / 42 points on this year's IMO (judged by "prominent mathematicians Prof Sir Timothy Gowers, an IMO gold medalist and Fields Medal winner, and Dr Joseph Myers, a two-time IMO gold medalist and Chair of the IMO 2024 Problem Selection Committee."). Some relevant notes from the post:

  • "First, problems were manually translated into formal mathematical language for our systems to understand"
  • "Our systems solved one problem within minutes and took up to three days to solve the others."
  • "AlphaProof is a system that trains itself to prove mathematical statements in the formal language Lean. It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself how to master the games of chess, shogi and Go."
  • "Before this year’s competition, AlphaGeometry 2 could solve 83% of all historical IMO geometry problems from the past 25 years, compared to the 53% rate achieved by its predecessor." (Note: The AlphaGeometry paper was published in January)

  • "AlphaGeometry 2 solved Problem 4 within 19 seconds after receiving its formalization".

I guess the question is: Will this scale?

By Guuber, history, 7 months ago, In English

EDIT: I missed a major point from the paper. Go see shiven's comment below!

The paper Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry claims that combining algebraic methods (Wu's method) with synthetic methods (AlphaGeometry) achieves state-of-the-art results in IMO geometry. It uses the same set of 30 IMO geometry problems that was used in the AlphaGeometry paper. AlphaGeometry solved 25 / 30 of the problems correctly, while the "new model" solves 27 / 30. However, the new model is just a combination of AlphaGeometry and some algebraic bashing, so it couldn't have done any worse than AlphaGeometry alone.

I don't find this result indicative of any new AI capabilities; for example, I don't feel that AI solving IMO number theory problems is any closer than it was before I read the paper. The fact that some problems which are hard to solve synthetically are relatively easy to bash algebraically isn't really surprising. Nevertheless, I do find it interesting that we have an AI model that can solve IMO geometry problems (at least on this problem set) better than an "IMO gold medalist".
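
As a toy illustration of what "algebraic bashing" means here (my own example of plain coordinate bashing, not Wu's method itself): take the classical fact that the diagonals of a parallelogram bisect each other. Synthetically you argue via congruent triangles, but in coordinates it is a one-liner: put $A = (0, 0)$, $B = (b, 0)$, $D = (d_1, d_2)$ and $C = B + D = (b + d_1, d_2)$, and then

$$\frac{A + C}{2} = \left(\frac{b + d_1}{2}, \frac{d_2}{2}\right) = \frac{B + D}{2},$$

so the midpoints of the two diagonals coincide. Wu's method mechanizes this style of argument: the hypotheses and the goal are translated into polynomial equations in the coordinates, and the goal is checked by polynomial manipulation rather than by finding a clever synthetic construction.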

By Guuber, history, 10 months ago, In English

Yesterday DeepMind published a paper about its new model, AlphaGeometry. The model solved 25 out of 30 geometry problems from the IMOs 2000-2022; the previous SOTA solved 10, and the average score of a gold medalist on the same problems was 25.9.

The model was trained only on synthetic data, and it seems (to me) that more data would lead to even better results.

A notable thing is that AlphaGeometry uses a language model combined with a rule-bound deduction engine and solves problems with an approach similar to that of humans.
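
Based only on the high-level description in the paper and the blog post, my mental model of that loop is roughly the sketch below (all names and functions are placeholders I made up, not code from DeepMind): the deduction engine derives new facts until it either reaches the goal or gets stuck, and when it is stuck the language model proposes an auxiliary construction, much like a human adding an extra point or line to the diagram.

```cpp
// Rough sketch of the search loop as I understand it; every type and function
// here is a made-up placeholder, for illustration only.
#include <optional>
#include <string>
#include <vector>

struct ProblemState {
    std::vector<std::string> facts;  // known facts about the configuration
    std::string goal;                // the statement we want to prove
};

// Placeholder for the rule-bound deduction engine: derive facts until nothing
// new follows; return true if the goal has been reached.
bool deduce_until_stuck(ProblemState& s) { return false; }

// Placeholder for the language model: suggest an auxiliary construction such
// as "add the midpoint of AB"; empty result means it has no suggestion left.
std::optional<std::string> propose_construction(const ProblemState& s) {
    return std::nullopt;
}

bool solve(ProblemState state, int max_constructions) {
    for (int step = 0; step <= max_constructions; ++step) {
        if (deduce_until_stuck(state)) return true;  // deduction alone suffices
        auto c = propose_construction(state);        // otherwise ask the LM for an idea
        if (!c) return false;                        // no idea left, give up
        state.facts.push_back(*c);                   // add the construction and retry
    }
    return false;
}

int main() {
    ProblemState p{{"ABC is a triangle"}, "AB = AC"};
    return solve(p, 16) ? 0 : 1;
}
```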

The paper can be read here and the blog post can be found here.

My own speculation:

I don't see any trivial reason why a similar strategy couldn't be applied to other IMO areas (at least number theory and algebra), but I'm not an expert and haven't read all of the details. Generating a lot of data about, for example, inequalities or functional equations doesn't sound that much harder than generating data about geometry, but again, I might be missing some important insight into why good data is easy to generate for geometry. I'm not sure whether this has direct implications for competitive programming AIs: verifying proofs can be done automatically, but I'm not sure the same applies to algorithms. Still, overall a very interesting approach and set of results.

By Guuber, history, 3 years ago, In English

I have been playing around lately with the programming abilities of Codex, and one thing that surprised me was how accurately it figured out the time complexities of the code I gave it. I have only tested it on about 10-20 competitive programming solutions, and it deduced the time complexity correctly in every one of them. Most of the solutions were pretty simple / standard, so I got curious about how "complicated" a piece of code it could still guess the time complexity of correctly. My intuition is that it could be hard to infer the time complexity of recursion with a non-trivial base case, but I haven't tested this much yet.

So my challenge for you is to create a program (or programs) whose time complexity Codex can't "guess" correctly, and to post the code as a comment below.

By guessing I mean that I give Codex the code plus a line like "// The time complexity of the algorithm is " and let it fill in the rest (usually the time complexity in big O notation).
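
For concreteness, a submission could look something like this (my own toy example): the doubling outer loop makes the nested loops look like O(n log n) at first glance, but the inner iterations sum to 1 + 2 + 4 + ... ≤ 2n, so the whole thing is O(n). The last line is left unfinished in the format described above for Codex to complete.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    int64_t n;
    std::cin >> n;
    int64_t work = 0;
    for (int64_t i = 1; i <= n; i *= 2) {  // i takes O(log n) distinct values
        for (int64_t j = 0; j < i; ++j) {  // but the inner iterations total O(n)
            ++work;
        }
    }
    std::cout << work << '\n';
}
// The time complexity of the algorithm is 
```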
