Rogue_Ronin's blog

By Rogue_Ronin, history, 11 hours ago, In English

So DeepSeek launched their R1 model, which they claim is on par with OpenAI's o1.

More details here: Link to X post

This is a big update for CP, as the model is open-sourced and the chat can be accessed for free. As for the R1 model itself, I think it's great; in particular, the chain of thought feels like that of a human.

I tested it out on 6 different problems in total: 3 problems of 1700 rating, 1 problem of 1800, 1 of 1900, and one unrated problem from a recent contest (Div. 2, 996).

This is what happened:

  • Problem 1 — Solved on the first attempt.
  • Problem 2 — Also solved on the first attempt.
  • Problem 3 — Took an extra prompt to solve.
  • Problem 4 — Unable to solve after 5 attempts.
  • Problem 5 — Unable to solve after 4 attempts.
  • Problem 6 — Unable to solve after 4 attempts.

I was curious to hear your thoughts on this model: is this something that could affect contests in the near future?



»
8 hours ago, # |

I notice that the problems it could solve from your data set are a couple of years old, meaning there's a good chance the model has those problems in its training data.

Testing it on some newer problems (single-shot):

  • Photoshoot For Gorillas (o1 solvable): 302277443 | (AC) (1400)

  • Paint a Strip (not o1 solvable): 302278718 | (WA on test 1) (1200): Funnily enough, it admitted in its CoT that it couldn't figure it out.

  • Penchick and Desert Rabbit (idk if o1 can solve it): 302279772 | (WA on test 1) (1700): Looking at its CoT, it was confident in its answer this time.

It looks impressive, and it's nice to see the "behind the scenes" of the CoT, but it also has the same flaws as o1, and since o1 is already readily available (paid through ChatGPT, free but janky through GitHub Models), I'm not sure this will affect the cheating epidemic to a large degree. It's nice that it's more openly available, though, and the ease of access could make it easier for problem setters to test their problems with.

An additional note: for Paint a Strip and Penchick and Desert Rabbit, it took 4 to 5 minutes to respond.

EDIT: Tested it on 2 problems from the USACO December 2024 contest; it could solve It's Mooing Time from Bronze, but not Cake Game from Silver (which was simple enough to be a Bronze problem).

»
6 hours ago, # |

It outputs much shorter code than o1.

»
5 hours ago, # |

It couldn't solve this (it gave a brute-force solution).