CF 800-1200
- 1998A - Find K Distinct Points with Fixed Center
- AC in 199 seconds Submission
- 2043B - Digits
- AC in 147 seconds Submission
- 2021B - Maximize Mex
- Initially misread the samples from the screenshot (thought for 410 seconds)
- WA on test 1 in 253 seconds Submission
- It also ignored the instruction to use map instead of unordered_map
- WA on test 1 in 98 seconds Submission
- TLE on test 2 in 56 seconds Submission
- WA on test 1 in 308 seconds Submission
- Originally used unordered_map and set; I manually changed them, but it still failed (see the sketch after this list for why unordered_map is risky here)
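
As a side note on the map vs unordered_map instruction: a common reason to prefer std::map in contests is that std::unordered_map can be blown up by anti-hash tests, degrading lookups to linear time. Below is a minimal sketch of the two usual workarounds; this is illustrative, not R1's code, and the custom_hash struct is just the standard splitmix64-style randomized hash trick.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Randomized hash so adversarial inputs can't force all keys into one bucket.
struct custom_hash {
    static uint64_t splitmix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }
    size_t operator()(uint64_t x) const {
        // Per-run random offset, so hack tests can't be precomputed.
        static const uint64_t FIXED_RANDOM =
            chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

int main() {
    map<long long, int> tree_map;                            // O(log n) per op, collision-proof
    unordered_map<long long, int, custom_hash> hash_map;     // expected O(1) with randomized hash
    tree_map[42] = 1;
    hash_map[42] = 1;
    cout << tree_map[42] + hash_map[42] << "\n";
    return 0;
}
```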
CF 1500-1700
- 1903D1 - Maximum And Queries (easy version)
- WA on test 1 in 256 seconds Submission
- Spent 339 seconds and rewrote the exact same code
- WA on test 1 in 190 seconds Submission
- WA on test 1 in 276 seconds Submission
- 2051E - Best Price
- AC in 267 seconds Submission
- 2044G1 - Medium Demon Problem (easy version)
- AC in 261 seconds Submission
- 2027C - Add Zeros
- AC in 327 seconds Submission
- 2008E - Alternating String
- AC in 306 seconds Submission
- 2031D - Penchick and Desert Rabbit
- WA on test 1 in 236 seconds Submission
- WA on test 1 in 230 seconds Submission (after this one, I told it the exact test case it was failing on)
- WA on test 1 in 341 seconds Submission
Overall, R1 performed fairly well, especially for an open-source model that was made as a side project. The pitfalls it did have were similar to o1's, though it also had the issue of not following instructions at times. Additionally, for the 800-1200 rated problems, I used screenshots to send the statements, but it misread the screenshot during "Maximize Mex", so I had to switch to copying and pasting the statements into the text box.
With that being said, the model, while very impressive, doesn't seem to show anything we haven't seen before with o1, and I doubt any radical changes will have to be made to reduce cheating, similar to what happened when o1-mini was initially released. We've already started to see problemsetters attempt to reduce GPT cheating (see the round 1000 Anti-LLM report), and assuming this becomes more of a trend in the future, the ability to cheat with these tools should hopefully be diminished to a large degree (at least until o3 comes out, but that's another story).
As a quick side note, LLMs are not deterministic, meaning your results might not be the same as mine here (though I'd expect them to be fairly similar).