denk's blog

By denk, 3 weeks ago, In English

Hello, Codeforces community!

I am currently working on a project called "Codeforces User Analysis System for Generating Individual Training Recommendations". The goal is to build a tool that recommends problems to each user, helping them improve their skills through targeted practice.

As a first step, I collected data through the public Codeforces API. After spending about 6–7 hours gathering and processing it, I decided to share the dataset with the community so that anyone working on similar projects can save some time.

What is this dataset about?

This dataset includes submissions from ≈15,000 active Codeforces users over the entire history of the platform, up to the end of November 2024. The dataset consists of 17.6 million records, with the following details for each submission:

  • handle: An anonymized and shuffled user nickname (e.g., user{i}).
  • rating_at_submission: User's rating at the time of submission.
  • problem_rating: Problem difficulty rating.
  • id_of_submission_task: Unique problem identifier on Codeforces.
  • verdict: Result of the submission (e.g., OK, WRONG_ANSWER).
  • time: Time of submission (in seconds since the Unix epoch).
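To make the fields concrete, here is a minimal Python sketch of one record and a few quantities you will probably derive from it: the submission time as a UTC datetime, the gap between the problem's rating and the user's rating, and the set of problems each user has solved. The `Submission` class, the field types (treating `id_of_submission_task` as a string and allowing `problem_rating` to be missing for unrated problems), and the sample values are assumptions for illustration, not a description of the exact file format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Submission:
    # Field names follow the dataset description; the types are assumptions.
    handle: str
    rating_at_submission: int
    problem_rating: Optional[int]  # assumed None for unrated problems
    id_of_submission_task: str     # assumed to be a string identifier
    verdict: str
    time: int                      # seconds since the Unix epoch

    def submitted_at(self) -> datetime:
        # Convert the Unix timestamp to a timezone-aware UTC datetime.
        return datetime.fromtimestamp(self.time, tz=timezone.utc)

    def rating_gap(self) -> Optional[int]:
        # Positive gap: the problem is rated above the user's rating at that moment.
        if self.problem_rating is None:
            return None
        return self.problem_rating - self.rating_at_submission


def solved_problems(subs: list[Submission]) -> dict[str, set[str]]:
    # Map each handle to the problems it has at least one "OK" verdict on.
    solved: dict[str, set[str]] = {}
    for s in subs:
        if s.verdict == "OK":
            solved.setdefault(s.handle, set()).add(s.id_of_submission_task)
    return solved


# Usage with made-up records (hypothetical values, not taken from the dataset).
sample = [
    Submission("user1", 1500, 1600, "P1", "WRONG_ANSWER", 1700000000),
    Submission("user1", 1500, 1600, "P1", "OK", 1700000300),
    Submission("user2", 1200, None, "P2", "OK", 1700000600),
]
print(sample[0].submitted_at())  # 2023-11-14 22:13:20+00:00
print(sample[0].rating_gap())    # 100
print(solved_problems(sample))
```

Keeping only accepted verdicts in `solved_problems` means repeated wrong attempts on the same problem do not count as solves, while multiple accepted submissions collapse into one entry thanks to the set.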

Where to download the dataset?

I have uploaded the dataset to Hugging Face:
UsersCodeforcesSubmissionsEnd2024
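With 17.6 million rows, it can be worth processing the data as a stream rather than loading it all into memory at once. The sketch below assumes you have the data as a CSV file with a header row using the column names listed above; that layout and the helper name `count_verdicts` are assumptions for illustration, so adapt it to the format you actually download. It uses only Python's standard `csv` module and tallies submissions per verdict.

```python
import csv
from collections import Counter


def count_verdicts(path: str) -> Counter:
    # Hypothetical helper: stream the CSV row by row so memory use stays flat
    # regardless of file size. Assumes a header row containing a "verdict" column.
    counts: Counter = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row["verdict"]] += 1
    return counts
```

For example, `count_verdicts("submissions.csv").most_common(5)` would list the five most frequent verdicts, where `submissions.csv` is a hypothetical local path.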

How can this dataset help?

  1. Save time: No need to spend hours collecting data. It’s already processed and available in a ready-to-use format.
  2. Support AI projects: This dataset can be used to develop training systems, analyze user behavior, study problem difficulties, and more.
  3. Inspire new ideas: Perhaps this dataset will inspire you to start your own projects!
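As a toy illustration of the kind of training system this data could support (this is not the recommendation system described at the top, which is still in progress), here is a naive baseline in Python: for one user, suggest problems they have not solved yet whose rating falls in a window just above their most recent `rating_at_submission`. Records are plain dicts keyed by the field names above; the function name `recommend`, the window width of 200, and ranking by number of distinct solvers are arbitrary choices made for this sketch.

```python
def recommend(subs: list[dict], handle: str, window: int = 200, k: int = 5) -> list[str]:
    """Naive baseline: unsolved problems rated in [r, r + window], r = user's latest rating."""
    user_subs = [s for s in subs if s["handle"] == handle]
    if not user_subs:
        return []
    # The rating attached to the user's most recent submission approximates their current level.
    current = max(user_subs, key=lambda s: s["time"])["rating_at_submission"]
    solved = {s["id_of_submission_task"] for s in user_subs if s["verdict"] == "OK"}

    # Collect problem ratings and the distinct users who solved each problem.
    rating: dict[str, int] = {}
    solvers: dict[str, set[str]] = {}
    for s in subs:
        if s["problem_rating"] is not None:  # assumed None for unrated problems
            rating[s["id_of_submission_task"]] = s["problem_rating"]
        if s["verdict"] == "OK":
            solvers.setdefault(s["id_of_submission_task"], set()).add(s["handle"])

    candidates = [
        p for p, r in rating.items()
        if p not in solved and current <= r <= current + window
    ]
    # Prefer widely solved problems, then lower rating, then id for a deterministic order.
    candidates.sort(key=lambda p: (-len(solvers.get(p, ())), rating[p], p))
    return candidates[:k]
```

A real recommender would at least account for tags, recency, and how often the user fails at a given difficulty; the fixed rating window here is only the simplest possible starting point.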

Wishing you productive learning and good luck with your projects! :)
