We have scraped submissions from randomly selected users for all problems up to 1457E. We are interested in using this data for different NLP benchmark tasks. Code translation and code compilation are two of the tasks we have prepared so far, and we plan to create more. We therefore plan to share the scraped code publicly. This dataset will be very beneficial for the machine learning and deep learning community. We would like MikeMirzayanov's attention on this question of sharing the code publicly. Is it okay with Codeforces if we share the dataset (after anonymizing it) publicly, strictly for research purposes?
I think this is an interesting question. If I send my code to Codeforces, who holds the rights to it? Can Mike or somebody else decide that it will be published in a dataset?
That is defined in the terms of service: https://codeforces.net/terms
How did you find this page?
Most sites have one, so I guessed the URL. I can't actually find it linked from anywhere.
Depending on local laws, the standard practice is to make you explicitly tick a checkbox saying you agree with the terms of service before you can sign up for an account. That way, if anyone sues the site, it can respond that you agreed to these rules when you registered. It's very standard stuff for all non-hobby sites on the internet.
My guess is MikeMirzayanov fucked up when he made the user registration flow and forgot to link it. I wonder if those terms are still binding in that case?
Since international law is terribly difficult to enforce, it doesn't really matter whether they're "binding": he'll just kick you off the site if you don't agree with the terms or if you violate them, and you can't do anything to "sue codeforces".
When anyone registers to participate in a Codeforces contest, they confirm that they agree with the contest rules. The current contest rules say:
Intellectual property
The rules below are used each time if for a particular round there are no separate rules. Competitors retain ownership of all intellectual and industrial property rights (including moral rights) in and to Submissions.
As a condition of submission, Competitor grants Codeforces, its subsidiaries, agents and partner companies, a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive license to use, reproduce, adapt, modify, publish, distribute, publicly perform, create a derivative work from, and publicly display the Submission.
Contestants provide submissions on an "as is" basis, without warranties or conditions of any kind, either express or implied, including, without limitation, any warranties or conditions of title, non-infringement, merchantability, or fitness for a particular purpose.
This looks very similar to the MIT license, except that the rights are granted only to "Codeforces, its subsidiaries, agents and partner companies" rather than to everyone.
There's also an additional potential licensing pitfall: "Any usage of third-party code should not violate the right holder’s license or copyright. Remember that published code is not always free to use! At the request of the right holder, any code that violates the license or copyright may be considered as violating the rules." Some submissions may violate third-party licenses or copyrights without anyone being aware of it, since I doubt this is strictly enforced.