Блог пользователя ssvb

Автор ssvb, 3 года назад, По-английски

Right after the end of Facebook Hacker Cup 2021 Round 1, wjomlex posted the following comment in the discussion section under the announcement blog:

A common issue was people writing O(N^2) solutions to problem A when O(N) solutions are necessary. You won't be able to run an O(N^2) solution in 6 minutes when N = 800,000.

But is this really true? Challenge accepted!

Problem A1
Spoiler
Problem A2, solution v0

Reusing code from the problem A1 solution allows to easily create an $$$O(N^3)$$$ solution for problem A2 by just iterating over all the possible substrings, calling function "solve_a1" for each of them and accumulating the result. But we need $$$O(N^2)$$$ to satisfy the conditions of the challenge and this can be achieved by applying a minor adjustment, transforming "solve_a1" into "solve_a2l":

Solution v0 for problem A2

Unsurprisingly, this solution is way too slow and needs 28 minutes to process the practice mode data even on my desktop computer with a quad-core Intel Core i7-860 processor @2.8GHz:

Spoiler
Problem A2, solution v1 (OpenMP pragma)

A good thing is that the performance can be very easily improved by just taking advantage of more CPU cores:

Solution v1 for problem A2 (use OpenMP pragma)

Just a single line #pragma omp parallel for before a single loop helps a lot. The reduction(+:ans) part is also needed to tell OpenMP how to combine results from each loop iteration (without it we would need to store these results in a temporary buffer and then add them together in an extra $$$O(N)$$$ loop). The accumulator variable is upgraded to int64_t to avoid overflows. Additionally, the schedule(guided) part of the pragma and the reversed order of processing is there to keep all CPU cores busy and improve performance. The compiler command line needs to include -fopenmp option.

Below are benchmarks on my desktop computer, old laptop and Samsung Galaxy Tab S6 Lite tablet:

Spoiler
SolutionDeviceProcessorTime
solution v0desktop computerquad-core Intel Core i7-860 @2.8GHz28m13.998s
solution v1 (OpenMP)desktop computerquad-core Intel Core i7-860 @2.8GHz6m31.028s
old laptopdual-core Intel Core2 Duo T7200 @2.0GHz25m39.187s
android tabletocta-core 4x ARM Cortex-A73 @2.3GHz + 4x ARM Cortex-A53 @1.7GHz8m34.429s
Running it all on a homemade "cluster"

I have multiple devices at home and none of them is fast enough by itself. But what if they all work together? It isn't very difficult to make a script, which dissects the input data into separate testcases, runs them on different devices and combines their output together:

Ruby script
Results

Now the time is down to 4m45.626s and this is within the 6 minutes limit. Assigning jobs to different devices could be surely improved. Right now I'm just sorting them by the lenghts of strings and processing more difficult cases first. The old laptop was totally useless and couldn't handle even a single big testcase #17. The other devices came to the rescue in the end and finished it faster.

TL;DR;

Facebook Hacker Cup has an unusual format. The contestants have 6 minutes to run their solutions on their own computers. This provides a lot of opportunities to speed up even poor time complexity solutions by taking advantage of multiple CPU cores and multiple computers. OpenMP is easy to use.

To be continued

This is not the end, but the blog is already becoming too large and it's necessary to wrap it up. The next part will feature Clang vs. GCC, different compiler optimization options and taking advantage of SSE2/AVX2.

Spoiler

Полный текст и комментарии »

  • Проголосовать: нравится
  • +203
  • Проголосовать: не нравится

Автор ssvb, 3 года назад, По-английски

It's been known since a very long time ago that partially colorblind people find it challenging to distinguish colors of cyan and gray user handles on the Codeforces website. This shade of cyan just happens to look somewhat pale and very muddy, which makes it only slightly different from gray. Looking at these colors and trying to see the difference feels uncomfortable and exhausting (this kind of experience is similar to trying to read a blurry text).

At least for me, cyan needs to be a lot more intensive in order to be reliably registered as cyan. As suggested in an old comment, a somewhat usable workaround is to add .user-cyan { color:#08e8de!important; } line to the userContent.css configuration file for the Firefox browser. After this change, user handles will look like cyan and gray. This particular shade of cyan color is also used on the AtCoder website, so probably it is not too repulsive for people with normal color vision either. And gives a hope that maybe the Codeforces default cyan color can be changed one day to become more colorblind-friendly.

Полный текст и комментарии »

  • Проголосовать: нравится
  • +174
  • Проголосовать: не нравится

Автор ssvb, история, 4 года назад, По-английски

DMD compiler is expected to support import std; directive starting from version v2.086.0, as documented in its changelog. This may be very useful for competitive programming because just a single line of code can provide access to the whole standard library. Similar to how just a single line #include <bits/stdc++.h> is often used in C++ solutions instead of multiple include directives.

But something seems to be wrong with the D compiler used by the codeforces platform, as can be checked via https://codeforces.net/problemset/customtest

This code gets stuck:

import std;
void main() { printf("hello\n"); }

OUTPUT:

Replacing import std; with import std.stdio; helps:

import std.stdio;
void main() { printf("hello\n"); }

OUTPUT:

MikeMirzayanov, could you please check what's wrong? Thanks!

PS. I would also appreciate if my solution 114610491 submitted during Educational Codeforces Round 108 (рейтинговый для Див. 2) could be re-judged after fixing the compiler. But that's not super important in the grand scheme of things.

Полный текст и комментарии »

  • Проголосовать: нравится
  • +8
  • Проголосовать: не нравится