No TLE for 10^9 operations solutions??

→ Pay attention

Before contest
Codeforces Round 1006 (Div. 3)
2 days
Register now »

→ Top rated

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	157
6	Qingyu	156
7	djm03178	152
7	adamant	152
9	luogu_official	150
10	awoo	147

View all →

→ Find user

→ Recent actions

Detailed →

imposter_syndrome's blog

No TLE for 10^9 operations solutions??

By imposter_syndrome, history, 6 years ago, In English

Hi Everyone, i have been seeing this for some days now on codeforces. In yesterday's educational round, I submitted problem D with https://codeforces.net/contest/1107/submission/49013323 solution. Later I realized that it takes minimum 5040*5040*60 operations in worst case. After contest, i tried hacking my solution but it passed successfully. Also I think testcase 20 is having same worst case input (all 0's and last rows last element if say 1), but it passes there in 732 ms. In my local env it takes more than 4s for this to pass.

Also in CF#534, I tried hacking this O(n^2) solution https://codeforces.net/contest/1104/submission/48743647, which was doing more than 10^9 operation on worst case (ababa....aa..bababababa string). But it passed in 640 ms. From my understanding it takes 1s for approx 10^8 operations.

Has codeforces upgraded there testing servers or I am missing something here?

imposter_syndrome
6 years ago
18

Comments (18)

Write comment?

eisen_shinzou

6 years ago, # |

← Rev. 2 →

I don't know about your code( didn't even opened it ) but I used bitset, which divides the operations by 32. And btw in worse case it takes 5200*5200*48( not 60 ).

→ Reply

imposter_syndrome

6 years ago, # ^ |

+16

5040 has 60 divisors. It takes 1524096000 number of operations (+ some other computations) :) While 5200 has 30 divisors and it takes 811200000 operations. Worst case is 5040 from my understanding.

→ Reply

eisen_shinzou

6 years ago, # ^ |

← Rev. 2 →

-70

OK. Anyway, you don't need to post a blog for such things!!!

→ Reply

imposter_syndrome

6 years ago, # ^ |

+26

Ohh. sorry didn't know exact way to ask this. I thought other people may also be seeing this and it would be good to ask if something is missing. Sorry for all the inconveniences it caused to you

→ Reply

eisen_shinzou

6 years ago, # ^ |

-53

I don't care, but as I know they'll downvote, maybe a comment somewhere on tutorial or announcement could be enough...

→ Reply

eisen_shinzou

6 years ago, # ^ |

← Rev. 2 →

-27

Seems they just downvote me... Why always me?? What have I done to you?? OK downvote, I love downvotes!!!

→ Reply

l_Love_Thuy_Linh

6 years ago, # |

Also I think testcase 20 is having same worst case input (all 0's and last rows last element if say 1), but it passes there in 732 ms. You are using fast input, without it : 49042714 (732ms -> 1528ms)
In my local env it takes more than 4s for this to pass. It depend on the power of your computer
From my understanding it takes 1s for approx 10^8 operations. The time limit is 2.5s so i think 10^9 is fine after all ?

→ Reply

ankeshgupta007

6 years ago, # ^ |

+16

Just to add to his answer that 10⁸ is a heuristic bound and must in no way understood as a hard cut. Sometimes, if operation per loop iterations are heavy, 10⁷ might be tricky to squeeze in, whereas 10⁹ may pass if per iteration work is simple.

Also, cache plays a big role where memory accesses are linear, one after the other and can reduce time by factors. I believe in the question above, array accesses were quite linear and operations were just simple comparison.

→ Reply

imposter_syndrome

6 years ago, # ^ |

Yes that is correct that it's only heuristic bound but thing is that most of the time when i see these solutions passing, there is always a better solution which fits into 1s approx = 10^8 operations time limit. I have seen a comment from Tanya_Romanova and many other people about this 10^8 operations thing but not able to find it now. One more ref: https://www.quora.com/How-many-instructions-can-be-executed-in-Codeforces-and-TopCoder-judge-PCs-in-one-second. Also there is one blog on TC https://www.topcoder.com/community/competitive-programming/tutorials/computational-complexity-section-1/.

Also concern is that should these solutions really pass for respective problems? It makes tough to understand what exactly is expected from solution when these solution passes.

→ Reply

LLI_E_P_JI_O_K

6 years ago, # |

← Rev. 2 →

+19

I'm also interested in such question.

What has happened with C++ STL string?

Why solutions where string length is about 300 000 characters and which erase small peace (of 2 characters) in the center of the string while it is not empty passes in 1 second on Codeforces?

Isn't there about 3*1.5*10^10 operations? Or string is not a continuous memory block now, it is like an amazing "list" with indexation of elements?

Examle of 2 different problems and their solutions that I can't hacked and even final tests can't do it too:

1) https://codeforces.net/contest/990/submission/39115847 (passed in 1 second for 300 000 string!)

2) https://codeforces.net/contest/1104/submission/48742260 (passed in 0.078 second for 100 000 string!!!)

Are you kidding me? :))

What has happened?

→ Reply

dmkozyrev

6 years ago, # ^ |

← Rev. 9 →

+25

You forgot to divide number of operations by sizeof(YMM-register) / sizeof(char) const for a string. So, erasing from std::string takes n / sizeof(YMM-register) * sizeof(char) = n / 256 * 8 = n / 32 operations. Estimate is n*(n+1)/2 / 32 = 300000^2 / 64 = 1.4 * 10^9 operations. Most costly operation in this case is loading / storing data to / from registers. It is 1s = 10^9 in aligned by cache line size case and 2s = 10^9 in unaligned case on codeforces with sequential access and cache prefetching.

→ Reply

LLI_E_P_JI_O_K

6 years ago, # ^ |

So, memmove() can process 256 bits in one processor clock cycle?

→ Reply

dmkozyrev

6 years ago, # ^ |

+26

Yes, it is.

As additional, you can see AVX, AVX2 and FMA in my solution for gym problem G. Underpalindromity. With n = 200000 and k = 100000 it is n * k / 2 / sizeof(YMM-register) * sizeof(int) = 200000 * 100000 / 8 = 2.5 * 10^9 operations, I think.

If submission not available for you, this is source code

→ Reply

LLI_E_P_JI_O_K

6 years ago, # ^ |

OMG, problem setters, let our restrictions for string/array length be 1e6 or 1e7 to avoid passing solutions with N^2 and architecture level optimizations.

→ Reply

dmkozyrev

6 years ago, # ^ |

← Rev. 2 →

For example, you can't increase n and k in problem G. Underpalindromity, because there are solutions with correct algorithm in O(n log(n)) with working time 0.7s-1s.

Learning algorithms and data structures — one way, learning architecture level's optimizations — another way. When you are using array instead list and iterate over it, you already using architecture level optimizations. Both knowledge can be combined, for example, you can use sqrt-decomposition with such optimizations and sqrt-list and it will be faster, than segment tree and cartesian tree.

→ Reply

MrDindows

6 years ago, # ^ |

A bit simpler and faster solution: 49057315

→ Reply

dmkozyrev

6 years ago, # ^ |

Fantastic, speed up in 2 times and gcc can vectorize even with addition to int64_t!

If you are interested, there are another problem 101341I - Matrix God "Matrix God" with 1000 x 1000 matrix multiplication by modulo 10^9+7 and fantastic solution 45798437 by Constantine Drozdov in 374 ms

→ Reply

dalex

6 years ago, # ^ |

+30

MikeMirzayanov, consider dmkozyrev to be the tester of CF rounds!

→ Reply