Arpa's blog

By Arpa, 9 years ago, In English

Hi!

I thought that unordered_set was faster than set, because set operations are logarithmic in the container size while unordered_set operations are constant time on average (source: cplusplus.com).

But today I saw this:

TLE (>2000 ms) with unordered_set: 15494816

Accepted (155 ms) with set: 15494846

Also, my other solution with unordered_map (which I submitted during the contest) was hacked :'(

Now my question is: is unordered_set faster than set?

UPD: I got Accepted (15497282). The unordered_set time was better than set (15494846): 93 < 155.

All it took was adding this line: s.max_load_factor(0.25); s.reserve(500);.

UPD2: It seems it is better to pass a power of 2 to reserve (i.e. s.reserve(512)). (Thanks to keyvankhademi.)


»
9 years ago, # |

The default hash function for integers does not perform very well. I suggest looking into alternative hash functions.

In my experience, both unordered_set and unordered_map have worked well for strings.

  • »
    »
    9 years ago, # ^ |

    No single fixed hash function performs well against adversarial input; a hash function is only safe if it is chosen randomly from a large family of functions.

    • »
      »
      »
      9 years ago, # ^ |

      There's a difference between functions that work badly when someone wants them to, and functions that just work badly.

»
9 years ago, # |

unordered_set is not always faster than set; it's like comparing the complexity of quicksort and merge sort.

  • »
    »
    9 years ago, # ^ |
    Rev. 2

    To be fair, quicksort is O(n log n) on average and O(n^2) in the worst case, but merge sort is O(n log n) in both the average and the worst case.

    unordered_set, on the other hand, is O(1) on average and O(n) in the worst case, while set is O(log n) in both the average and the worst case.

    Quicksort and merge sort have the same average complexity, so it seems better to use merge sort.

    But because unordered_set's average complexity is better than set's, it seems that unordered_set must be better.

    • »
      »
      »
      9 years ago, # ^ |

      Adding pos.reserve(500); brings the running time below the tree-based set on the current tests: http://codeforces.net/contest/620/submission/15496964

      I have no idea, though, why setting a much higher value in place of 500 increases the running time again.

    • »
      »
      »
      9 years ago, # ^ |

      Yeah, sorry for the bad interpretation. But my point still stands: unordered_set's performance is unstable, so whether unordered_set is faster than set depends on the test case.

    • »
      »
      »
      9 years ago, # ^ |

      it is better to use merge sort.

      Umm, no, it's better to use quicksort. Proof: this passage from Skiena's Algorithm Design Manual:

      "What we can say is that experiments show that where a properly implemented quicksort is implemented well, it is typically 2-3 times faster than mergesort or heapsort. The primary reason is that the operations in the innermost loop are simpler."

      Basically, quicksort has better cache locality. Most C++ standard library implementations use introsort (quicksort with a heapsort fallback) combined with insertion sort for small ranges as their implementation of std::sort.

»
9 years ago, # |
Rev. 3

Well, that depends on the library implementation (and specially-crafted data can trigger the bad case; see the UPD below). Theoretically, and most of the time, the hash-based containers are faster than the tree-based ones. (That's why hashing is quite popular in real-world code.)

And as it turns out for the Microsoft .NET Framework implementation, the answer is yes.

15474905, the HashSet one, with 218 ms.

15496423, the SortedSet one, with 311 ms.

As for their implementations in the Microsoft .NET Framework, they're already open-sourced: corefx/src/System.Collections/src/System/Collections/Generic/HashSet.cs and corefx/src/System.Collections/src/System/Collections/Generic/SortedSet.cs

As for the GNU C++ implementation (the hash table lives in <bits/hashtable.h>, and the default hash function for int is just static_cast<size_t>(__val)), I'm surprised to find a try-catch block in it (and it's not so easy to understand how it works).


And thank you for the heads-up. I'll think twice before using unordered_set/unordered_map with GNU C++.


UPD:

The problem reminds me of a famous security issue: hash-collision DoS (a hot topic during 2011-2012), which, according to some reports, affected (and maybe still affects) a lot of popular languages, including PHP <= 5.3.8, Perl, Java, ASP.NET, the V8 JavaScript engine, Ruby <= 1.8.7-p356, etc. A specially-crafted set of keys triggers hash collisions, which degrades the performance of a HashMap or Hashtable from the expected/average O(1) per operation to the worst-case O(n).

Some information here: https://bugzilla.redhat.com/show_bug.cgi?id=750533

(Some discussion shows that some languages just added a limit on the number of parameters to "hide" the problem inside their hash table implementation (source: http://stackoverflow.com/questions/8672985/how-was-the-hash-collision-issue-in-asp-net-fixed-ms11-100). You may want to try hacking my C# solution.)

»
9 years ago, # |

Sometimes it is.

I once got a problem accepted by changing set to unordered_set in C++.

However, it depends on the hashing. I always go with the ordered set first, and if it TLEs and needs optimization, I try the unordered one.

»
9 years ago, # |
Rev. 2

Looks like someone went to the trouble of generating an extremely bad test input for hacking 620C - Pearls in a Row. Now the hack is included in the system tests as Test 42. It causes a huge number of collisions in that specific G++ implementation of unordered_map.

  • a submission with unordered_map fails with TL: 15496737
  • a submission with map gets AC (only one line of code changed!): 15496747
  • a submission with unordered_map and a custom hash function: AC again! 15497036. A bit slower than map, though.

To the OP: good idea to change the max load factor! It looks like a simple solution compared to writing a custom hash function.

Tweaking the load factor and pre-creating many buckets (with reserve) solves the problem because you only get bad performance when there are many items per bucket. The worst case is when all items land in a single bucket: then a lookup in the hash map degrades to a linear scan of a list.

  • »
    »
    9 years ago, # ^ |
    Rev. 2

    Yes, I'm afraid that was me.

    To sum it up, unordered_set is generally much faster than set. set guarantees O(log n) per operation, while unordered_set is O(1) on average; however, unordered_set can take O(n) time per insertion in the worst case. That is very rare in practice unless the dataset is deliberately chosen to cause a large number of collisions, degrading unordered_set to O(n).