Google Hashcode 2021 Practice Round Discussion

TheOneYouWant's blog


By TheOneYouWant, history, 4 years ago, in English

Hello everyone,

Since the practice round doesn't seem to provide a scoreboard, I'd like to open this thread to discuss scores and possible approaches for the problem. We got the following scores after some mostly trivial ideas:

A: 65
B: 13,328
C: 702,974,812
D: 7,602,227
E: 10,477,632
Total: 721,068,064

We mostly did some greedies, followed by randomly taking a small subset and computing the best answer for that subset. I tried to use max weight bipartite matching but failed to make it work well; I don't have fast code for max weight general matching, which could have been used to compute "good" pairs of pizzas. Did anyone manage to make this approach work, or have a better idea that gave a significantly better score?

+134


Comments (24)


tob123

4 years ago, # |

+39

We also did some greedy construction (mostly taking good matching pizzas with many ingredients first) and then used Simulated Annealing to improve it.

A: 49
B: 13,261
C: 714,252,603
D: 7,845,044
E: 10,773,367
Total: 732,884,324

Priyansh31dec

4 years ago, # ^ |

+17

Can you shed some light on what Simulated Annealing is?

tob123

4 years ago, # ^ |

+13

Actually, the greedy part is way more important. In the end, we take the most important groups (e.g. the 50 containing the pizzas with the most ingredients) and, as a step, just swap two random pizzas from those groups. But for us this annealing is actually not much better than greedily taking steps that improve our current solution.
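
For readers asking what such an annealing step looks like, here is a minimal sketch, not the commenter's actual code. It assumes pizzas are Python sets of ingredient names and deliveries are lists of pizza indices; the function names, the geometric cooling schedule, and the temperature values are all illustrative assumptions. A swap that lowers the score is still accepted with probability exp(delta / t), which is what distinguishes annealing from plain hill climbing.

```python
import math
import random

def delivery_score(delivery, pizzas):
    """Square of the number of distinct ingredients in one delivery."""
    ingredients = set()
    for i in delivery:
        ingredients |= pizzas[i]
    return len(ingredients) ** 2

def anneal(deliveries, pizzas, steps=2000, t_start=10.0, t_end=0.1, seed=0):
    """Swap random pizzas between random deliveries; return the best state seen."""
    rng = random.Random(seed)
    current = sum(delivery_score(d, pizzas) for d in deliveries)
    best, best_total = [d[:] for d in deliveries], current
    if len(deliveries) < 2:
        return best, best_total
    for step in range(steps):
        # Assumed schedule: temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        a, b = rng.sample(deliveries, 2)
        i, j = rng.randrange(len(a)), rng.randrange(len(b))
        before = delivery_score(a, pizzas) + delivery_score(b, pizzas)
        a[i], b[j] = b[j], a[i]
        delta = delivery_score(a, pizzas) + delivery_score(b, pizzas) - before
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current += delta  # accept, possibly even a worsening move
            if current > best_total:
                best, best_total = [d[:] for d in deliveries], current
        else:
            a[i], b[j] = b[j], a[i]  # reject: undo the swap
    return best, best_total
```

Setting the acceptance test to `delta > 0` only would turn this into the greedy "take improving steps" variant mentioned above.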

7pablof

4 years ago, # ^ |

We also used a greedy algorithm and a randomized fine-tuning step (simple local search in our case). By considering wasted ingredients in the greedy, we managed to further increase the score on D and E.

A: 74.
B: 13,773.
C: 712,312,932.
D: 8,061,713.
E: 10,988,069.
Total: 731,376,561.

sharath101

4 years ago, # |

We greedily picked the next "best pizza" (the one with the most new ingredients) for each team, going from teams of 4 down to teams of 2. We later also tried some optimizations but could increase the score by only a million at most. Randomized greedy solutions didn't seem to give a better score either.

A: 49
B: 11,145
C: 706,523,791
D: 7,832,329
E: 10,758,832
Total: 725,126,146

Shahraaz

4 years ago, # ^ |

+16

Disclaimer: this is not my work; I just happened to stumble upon it and felt like it is worth sharing

I found this repository by someone named Sagi Shporer. To add some context, his team has been participating in Hash Code since 2018; you can see all his repositories here.

This year his team's practice round score was 731,455,475 (source).

A – Example: 74 points (74 before optimization)
B – A little bit of everything: 13,533 points (12,922 before optimization)
C – Many ingredients: 712,692,751 points (706,624,572 before optimization)
D – Many pizzas: 7,911,296 points (7,863,102 before optimization)
E – Many teams: 10,837,821 points (10,789,627 before optimization)

Algorithm description:

Phase 1: Build a solution
Phase 2: Optimizations

Phase 1 — Build a solution

1. Sort pizzas by the number of ingredients.
2. Build deliveries, first for teams of 4, after that 3, after that 2:
   2.1 Select the pizza with the most ingredients.
   2.2 Select the pizza that will give the best improvement in delivery (most new ingredients, with the least overlapping ingredients).
   2.3 Repeat 2.2 until the delivery is ready.
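
The construction steps above can be sketched roughly as follows. This is an illustrative reading of the description, not the repository's code: the function name, the data layout (pizzas as sets, `teams` as a dict from team size to team count), and the tie-breaking rule are assumptions. Note that this naive version rescans all unused pizzas for every pick, so it is quadratic in the number of pizzas.

```python
def build_deliveries(pizzas, teams):
    """pizzas: list of ingredient sets; teams: {team_size: number_of_teams}.
    Returns a list of deliveries, each a list of pizza indices."""
    # Step 1: pizza indices sorted by ingredient count, largest first.
    unused = sorted(range(len(pizzas)), key=lambda i: -len(pizzas[i]))
    deliveries = []
    for size in (4, 3, 2):  # Step 2: teams of 4, then 3, then 2
        for _ in range(teams.get(size, 0)):
            if len(unused) < size:
                break  # not enough pizzas left for a team of this size
            first = unused.pop(0)  # 2.1: pizza with the most ingredients
            chosen, have = [first], set(pizzas[first])
            while len(chosen) < size:  # 2.2 and 2.3
                # Most new ingredients; ties broken by least overlap (assumed rule).
                best = max(
                    unused,
                    key=lambda i: (len(pizzas[i] - have), -len(pizzas[i] & have)),
                )
                unused.remove(best)
                chosen.append(best)
                have |= pizzas[best]
            deliveries.append(chosen)
    return deliveries
```

On a toy input with five pizzas and one team of 2, two teams of 3 and one team of 4, this sketch serves only the team of 4, since a single pizza is left over afterwards.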

Phase 2 — Optimization

1. Try to swap 2 pizzas between any 2 deliveries; if it improves the score, make the swap.
2. Try to swap 1 pizza between any 2 deliveries; if it improves the score, make the swap.
3. Try to swap a pizza from any delivery with an unused pizza; if it improves the score, make the swap.
4. Try to move 1 pizza between 2 deliveries (the pizza counts of the 2 deliveries must differ by 1); if it improves the score, make the move.
5. If any improvement was made in steps 1-4, go to step 1.
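
As a rough illustration of this loop, the sketch below implements only the single-pizza swap between two deliveries, repeated until a full pass finds no improvement. It is a simplified assumption of how such a pass could look (hypothetical function names, pizzas as sets, deliveries as lists of indices), and it checks every pair exhaustively, which is only practical on small inputs.

```python
from itertools import combinations

def delivery_score(delivery, pizzas):
    """Square of the number of distinct ingredients in one delivery."""
    ingredients = set()
    for i in delivery:
        ingredients |= pizzas[i]
    return len(ingredients) ** 2

def swap_until_stable(deliveries, pizzas):
    """Swap one pizza between two deliveries whenever it raises the score;
    repeat full passes until no swap helps."""
    improved = True
    while improved:
        improved = False
        for a, b in combinations(deliveries, 2):
            for i in range(len(a)):
                for j in range(len(b)):
                    before = delivery_score(a, pizzas) + delivery_score(b, pizzas)
                    a[i], b[j] = b[j], a[i]
                    after = delivery_score(a, pizzas) + delivery_score(b, pizzas)
                    if after > before:
                        improved = True  # keep the swap, schedule another pass
                    else:
                        a[i], b[j] = b[j], a[i]  # revert
    return deliveries
```

Because every accepted swap strictly increases a bounded total score, the loop always terminates.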

Notes

Phase 1 takes about 5 seconds to run.
Phase 2 takes about 50 minutes to run with the current restrictions (implemented for D & E which are huge). About 1% score improvement.

Full source code

I hope you found this helpful :D

Modi_bhagwan

4 years ago, # ^ |

How did Phase 1 run so fast for him? Wouldn't the time complexity be quadratic in terms of the number of pizzas (M) ?

akshitm16

4 years ago, # ^ |

Breaking as soon as you find a pizza whose ingredients are almost all new makes it run fast on D and E. You can't afford to traverse the whole array every time you choose a pizza in D and E.
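
A minimal sketch of such an early exit, under assumptions: `unused` holds pizza indices sorted by ingredient count in decreasing order, `have` is the set of ingredients already in the delivery, and `good_enough` is a hypothetical threshold for "almost all new". The extra pruning check on pizza size is an addition of this sketch, not something stated in the comment.

```python
def pick_next(unused, have, pizzas, good_enough=0.9):
    """Return the position in `unused` of the pizza to add next.
    Assumes `unused` is non-empty and sorted by ingredient count, largest first."""
    best_pos, best_new = 0, -1
    for pos, idx in enumerate(unused):
        size = len(pizzas[idx])
        if size <= best_new:
            break  # later pizzas are no larger, so none can beat the current best
        new = len(pizzas[idx] - have)
        if new > best_new:
            best_pos, best_new = pos, new
        if new >= good_enough * size:
            break  # almost everything is new: stop scanning here
    return best_pos
```

With a strict threshold the scan degenerates toward a full traversal, so the value trades solution quality against running time.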

ilovecheapthrills

4 years ago, # ^ |

Can you explain the reason for going from 4 to 2? Why is starting with teams of size 4 beneficial?

sharath101

4 years ago, # ^ |

Since we're greedily picking the pizzas with the most ingredients first, if we start with teams of 2, then all of the "good pizzas" will be used up on teams of 2. We don't want that, because the score is the sum over teams of the square of the number of distinct ingredients delivered to each team. The number of distinct ingredients will obviously be larger if we use the good pizzas on a larger team.

We also tried all 6 orderings just to be sure, but 4-3-2 gave a much better result, as expected.
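
To make the effect of squaring concrete, here is a small worked example on hypothetical toy data (four pizzas with three ingredients each and no overlap). The helper names are illustrative.

```python
def delivery_score(pizzas):
    """Square of the number of distinct ingredients across one delivery."""
    return len(set().union(*pizzas)) ** 2

def total_score(deliveries):
    """Sum of the per-delivery scores."""
    return sum(delivery_score(d) for d in deliveries)

# Hypothetical toy data: four "good" pizzas with disjoint ingredients.
good = [{"a", "b", "c"}, {"d", "e", "f"}, {"g", "h", "i"}, {"j", "k", "l"}]

one_team_of_four = [good]
two_teams_of_two = [good[:2], good[2:]]

print(total_score(one_team_of_four))  # 12**2 = 144
print(total_score(two_teams_of_two))  # 6**2 + 6**2 = 72
```

The same four pizzas are worth twice as much when they all go to one team, which is why spending the largest pizzas on the largest teams tends to pay off.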

Priyansh31dec

4 years ago, # |

A — 74 points — This one can be done by hand
B – 13,750 points — Greedy with some obvious brute force in picking.
C – 706,619,049 points — Picking greedily maximum ingredients pizzas first
D – 7,345,043 points — Same with some randomization
E – 10,369,792 points
Total — 724,347,708 points

Edit: Picked small subsets and took the best possible combination from the sorted list (pizzas in decreasing order). Tried swapping pizzas between 2 deliveries at random (swapping only if it increases the score). The larger the subsets, the better the answer; the more swap iterations, the better the answer.

A — 74 points — This one can be done by hand
B – 13,750 points — Greedy with some obvious brute force in picking.
C – 711,922,487 points — Picking greedily maximum ingredients pizzas first
D – 7,783,549 points — Same with some randomization
E – 10,636,376 points
Total — 730,356,236 points

Priyansh31dec

4 years ago, # |

Does having some knowledge of Machine Learning help in Google Hash Code?

akshitm16

4 years ago, # |

+12

The first test set is small, so you can try everything. For the rest, mostly picking the pizzas with the most ingredients and matching pizzas so that the increase in ingredients is maximal and/or the intersection is small. Also tried partial randomization of the pizza array sorted by ingredient count, and ran it 10-2000 times depending on the size of the test set.
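
The comment does not say how the partial randomization was done. One possible way, shown here purely as an assumption, is to shuffle the sorted order within fixed-size windows so that each pizza stays close to its sorted position; `window` and `seed` are hypothetical parameters.

```python
import random

def partially_shuffled(order, window=8, seed=None):
    """Shuffle `order` inside consecutive windows of length `window`,
    so every element stays within its original window."""
    rng = random.Random(seed)
    result = []
    for start in range(0, len(order), window):
        block = order[start:start + window]
        rng.shuffle(block)
        result.extend(block)
    return result
```

Running the greedy on several such perturbed orders with different seeds and keeping the best-scoring result matches the "ran it many times" idea described above.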

A: 74
B: 12,676
C: 706,624,573
D: 8,061,427
E: 10,988,056
Total: 725,686,806

codertonk

4 years ago, # |

A – 74 points
B – 13,862 points
C – 712,218,689 points
D – 7,923,609 points
E – 10,831,175 points
Total score — 730,987,409 points

Errichto

4 years ago, # |

+61

2.5-hour live stream with final score of around 730.7 million — https://youtu.be/BD57-3Zt5r4

Final code: https://github.com/Errichto/youtube/blob/master/hashcode/2021_practice_pro.cpp

fsshakkhor

4 years ago, # |

+19

Total: 732,415,324 points

A : None

B : 13,439 points

C : 713,419,440 points

D : 8,027,794 points

E : 10,954,651 points

Our Approach:

In our approach we mainly tried to ensure two things.

We tried to make some big deliveries with as many ingredients as possible. The reason is that the score is calculated from the square of the number of unique ingredients, so one big delivery can be far better than a few average deliveries.
Sometimes the pizzas in a delivery have many intersecting ingredients, so we subtracted some cost for each intersecting ingredient.

Then we greedily tried to make deliveries starting with the pizzas with the most ingredients. We also made the deliveries in 4-3-2 team order, which produced the best result. Lastly, we tried replacing some of the pizzas with unused ones to check whether that produces a better result.
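
A penalty of this kind could be expressed as a candidate-scoring function like the sketch below. The exact cost used by the commenter is not given, so `overlap_cost` is a hypothetical tuning parameter and the function name is illustrative.

```python
def penalized_gain(candidate, have, overlap_cost=0.5):
    """New ingredients a pizza would add, minus a cost for each ingredient
    the delivery already has (assumed linear penalty)."""
    new = len(candidate - have)
    wasted = len(candidate & have)
    return new - overlap_cost * wasted

have = {"a", "b", "c"}
big_overlapping = {"a", "b", "c", "d", "e"}
small_fresh = {"d", "e"}

print(penalized_gain(big_overlapping, have))  # 2 - 0.5 * 3 = 0.5
print(penalized_gain(small_fresh, have))      # 2 - 0.5 * 0 = 2.0
```

Both candidates add the same two ingredients, but the penalty prefers the small pizza and keeps the large one available for a delivery where more of its ingredients would count.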

BitSane

4 years ago, # |

I mostly used some greedy code (picking the best every time), along with some randomization in the bigger files.
- A: 74
- B: 12,015
- C: 705,472,464
- D: 7,748,466
- E: 10,674,444
- Total: 723,907,463

kinhosz

4 years ago, # |

A. 74
B. 13,692
C. 703,144,369
D. 7,199,154
E. 9,771,415
Total: 720,128,704

7pablof

4 years ago, # |

A: 74.
B: 13,880.
C: 714,128,859.
D: 8,062,943.
E: 10,988,318.
Total: 733,194,074.

Our final approach, like most of the solutions discussed in this blog, consists of a two-step process:

Greedily building a solution (based on maximizing score and also minimizing wasted ingredients).
Performing randomized improvements (i.e., simulated annealing).

In our case, improvements achieved by the second step were negligible on most datasets (<1k), except for dataset C, which jumped from 706.6 to 714.1 million.

We would like to share some highlights on running times for dataset C:

712 million in 10 minutes (also achievable with hill climbing).
712.5 million in 30 minutes.
713.3 million in 2 hours.
714.1 million in 18 hours.

ilovecheapthrills

4 years ago, # ^ |

Great Score!

On which machine did you run your code for 18 hours?

7pablof

4 years ago, # ^ |

Thanks! All the tests were executed on my regular laptop (i7 @ 2.5GHz).

PanicStation

4 years ago, # |

Simulated Annealing without any greedy initialization got me to 719,012,213.

PanicStation

4 years ago, # ^ |

Pure Simulated Annealing gets 704,123,568 in C. Seems like greedy is the way to go.

bully....maguire

4 years ago, # |

I don't know if I should be ashamed of myself, but I think just implementing input and output was more difficult than solving any heavy implementation problem on CF.
