Hello everyone,
Since the practice round doesn't seem to provide a scoreboard, I'd like to open this thread to discuss scores and possible approaches for the problem. We got the following scores after some mostly trivial ideas:
- A: 65
- B: 13,328
- C: 702,974,812
- D: 7,602,227
- E: 10,477,632
- Total: 721,068,064
We mostly did some greedies, followed by randomly taking a small subset and computing best answer for that subset. I tried to use max weight bipartite matching but failed to make it work well; I don't have fast codes for max weight general matching, which could have been used to compute "good" pairs of pizzas. Did anyone manage to make this approach work or have a better idea which gave significantly better score?
We also did some greedy construction (mostly taking good matching pizzas with many ingredients first) and then used Simulated Annealing to improve it.
Can you throw some light on what is Simulated Annealing?
Actually, the greedy part is way more important. In the end, we take the most important groups (e.g. the 50 ones having the pizzas with many ingredients) and just switch two random pizzas of that group as a step. But this Annealing is actually not that much better for us than greedily taking steps that improve our current solution.
We also used a greedy algorithm and randomized fine tuning step (simple local search in our case). By considering wasted ingredients in the greedy, we managed to further increase the score on D and E.
We greedily tried picking the next "best pizza" (with most new ingredients) for each team from team of 4 to 2. We later also tried some optimizations but could increase the score by only a million at max. Randomized greedy solutions didn't seem to give a better score either.
Disclaimer: this is not my work; I just happened to stumble upon it and felt like it is worth sharing
I found this repository by someone named Sagi Shporer , to add some context his team has been participating in hashcode since 2018, you can see all his repositories here
This year his team's practice round Score was 731,455,475 (soure)
Algorithm description:
Phase 1 — Build a solution
2.1 Select the pizza with the most ingredients.
2.2 Select the pizza that will give the best improvement in delivery (most new ingredients, with the least overlapping ingredients).
2.3 Repeat 2.2 until the delivery is ready.
Phase 2 — Optimization
Notes
Full source code
I hope you found this helpful :D
How did Phase 1 run so fast for him? Wouldn't the time complexity be quadratic in terms of the number of pizzas (M) ?
Breaking as soon as you get a pizza with almost new ingredients runs fast in D and E. You can't afford to traverse the whole array for choosing a pizza every-time in D and E.
Can you explain the reason for 4 to 2. Why starting with teams of size 4 is beneficial?
Since we're greedily picking the pizzas with most ingrediants first, if we start with team of 2, then all of the "good pizzas" will be used up in teams of 2. We dont want that because the score depends on the sum of union of squares of ingredients in each team. And obviously the the distinct ingredients would be more if we use the good pizzas on a larger team size.
We also tried all 6 combinations just to be sure, but 4-3-2 expectedly gave a much better result.
Edit: Picked up small subsets and took the best possible combination from the sorted list (decreasing order of pizzas). Tried to swap pizzas between 2 deliveries randomly (swap only if it increases the score)... More the size of subsets better the answer... More the number of iterations of swapping pizzas better the answer.
Does having some knowledge of Machine Learning help in Google Hash Code?
First test set is small. So, you can try everything. For the rest, mostly picking the pizzas with largest ingredients and matching the pizzas such that increase of ingredients is maximum and/or intersection is less. Also, tried partial randomization of the sorted (based on count of ingredients) pizzas array and ran 10-2000 times depending upon the size of test set.
2.5-hour live stream with final score of around 730.7 million — https://youtu.be/BD57-3Zt5r4
Final code: https://github.com/Errichto/youtube/blob/master/hashcode/2021_practice_pro.cpp
Total: 732,415,324 points
A : None
B : 13,439 points
C : 713,419,440 points
D : 8,027,794 points
E : 10,954,651 points
Our Approach:
In our approach we tried to ensure two things mainly.
We tried to make some big deliveries with as many as ingredients possible. The reason is — the score is calculated based on square of the number of unique ingredients. So one big delivery can be far better than a few average deliveries.
Sometimes the pizzas in a delivery can have so many intersecting ingredients. So we substracted some cost for each intersecting elements.
Then we greedily tried to make deliveries starting with pizzas with most ingredients. Also we tried to make the deliveries in 4-3-2 team order which produced the best result. Lastly we tried to replace some of the pizzas with unused ones to check if it produces better result.
I mostly used some greedy code (picking the best every time), along with some randomization in the bigger files.
- A: 74
- B: 12,015
- C: 705,472,464
- D: 7,748,466
- E: 10,674,444
- Total: 723,907,463
A. 74
B. 13,692
C. 703,144,369
D. 7,199,154
E. 9,771,415
Total: 720,128,704
Our final approach, as most of the solutions discussed in this blog, consists in a two step process:
In our case, improvements achieved by the second setp were negligible on most datasets (<1k), except for dataset C, which jumped from 706.6 to 714.1 million.
We would like to share some highlights on running times for dataset C:
Great Score!
On which machine did you run your codes for 18 hours?
Thanks! All the tests were executed my regular laptop (i7 @ 2.5GHz).
Simulated Annealing without any greedy initialization got me to 719,012,213.
Pure Simulated Annealing gets 704,123,568 in C. Seems like greedy is the way to go.
I don't know if i should be ashamed of myself , but i think just implementing input and output was more difficult than solving any heavy implementation problem on CF.