TheOneYouWant's blog

By TheOneYouWant, history, 4 years ago, In English

Hello everyone,

Since the practice round doesn't seem to provide a scoreboard, I'd like to open this thread to discuss scores and possible approaches for the problem. We got the following scores after some mostly trivial ideas:

  • A: 65
  • B: 13,328
  • C: 702,974,812
  • D: 7,602,227
  • E: 10,477,632
  • Total: 721,068,064

We mostly did some greedies, followed by randomly taking a small subset and computing best answer for that subset. I tried to use max weight bipartite matching but failed to make it work well; I don't have fast codes for max weight general matching, which could have been used to compute "good" pairs of pizzas. Did anyone manage to make this approach work or have a better idea which gave significantly better score?

  • Vote: I like it
  • +134
  • Vote: I do not like it

| Write comment?
»
4 years ago, # |
Rev. 2   Vote: I like it +39 Vote: I do not like it

We also did some greedy construction (mostly taking good matching pizzas with many ingredients first) and then used Simulated Annealing to improve it.

  • A: 49
  • B: 13,261
  • C: 714,252,603
  • D: 7,845,044
  • E: 10,773,367
  • Total: 732,884,324
  • »
    »
    4 years ago, # ^ |
      Vote: I like it +17 Vote: I do not like it

    Can you throw some light on what is Simulated Annealing?

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +13 Vote: I do not like it

      Actually, the greedy part is way more important. In the end, we take the most important groups (e.g. the 50 ones having the pizzas with many ingredients) and just switch two random pizzas of that group as a step. But this Annealing is actually not that much better for us than greedily taking steps that improve our current solution.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    We also used a greedy algorithm and randomized fine tuning step (simple local search in our case). By considering wasted ingredients in the greedy, we managed to further increase the score on D and E.

    • A: 74.
    • B: 13,773.
    • C: 712,312,932.
    • D: 8,061,713.
    • E: 10,988,069.
    • Total: 731,376,561.
»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

We greedily tried picking the next "best pizza" (with most new ingredients) for each team from team of 4 to 2. We later also tried some optimizations but could increase the score by only a million at max. Randomized greedy solutions didn't seem to give a better score either.

  • A: 49
  • B: 11,145
  • C: 706,523,791
  • D: 7,832,329
  • E: 10,758,832
  • Total: 725,126,146
  • »
    »
    4 years ago, # ^ |
      Vote: I like it +16 Vote: I do not like it

    Disclaimer: this is not my work; I just happened to stumble upon it and felt like it is worth sharing

    I found this repository by someone named Sagi Shporer , to add some context his team has been participating in hashcode since 2018, you can see all his repositories here

    This year his team's practice round Score was 731,455,475 (soure)

    • A – Example — 74 points (74 before optimization)
    • B – A little bit of everything — 13,533 points (12922 before optimization)
    • C – Many ingredients 712,692,751 points (706,624,572 before optimization)
    • D – Many pizzas — 7,911,296 points (7,863,102 before optimization)
    • E – Many teams — 10,837,821 points (10,789,627 before optimization)

    Algorithm description:

    • Phase 1: Build a solution
    • Phase 2: Optimizations

    Phase 1 — Build a solution

    1. Sort pizzas by the number of ingredients.
    2. Build deliveries, first teams of 4, after that 3, after that 2:
      2.1 Select the pizza with the most ingredients.
      2.2 Select the pizza that will give the best improvement in delivery (most new ingredients, with the least overlapping ingredients).
      2.3 Repeat 2.2 until the delivery is ready.

    Phase 2 — Optimization

    1. Try to swap 2 pizzas between any 2 deliveries — if it improves the score, make the swap.
    2. Try to swap 1 pizza between any 2 deliveries — if it improves the score, make the swap.
    3. Try to swap a pizza from any delivery with unused pizza — if it improves the score, make the swap.
    4. Try to move 1 pizza between 2 deliveries (# of pizza in the 2 deliveries must be -+1) — if it improves the score, make the swap.
    5. If any improvement performed in 1-4 — go to 1

    Notes

    1. Phase 1 takes about 5 seconds to run.
    2. Phase 2 takes about 50 minutes to run with the current restrictions (implemented for D & E which are huge). About 1% score improvement.

    Full source code

    I hope you found this helpful :D

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      How did Phase 1 run so fast for him? Wouldn't the time complexity be quadratic in terms of the number of pizzas (M) ?

      • »
        »
        »
        »
        4 years ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

        Breaking as soon as you get a pizza with almost new ingredients runs fast in D and E. You can't afford to traverse the whole array for choosing a pizza every-time in D and E.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Can you explain the reason for 4 to 2. Why starting with teams of size 4 is beneficial?

    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      Since we're greedily picking the pizzas with most ingrediants first, if we start with team of 2, then all of the "good pizzas" will be used up in teams of 2. We dont want that because the score depends on the sum of union of squares of ingredients in each team. And obviously the the distinct ingredients would be more if we use the good pizzas on a larger team size.

      We also tried all 6 combinations just to be sure, but 4-3-2 expectedly gave a much better result.

»
4 years ago, # |
Rev. 4   Vote: I like it 0 Vote: I do not like it
  • A — 74 points — This one can be done by hand
  • B – 13,750 points — Greedy with some obvious brute force in picking.
  • C – 706,619,049 points — Picking greedily maximum ingredients pizzas first
  • D – 7,345,043 points — Same with some randomization
  • E – 10,369,792 points
  • Total — 724,347,708 points

Edit: Picked up small subsets and took the best possible combination from the sorted list (decreasing order of pizzas). Tried to swap pizzas between 2 deliveries randomly (swap only if it increases the score)... More the size of subsets better the answer... More the number of iterations of swapping pizzas better the answer.

  • A — 74 points — This one can be done by hand
  • B – 13,750 points — Greedy with some obvious brute force in picking.
  • C – 711,922,487 points — Picking greedily maximum ingredients pizzas first
  • D – 7,783,549 points — Same with some randomization
  • E – 10,636,376 points
  • Total — 730,356,236 points
»
4 years ago, # |
  Vote: I like it +1 Vote: I do not like it

Does having some knowledge of Machine Learning help in Google Hash Code?

»
4 years ago, # |
Rev. 3   Vote: I like it +12 Vote: I do not like it

First test set is small. So, you can try everything. For the rest, mostly picking the pizzas with largest ingredients and matching the pizzas such that increase of ingredients is maximum and/or intersection is less. Also, tried partial randomization of the sorted (based on count of ingredients) pizzas array and ran 10-2000 times depending upon the size of test set.

  • $$$A: 74$$$
  • $$$B: 12,676$$$
  • $$$C: 706,624,573$$$
  • $$$D: 8,061,427$$$
  • $$$E: 10,988,056$$$
  • $$$Total: 725,686,806$$$
»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it
  • A – 74 points
  • B – 13,862 points
  • C – 712,218,689 points
  • D – 7,923,609 points
  • E – 10,831,175 points
  • Total score — 730,987,409 points
»
4 years ago, # |
  Vote: I like it +61 Vote: I do not like it

2.5-hour live stream with final score of around 730.7 million — https://youtu.be/BD57-3Zt5r4

Final code: https://github.com/Errichto/youtube/blob/master/hashcode/2021_practice_pro.cpp

»
4 years ago, # |
  Vote: I like it +19 Vote: I do not like it

Total: 732,415,324 points

A : None

B : 13,439 points

C : 713,419,440 points

D : 8,027,794 points

E : 10,954,651 points

Our Approach:

In our approach we tried to ensure two things mainly.

  1. We tried to make some big deliveries with as many as ingredients possible. The reason is — the score is calculated based on square of the number of unique ingredients. So one big delivery can be far better than a few average deliveries.

  2. Sometimes the pizzas in a delivery can have so many intersecting ingredients. So we substracted some cost for each intersecting elements.

Then we greedily tried to make deliveries starting with pizzas with most ingredients. Also we tried to make the deliveries in 4-3-2 team order which produced the best result. Lastly we tried to replace some of the pizzas with unused ones to check if it produces better result.

»
4 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

I mostly used some greedy code (picking the best every time), along with some randomization in the bigger files.
- A: 74
- B: 12,015
- C: 705,472,464
- D: 7,748,466
- E: 10,674,444
- Total: 723,907,463

»
4 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

A. 74
B. 13,692
C. 703,144,369
D. 7,199,154
E. 9,771,415
Total: 720,128,704

»
4 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it
  • A: 74.
  • B: 13,880.
  • C: 714,128,859.
  • D: 8,062,943.
  • E: 10,988,318.
  • Total: 733,194,074.

Our final approach, as most of the solutions discussed in this blog, consists in a two step process:

  1. Greedily building a solution (based on maximizing score and also minimizing wasted ingredients).
  2. Performing randomized improvements (i.e., simulated annealing).

In our case, improvements achieved by the second setp were negligible on most datasets (<1k), except for dataset C, which jumped from 706.6 to 714.1 million.

We would like to share some highlights on running times for dataset C:

  • 712 million in 10 minutes (also achievable with hill climbing).
  • 712.5 million in 30 minutes.
  • 713.3 million in 2 hours.
  • 714.1 million in 18 hours.
  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Great Score!

    On which machine did you run your codes for 18 hours?

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      Thanks! All the tests were executed my regular laptop (i7 @ 2.5GHz).

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Simulated Annealing without any greedy initialization got me to 719,012,213.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Pure Simulated Annealing gets 704,123,568 in C. Seems like greedy is the way to go.

»
4 years ago, # |
  Vote: I like it +2 Vote: I do not like it

I don't know if i should be ashamed of myself , but i think just implementing input and output was more difficult than solving any heavy implementation problem on CF.