I recently came across this problem. There is no official editorial (except for a few short comments in this thread), and I have not been able to find an explanation I understand.
Problem Summary: You are given N (1 <= N <= 24) tennis balls, each located at a distinct lattice point, as well as a tennis basket also located at a lattice point distinct from all other lattice points. At one time, you can hold at most 2 tennis balls. Find the minimum total squared distance it'll take to put all tennis balls in the basket, if you start at the basket. Also:
Every lattice point is from (-100, -100) to (100, 100)
The time limit is an unusually high 4 seconds
A $$$O(2^N * N^2)$$$ solution is obvious: do a subset/bitmask DP over all N tennis balls. To find dp[i], simply test every individual set bit in the number i, as well as every pair of set bits in the number i (representing taking one and two balls in the previous step), and transition from that previous dp-state. There are $$$2^N$$$ states, and there are $$$O(N^2)$$$ pairs of tennis balls for each number, so the overall complexity is $$$O(2^N * N^2)$$$.
But N = 24, so this times out. In the comments, they mention a $$$O(2^N * N)$$$ solution, where for each dp[i] you only need to test N pairs of tennis balls for transitions. Specifically, they say that for dp[i], we only need to test pairs that contain the first set bit in the number i, which reduces the number of pairs, and therefore the number of transitions, to $$$O(N)$$$. Apparently, this approach still manages to cover all possibilities for transitions, but I cannot understand why.
Can someone provide some insights/hints to this problem's solution?
The goal is to see that every partition of a bitmask into bits and pairs of bits can be considered by the dp. You can do that quite easily; sort the parts of the partition by their earliest bit, and that gives you the order in which they were tested and removed from the bitmask. For example, if you had a bitmask with bits 1, 2, 3, 4, 5 and 6 partitioned into (1,4), (2,5), (3) and (6) you can see that the dp does that by first testing the pair (1,4) to get to the bitmask with bits 2,3,5,6, then from that bitmask it tests the pair (2,5) to get to the bitmask with bits 3,6, then you test (3) to get to the bitmask with bit 6, and finally you test (6) to get to the base case of the dp.