How can we optimize this Subset DP problem from O(2^N * N^2) to O(2^N * N)?

№	Пользователь	Рейтинг
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

№	Пользователь	Вклад
1	cry	167
2	maomao90	163
2	Um_nik	163
4	atcoder_official	161
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	nor	153
9	Dominater069	153

I recently came across this problem. There is no official editorial (except for a few short comments in this thread), and I have not been able to find an explanation I understand.

Problem Summary: You are given N (1 <= N <= 24) tennis balls, each located at a distinct lattice point, as well as a tennis basket also located at a lattice point distinct from all other lattice points. At one time, you can hold at most 2 tennis balls. Find the minimum total squared distance it'll take to put all tennis balls in the basket, if you start at the basket. Also:

Every lattice point is from (-100, -100) to (100, 100)
The time limit is an unusually high 4 seconds

A $$$O(2^N * N^2)$$$ solution is obvious: do a subset/bitmask DP over all N tennis balls. To find dp[i], simply test every individual set bit in the number i, as well as every pair of set bits in the number i (representing taking one and two balls in the previous step), and transition from that previous dp-state. There are $$$2^N$$$ states, and there are $$$O(N^2)$$$ pairs of tennis balls for each number, so the overall complexity is $$$O(2^N * N^2)$$$.

But N = 24, so this times out. In the comments, they mention a $$$O(2^N * N)$$$ solution, where for each dp[i] you only need to test N pairs of tennis balls for transitions. Specifically, they say that for dp[i], we only need to test pairs that contain the first set bit in the number i, which reduces the number of pairs, and therefore the number of transitions, to $$$O(N)$$$. Apparently, this approach still manages to cover all possibilities for transitions, but I cannot understand why.

Can someone provide some insights/hints to this problem's solution?