Maximum harmonic mean subarray

→ Pay attention

Before contest
Codeforces Round 1002 (Div. 2)
37:41:31
Register now »

*has extra registration

→ Top rated

#	User	Rating
1	jiangly	3898
2	tourist	3840
3	orzdevinwang	3706
4	ksun48	3691
5	jqdai0815	3682
6	ecnerwala	3525
7	gamegame	3477
8	Benq	3468
9	Ormlis	3381
10	maroonrk	3379

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	cry	168
2	-is-this-fft-	165
3	Dominater069	161
4	atcoder_official	160
4	Um_nik	160
6	djm03178	157
7	adamant	153
8	luogu_official	151
9	awoo	149
10	TheScrasse	146

View all →

→ Find user

→ Recent actions

Detailed →

zetaminusone's blog

Maximum harmonic mean subarray

By zetaminusone, history, 4 years ago, In English

Let there be a sequence of integers $$$x_1, x_2, ... x_n$$$. Then, the harmonic mean is defined as

$$$\frac{n}{\sum_{i=1}^n \frac{1}{x_i}}$$$

Given an array of positive and negative integers, what is the most efficient way to find the subarray with the maximum harmonic mean? Thanks!

zetaminusone
4 years ago
7

Comments (7)

Write comment?

clyring

4 years ago, # |

What is the source of this problem? Are there any other constraints?

→ Reply

zetaminusone

4 years ago, # ^ |

This question came as a random thought as I was trying to create more questions for my school's judge. It is purely a matter of interest ^^

→ Reply

steiner

4 years ago, # ^ |

Wouldn't it be just the maximum number in array?

→ Reply

clyring

4 years ago, # ^ |

OK. That explains my weird feeling reading the problem. With no further constraints, there can be division by zero in calculating the harmonic mean of some subarrays due to negative inputs, for example with 1 -1. Even if input guarantees that this cannot happen or it is clarified that this doesn't count, the correct output can be large when the sum in the denominator only approximately cancels out. The correct output for 8 -9 -9 10 is 1440. If there's a particularly efficient way to search for such near-cancellations, it's at least not obvious to me.

If instead all input is positive, then (unless you want the problem to be 'print the maximum of the input' in disguise) some constraint on the size of the subarrays to be considered seems appropriate.

→ Reply

zetaminusone

4 years ago, # ^ |

Thank you! I honestly had not realised at first and you pointed out many insightful observations. That resolves the problem :)

→ Reply

collapse

4 years ago, # |

← Rev. 2 →

There is a stupid systematic way in $$$O(N \log^3 N)$$$. Assume without loss of generality that there is at least one positive element.

Let $$$a_i = \frac 1{x_i}$$$ and $$$p_i$$$ be $$$\sum_{j = 1}^i a_j$$$ (prefix sum array of $$$a_i$$$).

We wish to pick two endpoints $$$l$$$ and $$$r$$$ such that $$$\frac{r - l + 1}{\sum_{i=l}^r \frac 1{x_i}}$$$ is maximized.

$$$ \begin{align} &\phantom{=.}\frac{r - l + 1}{\sum_{i=l}^r \frac 1{x_i}} \\ &= \frac{r - (l - 1)}{\sum_{i=(l - 1) + 1}^r \frac 1{x_i}} \\ \end{align} $$$

Let $$$l_0 = l - 1$$$,

$$$ \begin{align} &= \frac{r - l_0}{\sum_{i=l_0 + 1}^r \frac 1{x_i}} \\ &= \frac{r - l_0}{p_r - p_{l_0}} \\ \end{align} $$$

We wish to maximize this value. We note for later that $$$p_r - p_{l_0}$$$ must be positive. We binary search on $$$k$$$ to find how many $$$0 \leq l_0 \leq r \leq N$$$ satisfy

$$$ \begin{align} &= \frac{r - l_0}{p_r - p_{l_0}} \geq k \\ &= r - l_0 \geq p_rk - p_{l_0}k \\ &= r - p_rk \geq l - p_{l_0}k \\ \end{align} $$$

Let $$$b_i = i - p_ik$$$, then we want to check how many pairs $$$0 \leq l_0 \leq r \leq N$$$ satisfy $$$b_r \geq b_l$$$ AND $$$p_r > p_{l_0}$$$. This is a 3D partial order problem which can be solved using CDQ in $$$O(N \log^2 N)$$$. (Tutorial)

For each iteration of the binary search, if no pairs $$$l_0, r$$$ are found, then $$$k$$$ is too high, otherwise, it's at least that value of $$$k$$$.

Floating point accuracy is likely a big issue. The case where all elements are negative can be handled in a similar way.

→ Reply

clyring

4 years ago, # ^ |

← Rev. 2 →

Binary searching over $$$k$$$ probably requires some care, especially when you seem to claim $$$O(\log n)$$$ iterations as opposed to something like $$$O(\log \frac{1}{\varepsilon})$$$ iterations. The former is certainly achievable, for example by modifying your CDQ queries to provide a random sample and using that $$$(l_0, r)$$$ pair's highest possible value of $$$k$$$ as your next pivot, but isn't entirely obvious.

Floating point accuracy is a very big issue. Unless the bounds are very small, I would prefer to do all arithmetic for this problem in exact arithmetic rather than risk running headfirst toward the limits of what floating-point types the compiler might provide. Obviously this makes arithmetic operations rather expensive, but this seems to be a necessary price to pay. I was already able to craft a testcase that will run afoul of the precision available in a double with $$$n = 8$$$ and $$$| a_i | \leq 1000$$$:

When a = [1, 1, -979, -846, -729, 761, 884, 885], $$$p_8 = 2 + \frac{1}{6656 823 096 297 660} > p_2 = 2$$$, but when computed in doubles in the obvious way on my machine, $$$p_8 = p_2$$$.

Come to think of it, for a fixed $$$l_0$$$, the largest harmonic mean is achieved by minimizing the slope $$$\frac{p_r - p_{l_0}}{r - l_0}$$$ among points $$$r > l_0$$$ with $$$p_r > p_{l_0}$$$, which is a convex-hull query, and can be answered in $$$O((\log n)^2)$$$ arithmetic operations per query by maintaining a segment tree of dynamic (lower) convex hulls and adding points $$$(x, p_x)$$$ in decreasing order of $$$p_x$$$.

→ Reply