polosatic's blog

By polosatic, history, 2 months ago, In English

Hello everyone. Today I would like to introduce you to a new technique — ternary search on non-unimodal functions. I came up with it during ROI 2024 and it helped me to become a prize winner.

The motivation

There are many examples in CP where we need to maximise or minimise the value of some function $$$f$$$. If the function is unimodal, there is no problem using ternary search. But if it's not, there's nothing to do. Of course, if $$$f$$$ is random, we should check all its values. Fortunately, intuition tells us that if the function is close to unimodal, we can also use ternary search on a large part of its area of definition.

The intuition

Consider the unimodal function $$$0.03x^2$$$ after adding some noise ($$$sin (x)$$$) to it:

Graph

Intuition suggests that if the noise is not strong, we can still use a ternary search away from the global minimum.

In this example, we can put it more formally — the derivative of the function is $$$0.06x + cos(x)$$$, and when $$$|x| > \frac{1}{0.06} \approx 16.67$$$, adding cosine does not change the sign of the derivative.

The first optimisation

So, the first idea is to run the ternary search using not while (r - l > eps) but while (r - l > C) and then bruteforce all values between $$$l$$$ and $$$r$$$ with some precision. In many cases when $$$f$$$ takes an integer argument there will be no precision issue at all.

The second optimisation

It is mentioned in this blog. It tells us a similar idea of splitting all argument values into blocks and applying ternary search on them.

This is the only thing related to the blog that I found. I tried googling and asking people involved in CP, but none of them knew about it before.

Testing

The function from the example is boring, so consider a more interesting function: Weierstrass function

We zoom in and find that the maximum is about $$$1.162791$$$

Spoiler

We will search for the maximum on the interval (-1, 1).

Code

This gives us $$$1.12881$$$. Changing $$$eps$$$ will change this value slightly.

Let's split the arguments into blocks. Since the arguments are real, we will not actually split them explicitly into blocks, we will take the maximum in some range near the argument.

Blocks

It gives $$$1.15616$$$, which is quite good. We can optimise it by taking the maximum of all the values of $$$f$$$ we have ever calculated:

Take maximum

This gives us $$$1.16278$$$, which is very close to the expected $$$1.162791$$$. It seems that we have found a good approach. But let's go further.

The third optimisation

Let's change the constant $$$3$$$ in the code. We will call it $$$C$$$. It is not new to experienced people, often it is good to choose $$$C$$$ equal to $$$2$$$ (binary search by the derivative) or $$$\frac{\sqrt5+1}{2}$$$ (golden ratio). As we are cutting out $$$\frac{1}{C}$$$ part of our interval on each iteration, the probability of the maximum being indise the cut-off part decreases as C increases.

Let's choose $$$C = 100$$$ and run ternary search. As we already know, taking maximum of all function calls is very good optimisation, so I have already added it.

Code

If we run it with $$$C = 3$$$, we get $$$1.13140$$$ (the algo is the same as the classic ternary search, but we take the maximum, so the answer is much better). Now let's now increase $$$C$$$ and watch the answer increases:

We run it with $$$C = 30$$$ and get $$$1.16275$$$. We run it with $$$C = 100$$$ and get ... $$$1.15444$$$

In fact, increasing $$$C$$$ does not guarantee a better answer.

The fourth optimisation

Let's bruteforce all values of $$$C$$$ from 3 to 100 and take the best answer:

for (int C = 3; C < 100; C++) res = max(res, find_maximum(Weierstrass, C));

It gives us $$$1.16279$$$ and works faster than block splitting. To compare them further, we need to change the function, because both methods return values that are very close to the answer.

Let's use $$$a = 0.88$$$ and $$$b = 51$$$ for the Weierstrass function. Note that it is impossible to see the actual maximum of the function in Desmos.

I will compare the above 2 approaches on Codeforces Custom test.

C < 100
C < 200
C < 400

I tried combining these methods together, but is performs worse than both techniques (because I used $$$1000$$$ iterations instead of $$$10000$$$ and bruteforced $$$C < 10$$$ not to run out of time).

Conclusion

On the function I considered, both approaches are good enough. On real contests I only used my approach (the fourth optimisation).

From recent problems, 2035E - Monster, I have used this technique successfully. Here is the implementation for integer arguments: 288346163.

If you know how to find the maximum better (not by simulated annealing), please write.

Full text and comments »

  • Vote: I like it
  • +86
  • Vote: I do not like it

By polosatic, history, 4 months ago, translation, In English

Recenlty I was solving this problem. It was obvious strongly connected components. I wrote them and got TL14: 277474774.

I knew that push_backing to 1e6 vectors is slow due to the dynamic reallocation. I know the only way to handle it: precompute the size of each vector, and reinterpret vector<vector> as a static array. It was awful to implement, but I did it and got TL38: 277476691

Then I tried many things like removing the vector of Euler tour, adding pragmas, but have not passes the test 38. So, there are 2 questions for experienced optimizers:

  1. Is there another, simpler way to speed up vector push_backs, without precomputing each vector's size?

  2. Why does my final solution get TL, and how to speed up it? Constraints up to $$$10^6$$$ are enough to pass in this problem I think.

UPD: #define vector basic_string is a useful optimization. EDIT: do not use this define if you use vector<vector<int>> vec instead of vector<int> vec[N].

UPD2: found this blog. There are some explanations.

Full text and comments »

  • Vote: I like it
  • +46
  • Vote: I do not like it

By polosatic, history, 7 months ago, In English

Recently I was upsolving this problem. My subtask was as follows: given three points: A, B, C, and I need to find the intersection point of 2 lines: AB and OC, where O has (0;0) coordinates.

Code, 90/92 tests passed

At first I wrote this using a formula (find() function) and got WA on 2 tests. Then I wrote it using binary search (findgood() function) and got AC. Then I submitted the solution above with find() and #define ld __float128, and it passed.

Can anybody explain what actually happened? How can formula be less precise than binary search?

Full text and comments »

  • Vote: I like it
  • +16
  • Vote: I do not like it

By polosatic, history, 9 months ago, translation, In English

Physics has many cheating statements, like conservation laws, which can help to "prove" interesting math facts. If you know any "proofs", you can share them. I also know some:

Tetrahedron normals

Statement
Solution

Pythagorean theorem

Statement
Solution

Точка Торричелли

Statement
Solution

Full text and comments »

  • Vote: I like it
  • +15
  • Vote: I do not like it

By polosatic, history, 12 months ago, In English

(Unfortunately, Codeforces does not support emoji, so it is here)

Full text and comments »

  • Vote: I like it
  • +45
  • Vote: I do not like it

By polosatic, history, 20 months ago, translation, In English

Hello, Codeforces. I wrote a prorgam using this theorem for interpolation a function as a polynomial. At first, I will explain how does it work and then show some examples of using it.

Wikipedia does not provide one formula I need, so I will show it

Formula

Code is quite long, but simple in use.

How to use program

At first, you need to set values of constants. N is the number of variables, MAX_DEG is the maximal exponent of a variable over all monomials. In main you need to fill 2 arrays: names contains the names of all variables, max_exp[i] is the maximal exponent of the i-th variable, or the upper_bound of its value.

Define d = (max_exp[0] + 1) * (max_exp[1] + 1) * ... * (max_exp[N - 1] + 1). MAX_PRODUCT should be greater then d. Then you need to write a function f(array<ll, N>), which returns ll or ld. In my code it returns an integer, but its type is ld to avoid ll overflow.

Code

Stress-testing

If you uncomment 2 last rows in main, program will check the polynom it got on random tests. The test generation should depend from f working time, because it can run too long on big tests.

Approximations

Function from exaple and similar functions (with n loops) are a polynomial with rational coefficients (if this is not true, the function does not return an integer). So, if APPROXIMATION = true, all coefficients are approximating to rational numbers with absolute error < EPS with functions normalize and approximate (they are using the same algorithm). This algorithm works in O(numerator + denominator), that seems to be slow, but if the polynomial has a small amount of monomials, it does not take much time.

Stress-testing function check considers a value correct if its absolute or relative error < EPS_CHECK.

How and how long does it work

We consider monomials as arrays of exponents. We hash these arrays. Array PW contains powers of points (from POINTS), which we use for interpolation. If you want to use your points for interpolation, modify POINTS. If you use fractional numbers, replace #define ll long long with #define ll long double. Array DIV is used for fast calculating denominators in the formula.

convert(h) — get indexes (in array POINTS) of coordinates of the point corresponding to the monomial with hash h. convert_points(h) — get coordinates of the point corresponding to the monomial with hash h.

Then we are precalcing values of f in all our points and write them to F_CACHE. After it, we run bfs on monomials. During the transition from one monomial to another we decrease the exponent of one variable by 1. When a monomial is got from set in bfs, we find its coefficient using gen. If it is not zero, we need to modify our polynomial for every monomials we has not considered in bfs yet ("monomial" and "point" have the same concepts because we can get a point from monomial using convert_points(h), if h is a hash of the monomial).

We need to modify the polynomial to make one of the theorem's conditions satisfied: there are no monomials greater than our monomial (greater means that all exponents are more or equal). For every point we has not consider in bfs (they will be in set remaining_points) we add the value of the monomial in this point to SUM[hash_of_point]. Then we will decrease f(point) by SUM[hash_of_point] to artificially remove greater monomials.

Time complexity

  1. The longest part of precalc — calculating F_CACHE — take O(d * O(f)) time
  2. Each of d runs of gen is iterating over O(d) points, denominator calculation takes O(N) time for each point.
  3. For every monomial with non-zero coefficient we calculate its values in O(d) points in O(N) for each point.

We have got O(d * O(f) + d^2 * N + d * O(res)), where O(res) is the time it takes to calculate the polynomial we got as a result.

Trying to optimize

It seems that the recursion takes the most time. We can unroll it in one cycle using stack. It is boring, so I decided to try to unroll it in other way. For every monomial with non-zero coefficient, let`s iterate over all monomials with hash less or equal to our hash. For every monomial we check if it is less than our monomial (all corresponding exponents are less or equal). If it is lower, we add to the coefficient the value of fraction in this point (monomial).

// Instead of ld k = gen();
ld k = 0;
for (int h=0;h<=v;h++)
{
    array<int, N> cur = convert(h);
    bool ok = 1;
    for (int i=0;i<N;i++) if (cur[i] > cur_exp[i]) ok = 0;
    if (ok)
    {
	ll div = 1;
        for (int i=0;i<N;i++) div *= DIV[i][cur[i]][cur_exp[i]];
        k += (ld)(F_CACHE[h] - SUM[h]) / div;
    }
}

Is it faster than gen? New implementation is iterating over all pairs of hashes, so it works in O(d^2 * N), too. Let's estimate the constant. The number of these pairs is d * (d + 1) / 2, so we get constant 1 / 2. Now let's calculate the constant of number of points considered by gen. This number can be calculated with this function:

ld f(array<ll, N> v)
{
	auto [a, b, c, d, e, f, g] = v;
	ld res = 0;
	for (int i=0;i<a;i++)
		for (int j=0;j<b;j++)
			for (int u=0;u<c;u++)
				for (int x=0;x<d;x++)
					for (int y=0;y<e;y++)
						for (int z=0;z<f;z++)
							for (int k=0;k<g;k++)
								res += (i + 1) * (j + 1) * (u + 1) * (x + 1) * (y + 1) * (z + 1) * (k + 1);
	return res;
}

The coefficient with a^2 * b^2 * c^2 * d^2 * e^2 * f^2 is our constant. To find it, I used my program. It is 1 / 128. At all, it is 1 / 2^N for N variables. It means that the optimization can be efficient if N is small.

Conclusion

May be, this program will help someone to find formula for some function. Also it can open brackets, that is necessary if you calculate geometry problems in complex numbers. If you know other ways to use it, I will be happy if you share it.

With N = 1 this program is just a Lagrange interpolation, which can be done faster than O(d^2). Maybe, someone will find a way to speed up it with N > 1.

Full text and comments »

  • Vote: I like it
  • +76
  • Vote: I do not like it

By polosatic, history, 22 months ago, translation, In English

Hello, Codeforces. I submitted two very similar solutions to 1771F - Hossam and Range Minimum Query

TL: 193157567 AC: 193157512

You can see that they are different in one line with map. I have no idea why the first solution gets TL.

Full text and comments »

  • Vote: I like it
  • +8
  • Vote: I do not like it

By polosatic, history, 23 months ago, translation, In English

Hello, Codeforces. Recently I invented a problem. Consider a set of points (x, y) with integer coordinates, 0 <= x < a и 0 <= y < b. We want to know the number of acute triangles with vertices at given points.

Trying to interpolate

We can write function f(a, b) which finds the answer and works in (ab) ^ 3. I suggested that it is a polynomial of two variables with degree <= 6. I tried to interpolate it using this theorem. But my code finds non-zero coefficients with monoms with degrees > 6, so my hypothesis was not confirmed. I also failed with right triangles.

code (for right triangles)

This code finds a coefficient with a ^ (C1 - 1) * b ^ (C2 - 1)

What I would like to know:

  • can this problem be solved faster than O(stupid(a, b))
  • can this problem be solved in O(1)
  • maybe, someone knows problems with difficult to find formulas and this method can help to find them?

UPD: Found formula for the number of right triangles with b = 2: f(a, 2) = 2 * a ^ 2 - 4, a > 1.

UPD2: Many thanks to bronze_coder for finding O(1) solution for constant b: OEIS A189814.

Interpolation should use ai > b ^ 2.

UPD3: Finally wrote a O((ab) ^ 2) solution.

Code

Now I can interpolate using bigger values of a and b.

But it is quite strange for me that the number of right triangles is directly proportional to (ab) ^ 2 instead of (ab) ^ 3. Now I will try to understand why the formula doesn't work with ai <= b ^ 2.

UPD4: Code that finds a formula for f(a, b) for a > b ^ 2 and works in O(b ^ 6):

Spoiler

I used OEIS to find a regularity between the coefficients of P, but i didn't find anything :( except x2 = b * (b - 1), but it was obvious.

UPD5:

Finally found a formula and solution in O(min(a, b) ^ 6)

If a < b, let's swap them. Now we need to solve it in O(b ^ 6). If a <= (b - 1) ^ 2 + 1, we can run solution in O((ab) ^ 3) = O(b ^ 6). Now we need to solve it for a > (b - 1) ^ 2 + 1.

Definitions

Consider all right triangles and divide them into 3 types:

  1. triangles without any vertices on the line x = a - 1

  2. triangles with some vertices on the line x = a - 1 and with two sides which are parallel to the x-axis or y-axis

  3. other triangles

Let C(a) be the number of third type triangles (some function).

The number of second type triangles is (a - 1) * b * (b - 1) / 2 * 4:

Proof

By definition, for every vertex (x, y) of first type triangles, 0 <= x < a - 1 and 0 <= y < b, then the number of the triangles is f(a - 1, b), by definition of f.

Well, f(a, b) = f(a - 1, b) + (a - 1) * b * (b - 1) / 2 * 4 + C(a)

Let's prove that C(a) for every a > (b - 1) ^ 2 + 1 has same values.

Proof

C(a) is a constant. Let c be this constant.

Well, f(a, b) = f(a - 1, b) + (a - 1) * b * (b - 1) / 2 * 4 + c. After some transforms, we get a formula for f(a, b) using f((b - 1) ^ 2 + 1, b).

Transforms
Implementation

Unfortunately, the value of c is different for different values of b and I could not find a regularity between them. Interpolation and OEIS did not help me. There are some things I should do:

  1. Express c through b, after all c depends on b only
  2. Solve the problem for a <= (b - 1) ^ 2 faster
  3. Solve the problem for acute triangles.

It will be hard.

UPD6: We can speed up the solution to O(b^5) using this idea

Full text and comments »

  • Vote: I like it
  • +88
  • Vote: I do not like it