Maxim's blog

By Maxim, 6 years ago (translation)

What does a typical binary search over real values look like? Like this:

float l = 0.0f, r = 1e14f;
// Stop after a fixed number of iterations, or earlier once the interval is small enough.
for (int i = 0; i < iteration_count && l + eps < r; ++i)
{
    float mid = 0.5f * (l + r);
    if (check(mid))
        r = mid;  // check is already true at mid: the answer is at mid or to its left
    else
        l = mid;  // check is still false at mid: the answer is to the right of mid
}

Here l and r are boundaries that depend on the problem, check(x) is a function that returns false before some x and true starting from that x, and we need to find this x. As a rule, l or 0.5 * (l + r) is taken as the answer.

But this has several disadvantages:

  1. The number of iterations must be chosen as a trade-off between precision and execution time.
  2. The answer carries an error.

What can be done with this?

First, let's take a look at how real values are represented. You can read about it in detail, for example, here and here. We use the following fact: if we perform the operation Y = real(int(X) + 1), where X is a real value, int(X) takes a real value and returns an integer with the same bitwise representation (the same X reinterpreted as an integer), and real(X) does the opposite, reinterpreting an integer as a real value, then Y is the next representable real value after X.

Roughly, this is similar to Y = X + eps, where eps is the minimum possible step, for X >= 0, and to Y = X - eps for X <= 0 (in both cases zero is included, because among real values there are both 0 and -0). Roughly, because not every X has the same eps. I should note that there are also NaN, inf, and the maximum positive and negative values, but this is not what I want to talk about.

It can be seen that for non-negative real L and R with L < R, we have int(L) < int(L) + 1 < int(L) + 2 < ... < int(R), just as L < L + eps < L + 2 * eps < ... < R. So we can do a binary search with the boundaries int(L) and int(R).
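To make the real(int(X) + 1) step concrete, here is a small sketch (mine, not from the original post): it uses std::memcpy for the reinterpretation and checks the result against std::nextafterf.

#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

int main()
{
    float x = 1.0f;  // any non-negative finite float works here
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // int(X): the same bits as an integer
    ++bits;                               // int(X) + 1
    float y;
    std::memcpy(&y, &bits, sizeof y);     // real(int(X) + 1)
    assert(y == std::nextafterf(x, INFINITY));  // y is the next float after x
}

Now we can do this: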

float l_real = 0.0f, r_real = 1e14f;
// Reinterpret the boundary bits as unsigned integers.
uint32_t l = reinterpret_cast<uint32_t&>(l_real), r = reinterpret_cast<uint32_t&>(r_real);
while (l < r)
{
    uint32_t mid = l + (r - l) / 2;  // avoids the overflow of l + r
    if (check(reinterpret_cast<float&>(mid)))
        r = mid;
    else
        l = mid + 1;
}

The same works for double and uint64_t; int for float and long long for double can also be used. I should note that the standard does not guarantee that float and int are 32 bits each, or that double and long long are 64 bits each, but I have never seen a platform where that was not the case.
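As a side note (this is my sketch, not part of the original post): the reinterpret_cast above is formally undefined behavior under the strict-aliasing rule, as discussed in the comments below. The same search can be written without it using std::bit_cast (C++20); std::memcpy works the same way on older standards.

#include <bit>
#include <cstdint>

bool check(float x);  // the same predicate as above: false, then true from the answer onward

float find_answer()
{
    std::uint32_t l = std::bit_cast<std::uint32_t>(0.0f);  // same boundaries as above
    std::uint32_t r = std::bit_cast<std::uint32_t>(1e14f);
    while (l < r)
    {
        std::uint32_t mid = l + (r - l) / 2;
        if (check(std::bit_cast<float>(mid)))
            r = mid;
        else
            l = mid + 1;
    }
    return std::bit_cast<float>(l);
}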

What are the advantages?

  1. Maximum possible precision.
  2. A small number of iterations (up to 32 for float and up to 64 for double).

What is important to remember?

  • l, r and mid are not meaningful values by themselves; all that matters is that the order is preserved in both interpretations.
  • l + r may overflow, therefore l + (r - l) / 2 instead of (l + r) / 2.
  • There is an inconvenience when the boundaries have different signs. Negative values are stored in sign-magnitude form, so their integer interpretation reverses the order, and we have to convert it to keep the order (see the sketch after this list). You can also make a check at zero so that the boundaries have the same sign, or add a constant to the values. In my experience, problems where the boundaries have different signs are a rare case.
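As an illustration of that conversion (my sketch, not from the original post; it assumes IEEE-754 floats and no NaNs), a standard trick maps the bits of a float to an unsigned integer whose natural order matches the float order across both signs:

#include <cstdint>
#include <cstring>

// Map a float to an unsigned key whose integer order matches the float order,
// including negative values.
std::uint32_t ordered_key(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    // Negative floats are stored as sign + magnitude, so their integer
    // interpretation grows as the value decreases; flipping all bits fixes that.
    // Non-negative floats get the sign bit set so they sort above the negatives.
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

With both boundaries mapped through ordered_key, the binary search above works unchanged even when they have different signs.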

It would not be superfluous to take a look at pictures of the floating-point distribution. For example, note that in the interval from 0 to std::numeric_limits<float>::max() the median (which is also the first mid of the bitwise binary search) is equal to 1.5. That is, as many values are representable from 0 to 1.5 as from 1.5 to the largest representable value. With the usual approach for float, about 128 iterations are needed just to make the right boundary less than 1.5 when the answer is, say, 0 (don't forget that usually the initial right boundary is much smaller). That is, the usual approach discards representable values extremely unevenly.
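To see where that 1.5 comes from, here is a small check (mine, not from the original post): take the bitwise midpoint of 0.0f and std::numeric_limits<float>::max() and read it back as a float.

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <limits>

int main()
{
    float lo = 0.0f, hi = std::numeric_limits<float>::max();
    std::uint32_t l, r;
    std::memcpy(&l, &lo, sizeof l);
    std::memcpy(&r, &hi, sizeof r);
    std::uint32_t m = l + (r - l) / 2;  // the first mid of the bitwise search
    float mid;
    std::memcpy(&mid, &m, sizeof mid);
    std::printf("%.9g\n", mid);  // prints 1.49999988, just below 1.5
}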

An example of solving a problem using this approach: 45343644

The same approach applies to ternary search (who wants to make a cool example? ^_^).

» 6 years ago, # |

It's very interesting, but here's the way I see it.

What is better:

for (int whatever = 0; whatever < 200; whatever++)

+ simple

+ easy to understand

+ fast enough 99% of the time

- not fast enough 1% of the time

+ 200 is good enough for any problem

OR:

uint32_t l = reinterpret_cast<uint32_t&>(l_real), r = reinterpret_cast<uint32_t&>(r_real);, blahblahblah

- C++ black magic

- who knows when it will break because of some compiler issue from 2012

+ save a couple iterations

- weird negative edge cases

  » » 6 years ago, # ^ |

    + optimize the precision

  » » 6 years ago, # ^ |

    Thanks for sharing your point of view. Let me share the way I see it then.

    + simple

    + easy to understand

    A cycle from l to r is also simple and easy to understand. The approach in the article is also simple and easy to understand, imho.

    + 200 is good enough for any problem

    32 is good enough for no fewer problems. Let me give an example: what if we have a binary search inside a binary search, with a heavy check function? Then this approach is ~40 times (200 * 200 / (32 * 32)) faster.

    - C++ black magic

    Why did you mark it as a disadvantage? Come to the dark side ;-)

    - who knows when it will break because of some compiler issue from 2012

    I do believe that the probability of that is no higher than the probability of a simple cycle breaking.

    - weird negative edge cases

    Not weird imho.

    » » » 5 years ago, # ^ |

      oh my god, it really works

      thanks a lot

      10 WA/TL attempts with the iterative binary search brought me here

  » » 6 years ago, # ^ |

    C++ black magic

    who knows when it will break because of some compiler issue from 2012

    It's not even black magic, it's undefined behavior. It's already broken.

    From https://en.cppreference.com/w/cpp/language/reinterpret_cast:

    double d = 0.1;
    std::int64_t n;
    static_assert(sizeof n == sizeof d);
    // n = *reinterpret_cast<std::int64_t*>(&d); // Undefined behavior
    std::memcpy(&n, &d, sizeof d); // OK
    n = std::bit_cast<std::int64_t>(d); // also OK
    » » » 6 years ago, # ^ |

      it's undefined behavior

      It's true because of the type-aliasing rule. But from the same link:

      Note that many C++ compilers relax this rule

      Also, I've mentioned that there is no guarantee that float and int are 32-bit and double and long long are 64-bit, so it's already UB in that sense. But it doesn't really matter, because it's almost impossible to find a compiler that misbehaves here. It's an individual choice between hypothetical misbehavior and a more precise/faster solution. Also, it's still more probable to fail writing a simple cycle, as I've mentioned earlier. Actually, there is no compiler that follows the standard 100%. If there is some discomfort, it is possible to use memcpy from your example.

      It's already broken

      Strange hyperbole.

» 6 years ago, # |
Ternary Search - Min Function

For finding the max of a unimodal function, just reverse the sign.
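Taking up the article's invitation, here is a hedged sketch (mine, not the commenter's code) of the bitwise idea applied to minimizing a unimodal f, assuming IEEE-754, C++20, non-negative boundaries, and an f that strictly decreases and then increases: because the representable values are discrete, the minimum is the first x where f stops decreasing, so the ternary search degenerates into the same bitwise binary search.

#include <bit>
#include <cstdint>

double f(float x);  // assumed unimodal on [0, 1e14]: strictly decreasing, then increasing

float argmin()
{
    std::uint32_t l = std::bit_cast<std::uint32_t>(0.0f);
    std::uint32_t r = std::bit_cast<std::uint32_t>(1e14f);
    while (l < r)
    {
        std::uint32_t mid = l + (r - l) / 2;
        // "f(x) <= f(next representable x)" is false strictly before the
        // minimum and true from the minimum onward, so binary search applies.
        if (f(std::bit_cast<float>(mid)) <= f(std::bit_cast<float>(mid + 1)))
            r = mid;
        else
            l = mid + 1;
    }
    return std::bit_cast<float>(l);
}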

» 5 years ago, # |

For the regular floating-point search, instead of an iteration count, can you alternatively use the condition r - l > 1e-7? (Or whatever tolerance you need.)

  » » 5 years ago, # ^ |

    No, it doesn't work well.

    » » » 5 years ago, # ^ |

      Can you let me know why? It seems identical in functionality to a fixed iteration count, but easier to understand.

      » » » » 5 years ago, # ^ |

        I'm not really sure why; I think it has something to do with floating-point precision, but it does not work well. I remember trying to solve a couple of problems like that and I was always getting WA no matter what tolerance I used, then I changed to a fixed iteration count and boom, ez AC.

  » » 5 years ago, # ^ |

    r - l > eps works badly on functions like y = 1000000000000x when you need to find f(x).

    f(r) - f(l) > eps works badly on functions like y = 0.00000000000001x when you need to find x.

  » » 5 years ago, # ^ |

    I tried to solve this problem during the 2015 ICPC Latin American regionals using r-l>EPS without any success.

    After the contest I tried using iteration count and it was accepted :(

  » » 5 years ago, # ^ |

    If you are binary searching the answer and will just print $$$(L + R) / 2$$$ after the binary search is done, then you can use something like this:

    while(R - L > eps * max(1.0, L)) { ... }
    

    This checks for a relative or absolute error. It assumes that $$$0 \leq L \leq R$$$ though. A general version is quite tricky and I'm not even sure that I will get it right here:

    #define SMALLEST (L < 0 && 0 < R ? 0.0 : min(abs(L), abs(R)))
    while(R - L > eps * max(1.0, SMALLEST)) { ... }
    

    The idea is to get the minimum possible magnitude of the answer: the minimum absolute value over the interval $$$[L, R]$$$.

    My advice is: just iterate a constant number of times.

  » » 5 years ago, # ^ |

    I have also seen people do something like this:

    If every test file has a single test case and, say, the time limit is around 2 seconds, then you may also write the binary search as:

    while((double)clock() / CLOCKS_PER_SEC < 1.8) {
        // Obtain mid, compute stuff, and update lo and hi accordingly
    }
    

    It seems more like magic to me, but if you choose the constant in the while loop so that your execution time stays within the time limit, you will get very good precision.