Hello everyone,
I have been using Binary Search for a while, but it still confuses me as to when I should use <=
and when I should use <
and how I should be updating my low
and high
variables accordingly.
I am not sure if this has already been answered, I was unable to find an answer that solved my doubt regarding how to update low
and high
variables.
It's just something I am struggling to wrap my head around, any help regarding this would be highly appreciated. Cheers!
I'm writing binary search in this way: if you want to find the lowest $$$x$$$ satisfying $$$f(x)$$$:
and vice versa if you want to find the highest $$$x$$$ satisfying $$$g(x)$$$:
Does this always work? or do you modify according to problems as well?
Works almost always (l = m , r = m). You just have to find the best initial values for l and r for this.
The key idea here is keeping invariants, this is precisely what's happening in the code. In the case of finding the smallest $$$x$$$ satisfying $$$f(x)$$$ we initially choose $$$l$$$ and $$$r$$$ such that $$$f(l)$$$ does not hold and $$$f(r)$$$ holds. Throughout the whole binary search, we always keep $$$l$$$ and $$$r$$$ that way. In the case of maximizing $$$x$$$, we keep the invariant that $$$f(l)$$$ holds and $$$f(r)$$$ does not hold (or in the code, $$$g(l)$$$ and $$$g(r)$$$).
The way I think of it is this: at any point, we have $$$l, r$$$ such that $$$f$$$ is true on $$$(l_0, l]$$$ and false on $$$[r, r_0)$$$, and we want to bring one of these "borders" towards the other. With this, the base case $$$(l, r) = (l_0, r_0)$$$ is trivially true. The termination condition says $$$f(l_0, x]$$$ is true and $$$f[x + 1, r_0)$$$ is false, so either $$$x = l_0$$$ or $$$x$$$ is such that the non-empty subarray $$$[l_0 + 1, x]$$$ is true, and either $$$x + 1 = r_0$$$ or $$$x$$$ is such that the non-empty subarray $$$[x + 1, r_0 - 1]$$$ is true. Thus we can simply set $$$l_0$$$ and $$$r_0$$$ to be one before and one after the search array endpoints (or the places where we definitely know that $$$f$$$ is true and false respectively).
To be honest, I haven't seen a binary search problem where the codes don't work. Tell me if you find any!
Hi , is it correct to do that : The first true code : ~~~~~~~~~~~~~~~~~~~ while( r > l) { long long mid=(r+l)/2; if(check(mid)) { r=mid; } else { l=mid + 1 ; } } The last true: ~~~~~~~~~~~~~~~~~ while( r > l) { long long mid=(r + l + 1)/2; if(check(mid)) { l=mid; } else { r=mid — 1 ; } } ~~~~~~~~~~~~~~~~~` ? And Thank you :)
.
Try to formally declare your state, what $$$l$$$ and $$$r$$$ means.
Let's declare some notation:
$$$a[i...j]$$$ = Subarray $$$(a[i], a[i + 1], a[i + 2], ..., a[j])$$$
$$$a[i...]$$$ = Subarray $$$(a[i], a[i + 1], a[i + 2], ...,$$$ till end of array $$$)$$$
$$$a[...j]$$$ = Subarray $$$($$$ From start of array $$$..., a[j - 1], a[j])$$$
$$$a[i...i]$$$ = Subarray having single term $$$(a[i])$$$
Empty Subarray: $$$a[0...-1]$$$ or $$$a[1...0]$$$ or ... (Basically when $$$i > j$$$, it denotes an empty subarray)
Now let's find the logic for the lower bound of an array sorted in increasing order.
Function $$$lowerBound(a, x)$$$: Returns the smallest index having element greater than or equal to $$$x$$$.
The function will look something like this:
Let $$$l$$$ and $$$r$$$ be defined as:
$$$a[...(l - 1)]$$$: All elements are smaller than $$$x$$$
$$$a[l...(r - 1)]$$$: Unchecked portion of array
$$$a[r...]$$$: All elements are greater than or equal to $$$x$$$
First, think about the initial value of $$$l$$$ and $$$r$$$.
Initially, the entire array is unchecked, and the indices range from $$$0$$$ to $$$n - 1$$$.
Comparing it with $$$a[l...(r - 1)]$$$, we get
$$$l = 0$$$
$$$r - 1 = n - 1 => r = n$$$
If you're confused about $$$a[...-1]$$$ and $$$a[n...]$$$, assume
$$$a[-1] = -Infinity$$$
$$$a[n] = +Infinity$$$
(Also mentioned in the CF Edu Section)
Now, let's see how we should update the values $$$l$$$ and $$$r$$$ at every iteration.
$$$m = floor((l + r) / 2)$$$ (We can also take ceil according to a different initial state and updating)
If $$$a[m] >= x$$$, this means $$$m$$$ should be a part of $$$a[r...]$$$
So we'll update $$$r = m$$$
If $$$a[m] < x$$$, this means $$$m$$$ should be a part of $$$a[...(l - 1)]$$$
So we'll update $$$l - 1 = m => l = m + 1$$$
How long the loop should run?
The loop should run till we have some unchecked elements in the array.
The unchecked portion $$$a[l...(r - 1)]$$$ will be non-empty till $$$l <= r - 1$$$, or we can say $$$l < r$$$.
If at any point, I ask you the current best answer you have, what would it be?
At any point, $$$r$$$ is the smallest index we know having a value greater than or equal to $$$x$$$.
When the unchecked part becomes empty, we'll return $$$r$$$. (At any point, $$$r$$$ was the best answer we know, but there might be a better answer in the unchecked portion of the array.)
The final code will be:
This is one of the possible codes on lowerBound. You can have different codes on considering different states.
The same idea can be applied on any binary search problem.
When solving problems using "Binary search on answer", you'll get one of the two situations:
...TTTTTFFFFF...
OR
...FFFFFTTTTT...
You can make a binary search algorithm for the last true or first false in the first case, or the last false or first true in the second case.
Binary search is a search algorithm. As long as you don't lose whatever you're searching for, then you win.
In binary search, the search domain (i.e. the answer you're looking for) is in the
[low, high]
region. You check the answer formid = (low + high)/2
. Ismid
a valid answer? Would you want to discardmid
? If yes for the first question and no for the second, don't lose it. Otherwise, feel free to drop it.Key point is, again, don't lose sight of what you're "searching" for. Only discard what's guaranteed to not be a possible answer, and don't ever let the answer escape the
[low, high]
zone.I prefer checking
low <= high
. If the value ofhigh
is initially good then after doing binary search returnlow
, otherwise returnhigh
The diffrence between
<=
and<
is when using<=
, the binary search range is $$$[L,R]$$$, and when using<
, the range is $$$[L,R)$$$.I perfer using
<=
and use another variableres
to store the possible answer.I wrote a blog a while ago which has a few sections that talk about the ways people implement binary search, in particular, about implementations that use $$$<$$$ and implementations that use $$$\le$$$.