Offline Range MEX queries in O(log n)

#	User	Rating
1	tourist	3985
2	jiangly	3814
3	jqdai0815	3682
4	Benq	3529
5	orzdevinwang	3526
6	ksun48	3517
7	Radewoosh	3410
8	hos.lyric	3399
9	ecnerwala	3392
9	Um_nik	3392

#	User	Contrib.
1	cry	169
2	maomao90	162
2	Um_nik	162
4	atcoder_official	161
5	djm03178	158
6	-is-this-fft-	157
7	adamant	155
8	awoo	154
8	Dominater069	154
10	luogu_official	150

Consider the following problem :

Your are given an array $$$a$$$ of size $$$n$$$ and $$$q$$$ queries of form $$$(l_1, r_1), (l_2, r_2), ..., (l_q, r_q)$$$. For each query you have to find the MEX of the array $$$a_l, a_{l+1}, ..., a_r$$$. The queries are offline.

$$$0 \le a_1, a_2, ..., a_n \le n$$$

$$$1 \le l_i \le r_i \le n$$$, for all $$$1 \le i \le q$$$

The approach to the problem is discussed in the comment here.

First let's consider a special case of the queries where $$$r = n$$$. In this case we want to find the smallest integer that does not occur in the subarray $$$a_l, ..., a_r$$$. That is, the smallest integer that does not occur at index $$$l$$$ or after it.

For all $$$0 \le x \le n$$$, let $$$b_x$$$ be the greatest index $$$i$$$ such that $$$a_i = x$$$, and $$$0$$$ if no such $$$a_i$$$ exists.

Now the MEX say $$$m$$$ of subarray $$$a_l, ..., a_n$$$ either does not occur in $$$a$$$, in which case we have $$$b_m = 0$$$, or the last occurence of $$$m$$$ in $$$a$$$ is before $$$l$$$, thus $$$b_m < l$$$. We can say that $$$m$$$ is the minimum of all such numbers, thus $$$m$$$ = $$$min$$$ { $$$i$$$ | $$$b_i < l$$$ }.

Now since the queries are offline, we can sort the queries by $$$r$$$ and process them such that we can ignore the subarray $$$a_{r + 1}, ..., a_n$$$. We iterate from from $$$r = 1, 2, ..., n$$$, and for each $$$r$$$ we process all queries $$$(l_i, r_i)$$$ where $$$r_i = r$$$.

Now how do we efficiently calculate the answer for queries $$$(l, n)$$$ ?

We maintain a minimum segment tree on the array $$$b$$$ and update all values for $$$i = 1, 2, ..., r$$$. Now since there are $$$n$$$ updates but $$$n + 1$$$ leaf nodes in the segment tree, at least one of them is zero. So value of root node is $$$0$$$. Now we try to find the minimum index leaf node with value < $$$l$$$. We start at root node and travel down to this node. At each node we see if the value of it's left child is < $$$l$$$. If it is we go to left child. Otherwise we go to the right child. Either of left or right child will always have value < $$$l$$$. Finally we reach a leaf node, the index of this node(after adjusting for offset) is the mex of array $$$a_l, a_{l + 1}, ..., a_r$$$.

Consider the array $$$a = [6, 1, 0]$$$.

Here $$$b = [3, 2, 0, 0, 0, 0, 1, 0]$$$.

Consider evaluation for $$$l = 2, r = 3$$$. The subarray is $$$[a_2, a_3] = [1, 0]$$$.

We start at root node.

We see that the left child has value < $$$l$$$, so we go to the left child.

Now for left node $$$2 \ge l$$$. So we go to right node.

Now for the left child value < $$$l$$$. So we go to left child.

We have reached a leaf node. The corresponding index is 2 and thus the MEX of the subarray $$$[a_2, a_3]$$$ is 2.

Calculating MEX of each query takes $$$O(log n)$$$ time, total $$$O(q log n)$$$. Updating the tree for $$$b$$$ takes $$$O(n log n)$$$ time in total. Sorting $$$q$$$ queries takes $$$O(q log q)$$$ time.

Overall time complexity — $$$O(n log n + q(log n + log q))$$$

#include<bits/stdc++.h>
using namespace std;
#define ll long long
struct SegmentTree{
    int n;
    vector<int> tree;
    SegmentTree(int sz){
        n = 1;
        while(n < sz){
            n <<= 1;
        }
        tree.assign(2 * n, 0);
    }
    void set(int ind, int val){
        ind += n;
        tree[ind] = val;
        ind >>= 1;
        while(ind > 0){
            tree[ind] = min(tree[2 * ind], tree[2 * ind + 1]);
            ind >>= 1;
        }
    }
    int get(int x){
        // return the first index i, such that s[i] < x
        int node = 1;
        while(node < n){
            int left = (node << 1);
            int right = (node << 1) + 1;
            if(tree[left] < x){
                node = left;
            }else{
                node = right;
            }
        }
        return (node - n);
    }
};
int main(){
    int n;
    cin >> n;
    vector<int> a(n + 1);
    for(int i = 1; i <= n; ++i){
        cin >> a[i];
    }
    int q;
    cin >> q;
    vector<vector<pair<int, int>>> queries(n + 1);
    for(int i = 0; i < q; ++i){
        int l, r;
        cin >> l >> r;
        queries[r].push_back({l, i});
    }

    vector<int> res(q);

    SegmentTree s(n + 1);
    for(int i = 1; i <= n; ++i){
        // set the last occurence of a[i] to i
        s.set(a[i], i);

        for(auto [l, ind] : queries[i]){
            // find the smallest x, such that last occurence of x < l
            res[ind] = s.get(l);
        }
    }
    for(int elem : res){
        cout << elem << '\n';
    }
    cout << '\n';

    return 0;
}

// Author: Sahil Yasar #include <iostream> #include <vector> #include <cstring> using namespace std; #define endl '\n' namespace persistentSegTree{ const int N = 1e5 + 10; const int M = 5e6 + 10; typedef int T; T arr[M]; int L[M], R[M], root[N]; size_t sz; int nodes = 0, cnt = 0; T val; T (*f)(T, T); void build(T a[], int v, int tl, int tr){ if (tl == tr) arr[v] = a[tl]; else{ int tm = (tl + tr) / 2; build(a, L[v] = nodes++, tl, tm); build(a, R[v] = nodes++, tm+1, tr); arr[v] = f(arr[L[v]], arr[R[v]]); } } void PSEGtree(T a[], size_t n, T (*func)(T, T), T x){ sz = n, f = func, val = x; build(a, root[cnt++] = nodes++, 0, sz-1); } void reset(){ nodes = cnt = 0; } T query(int v, int l, int r, int tl, int tr){ if (l > tr || r < tl) return val; if (l <= tl && r >= tr) return arr[v]; int tm = (tl + tr) / 2; return f(query(L[v], l, r, tl, tm), query(R[v], l, r, tm+1, tr)); } T query(int l, int r, int rt){ return query(root[rt], l, r, 0, sz-1); } void update(int cur, int prev, int pos, T x, int tl, int tr){ if (tl == tr) return void(arr[cur] = x); int tm = (tl + tr) / 2; if (pos <= tm){ R[cur] = R[prev], L[cur] = nodes++; update(L[cur], L[prev], pos, x, tl, tm); } else{ L[cur] = L[prev], R[cur] = nodes++; update(R[cur], R[prev], pos, x, tm+1, tr); } arr[cur] = f(arr[L[cur]], arr[R[cur]]); } int update(int pos, T x, int rt = -1){ if (rt < 0) rt = cnt-1; update(root[cnt] = nodes++, root[rt], pos, x, 0, sz-1); return cnt++; } }; using namespace persistentSegTree; const int MAX = 1e5 + 10; vector<int> g[MAX]; int in[MAX], out[MAX], timer; vector<int> node; void dfs(int s, vector<int>& nums, int p = -1){ node.push_back(nums[s]-1); in[s] = timer++; for (int& c: g[s]) if (c != p) dfs(c, nums, s); out[s] = timer-1; } T rangeMEX(int v, int x, int tl, int tr){ if (tl == tr) return tl; int tm = (tl + tr) / 2; if (arr[L[v]] < x) return rangeMEX(L[v], x, tl, tm); return rangeMEX(R[v], x, tm+1, tr); } int rangeMEX(int rt[], int l, int r){ return rangeMEX(root[rt[r]], l, 0, sz-1); } int main(){ cin.tie(0)->sync_with_stdio(0); cin.exceptions(cin.failbit); int n, i; cin>>n; vector<int> parents(n); vector<int> nums(n); for (i = 0; i < n; ++i) cin>>parents[i]; for (i = 0; i < n; ++i) cin>>nums[i]; // 0 is root for (int i = 1; i < parents.size(); ++i){ g[i].push_back(parents[i]); g[parents[i]].push_back(i); } int temp[MAX]; memset(temp, -1, sizeof(temp)); PSEGtree(temp, MAX, [](int a, int b){ return min(a, b); }, 1e9); timer = 0; dfs(0, nums); int rt[n]; for (i = 0; i < n; ++i) rt[i] = update(node[i], i); vector<int> ans; for (i = 0; i < n; ++i) ans.push_back(rangeMEX(rt, in[i], out[i]) + 1); for (i = 0; i < n; ++i) cout<<ans[i]<<" "; cout<<endl; return 0; }

Comments (9)

Write comment?

one_autum_leaf

18 months ago, # |

Auto comment: topic has been updated by one_autum_leaf (previous revision, new revision, compare).

→ Reply

igzou

6 months ago, # |

Practice question

https://leetcode.com/problems/smallest-missing-genetic-value-in-each-subtree/description/

ClosetNarcissist

2 weeks ago, # ^ |

← Rev. 2 →

Was able to solve this using persistent segment tree to do range MEX queries online instead of offline. https://leetcode.com/problems/smallest-missing-genetic-value-in-each-subtree/solutions/6126019/online-range-mex-with-persistent-segment-tree

C++ code

hydra_cody

5 months ago, # |

How to solve this

I am thinking of making dfs tree array and work on this array. But how do we solve range MEX queries with update on array?

Hmzaawy

5 months ago, # ^ |

Did you think of square root decomposition?

How will you combine results from two square roots? We can't work with frequency for each square root.

lemelisk

But how do we solve range MEX queries with update on array?

Can be done easily in $$$\mathcal{O}(n^{5/3}+q\sqrt{n})$$$ via 3D Mo and a bit harder in $$$\mathcal{O}(n \log{n}+q\log^2{n})$$$. For the second approach, if the value of $$$x$$$ is at positions $$$p_1, p_2 \dots p_k$$$, consider segments $$$[1; p_1 - 1]$$$, $$$[p_1+1, p_2-1]$$$ ... $$$[p_{k-1}+1;p_k - 1]$$$, $$$[p_k+1, n]$$$ with the value of $$$x$$$ associated with these segments. There will be a total of $$$\mathcal{O}(n)$$$ such segments, and each update changes only $$$\mathcal{O}(1)$$$ of them. And for a range MEX query, we need to find the segment with the smallest $$$x$$$ for all the segments in which the query segment is nested. This can be done for $$$\mathcal{O}(\log^2{n})$$$ in various ways using data structures.

abd-alrzaq

practice https://codeforces.net/gym/105242/problem/A

saturina

Why is this blog flop?

one_autum_leaf's blog