MEX Data structure - Codeforces

→ Обратите внимание

До соревнования
CodeTON Round 9 (Div. 1 + Div. 2, Rated, Prizes!)
30:17:40
Зарегистрироваться »

*есть доп. регистрация

→ Лидеры (рейтинг)

№	Пользователь	Рейтинг
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

Страны | Города | Организации

Всё →

→ Лидеры (вклад)

№	Пользователь	Вклад
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	Dominater069	153
9	nor	153

Всё →

→ Найти пользователя

→ Прямой эфир

Детальнее →

Блог пользователя kaldiuo

MEX Data structure

Автор kaldiuo, история, 7 лет назад, По-английски

There are q<=5*10^5 online queries of 2 types for a set of integers:

Add an integer 1<=x<=10^9 to the set
Remove an integer from the set
Find the minimum element >=x that isn't present in the set

Is there a way to efficiently process these queries?

data structures

kaldiuo
7 лет назад
33

Комментарии (32)

Показать архивные | Написать комментарий?

markysha

7 лет назад, # |

+41

If you can answer in offline, then just compress X in all queries and use simple segment tree with sum on a range of fenwick tree. If you must answer in online then use implicit segment tree.

→ Ответить

2ndSequence

4 года назад, # ^ |

-30

Can you kindly provide more information?

And would your idea work with answering interval query?

in other words, suppose you have array of N items and Q queries and you want to find the MEX in each [L, R].

n, a[i] <= 1e5.

→ Ответить

-is-this-fft-

4 года назад, # ^ |

+64

Come on, think before asking. I get that the comment above was terse but at least this should be clear:

If you can find the smallest element that is not in the set and is at least $$$L$$$, then of fucking course you can find the smallest element that is not in the set and is at least $$$L$$$ and at most $$$R$$$. Either the answer is the same or it doesn't exist in the second case.

→ Ответить

dorijanlendvaj

4 года назад, # ^ |

+28

Huh, I thought he was asking about a querying the MEX of a subarray, not about the problem from the blog but with ">=x" replaced by ">=x and <=y".

→ Ответить

-is-this-fft-

4 года назад, # ^ |

Oh shit, okay. In that case I'll be annoyed by asking not really relevant questions instead of not thinking.

→ Ответить

rachitiitr

7 лет назад, # |

← Rev. 2 →

+16

In your previous version, you did not mention that you can remove elements.
So my approach will work when you just add elements and query.

When I am adding x to set, I will:
1. Unite the clusters of x, x-1 if x-1 is present in set
2. Unite the clusters of x,x+1 if x+1 is present in set

When doing the unite, maintain the global max of that cluster.
The ans for 2nd query will be then the global max of its cluster + 1.

→ Ответить

peltorator

7 лет назад, # |

+48

You have 5 * 10^5 queries, so answer is less than 5 * 10^5. So you can make a set2, where you keep integers that are not in your set. Answer will be set2.min

→ Ответить

kaldiuo

7 лет назад, # ^ |

+13

Sorry if the title confused you, query 3 wants the mex >= x, where x can be anything.

→ Ответить

spj_29

7 лет назад, # ^ |

← Rev. 2 →

a simple lower_bound on the set2 will do that...wont it?

EDIT: kinda messed up here :P....x can be large

→ Ответить

kaldiuo

7 лет назад, # ^ |

← Rev. 2 →

The added elements can range from 1 to 10^9, so unless for query 3 x is always 0, set2 has to start with all integers from 1 to 10^9.

→ Ответить

spj_29

7 лет назад, # ^ |

ah right...i messed up there...just use a dynamic segment tree then..

→ Ответить

SaYami

7 лет назад, # ^ |

+15

You only need to consider all numbers x and x+1 such that x is a number from the queries. So, it will work.

→ Ответить

peltorator

7 лет назад, # ^ |

← Rev. 2 →

Oh, I missed it, sorry) So you can do smth like this with dynamic segment tree Or if your problem is offline, you can make "compressison of coordinates" and use my trick with lower_bound, If problem is online you also can do it, in this case you need to keep set of segments(if you added 1,5 and 7 to your set, your set 2 will contain (2,4), (6, 6) and (8, 10^9). Now you can do lower_bound again)

→ Ответить

spj_29

7 лет назад, # |

← Rev. 2 →

+16

You can also maintain a dynamic segment tree on the values (x)..store frequencies of 'x' maybe even count of numbers (store 1 if frequency of 'x'>=1)...this is probably an overkill for your problem but will allow even queries like kth element >=x which isn't present in the set (do a binary search integrated in the segtree...still log(x) ). So yeah might give you little more flexibility. Might not be super good memory wise though...

→ Ответить

sheaf

7 лет назад, # |

← Rev. 9 →

-8

Use Treap that accumulates following data:

Count of nodes in a subtree including current vertex (cnt(this) = cnt(left) + cnt(right) + 1);
Maximum and minimum of subtree.
Largest contiguous prefix (i.e. the one with elements of form x, x + 1, x + 2...x + N).

Maintaining such a prefix isn't that challenging. If prefix of left subtree equals to size of left subtree AND maximum of left subtree + 1 equals to key of current node then the prefix of current node is prefix(left) + 1 else it's just prefix(left). If prefix(left) + 1 = prefix(this) then you should inspect the right subtree: if min(right) - 1 = key(this) then prefix(this) = prefix(left) + 1 + prefix(right).

Now to answer queries:

Given x you should split you treap by key x. Now, right tree after split contains all integers greater than x. If min(split_right) - 1 > x then the answer is just x + 1, otherwise you should split split_right by count(not key), where count = prefix(split_right). The answer will be max(split - by - count_left) + 1.

Time complexity per query(any) : O(log(Q)), Space complexity: O(Q), Q = 5 * 10⁵.

Nonetheless, Treap is not the fastest data structure so do not expect this solution to be blazing fast.

→ Ответить

dalex

7 лет назад, # |

-14

The fastest way is Fenwick tree with length 5*10^5 (as the answer can't exceed 5*10^5).

→ Ответить

kaldiuo

7 лет назад, # ^ |

+20

Sorry if the title confused you, query 3 wants the mex >= x, where x can be anything.

→ Ответить

matthew69

7 лет назад, # |

Can you share the link of the problem from any online judge?

→ Ответить

kobortor

7 лет назад, # |

-9

I think the best solution would be to have a BBST (Treap, Splay...) where each node contains the closest node with a count of 0.

When you insert a value at x, make sure to add a dummy node with count 0 at (x+1).

When you are removing, simply subtract the counts by 1.

When you are querying, first check if the query element exists. If it does not exist, then the answer is simply that, otherwise use the nodes in the tree to binary search for it.

→ Ответить

Yellow.Flash

7 лет назад, # |

← Rev. 3 →

-17

let's say we have an ordered-set (but here we can keep frequencies in map).
Now insertion and deletion into o-set is straightforward.
For finding smallest y such that y>=x and y is not present in the o-set we can binary search on y in the range [x,infinity].
1 : it will work in O(logn)
2 : it will work in O(logn)
3 : it will work in O(logn*log(range))
Please point out if you find any mistake or flaw in the logic.

→ Ответить

misof

7 лет назад, # |

+44

Store your current collection as a union of intervals. For example, if your actual set is S = {1,2,3,7,10,11}, store it as [1,4), [7,8), [10,12). You can easily use a standard C++ set for this.

Operation 1: Add a one-element interval [x,x+1). Check whether you need to merge it with the previous and/or next interval.

Operation 2: Using set::upper_bound, find the interval that contains your x. Remove it from the set, split it into two smaller intervals and reinsert them if they are not empty.

Operation 3: Using set::upper_bound, find whether x is in your set. If it isn't, return x. If it is, return the upper bound of the interval that contains x.

Each operation runs in O(log n) and you don't need to implement any trees.

→ Ответить

f2lk6wf90d

7 лет назад, # |

← Rev. 2 →

My idea:
Create a set of intervals S. If the interval [l, r] is present in S, that means that the elements l, l + 1, l + 2, ..., r are in S. Adding an element x is handled like this: If [whatever, x - 1] is present in the set, remove it and add [whatever, x]. Likewise, if [x + 1, whatever] is in the set, remove it and add [x, whatever]. Removals are handled in this way: Find the interval containing x. Split it in two intervals [start, x - 1], and [x + 1, end]. MEX queries are handled similarly: Find the interval containing x. Output end + 1. This can be implemented using 2 std::set<std::pair<int, int>> in C++. The first set indexes pairs by the first element (i.e. left end), and the second one by their second element (i.e. right end).

Code

class MEX {
  typedef std::pair<int, int> interval;
  std::set<interval> s[2];
  std::multiset<int> values;
  void insert_pair(interval p) {
    auto[x, y] = p;
    if (x > y) std::swap(x, y);
    s[0].insert({x, y});
    s[1].insert({y, x});
  }
  void remove_pair(interval p) {
    auto[x, y] = p;
    if (x > y) std::swap(x, y);
    s[0].erase({x, y});
    s[1].erase({y, x});
  }
 
  std::optional<interval> is_end(int x, bool left) {
    int idx = left;
    auto it = s[idx].lower_bound({x, 0});
    if (it != s[idx].end() && it->first == x)
      return *it;
    else
      return std::nullopt;
  }
 
 public:
  void insert(int x) {
    bool ok = values.find(x) == values.end();
    values.insert(x);
    if (!ok) return;
    auto p1 = is_end(x - 1, true);
    auto p2 = is_end(x + 1, false);
    if (p1 && p2) {
      interval i = {p1->second, p2->second};
      remove_pair(*p1);
      remove_pair(*p2);
      insert_pair(i);
    } else if (p1 && !p2) {
      interval i = {p1->second, x};
      remove_pair(*p1);
      insert_pair(i);
    } else if (!p1 && p2) {
      interval i = {x, p2->second};
      remove_pair(*p2);
      insert_pair(i);
    } else if (!p1 && !p2) {
      interval i = {x, x};
      insert_pair(i);
    }
  }
  void remove(int x) {
    values.erase(values.find(x));
    if (values.find(x) != values.end()) return;
    auto it = s[1].lower_bound({x, 0});
    if (it->first != x && it->second != x) {
      interval p1 = {it->second, x - 1};
      interval p2 = {x + 1, it->first};
      remove_pair(*it);
      insert_pair(p1);
      insert_pair(p2);
    } else if (it->first != x && it->second == x) {
      interval p = {x + 1, it->first};
      remove_pair(*it);
      insert_pair(p);
    } else if (it->first == x && it->second == x) {
      remove_pair(*it);
    } else if (it->first == x && it->second != x) {
      interval p = {it->second, x - 1};
      remove_pair(*it);
      insert_pair(p);
    }
  }
  int get(int x) {
    if (values.find(x) == values.end()) return x;
    auto it = s[1].lower_bound({x, 0});
    return it->first + 1;
  }
};

→ Ответить

rishabhdeepsingh

4 года назад, # ^ |

This gives WA in some cases. https://atcoder.jp/contests/hhkb2020/submissions/17318696 Can we do it without the use of std::optional as well?

→ Ответить

f2lk6wf90d

4 года назад, # ^ |

Indeed, {p2->first, p1->first} should be {p1->second, p2->second}: https://atcoder.jp/contests/hhkb2020/submissions/17323306

→ Ответить

rishabhdeepsingh

4 года назад, # ^ |

what about the std::optional ?? It is not available in c++-14

→ Ответить

f2lk6wf90d

4 года назад, # ^ |

You could use std::pair<T, bool> instead of std::optional<T>.

→ Ответить

rek

7 лет назад, # |

+13

You can also use a bit trie — this is a trie made of bits of your numbers, thus it is represented as a binary tree (since there are only two types of bits — 0 and 1), and its depth is $\text{[math]}$ .

When you add a number, you mark the vertex that representing this particular number as a full binary tree, and then you recursively go over its parents marking them as full binary trees if their left and right children are both full binary trees as well.

When you remove a number, you unset the "full binary tree" mark and do so for all its parents.

When you want to get mex, you should do the DFS towards x, avoiding visiting subtrees which are marked as full (because there are no free values in such subtrees), and attempt to visit a left subtree first if you are allowed to by the x restriction.

I admit that there are better solutions above for this particular problem, but it's good to know about that function of a bit trie: for example, you can perform both "find mex" and "xor all the values" operations with a single data structure. This structure was even proposed to be implemented once on Open Olympiad (check the latest problem).

code (multiset edition, not tested tho lol)

#include <bits/stdc++.h>

using namespace std;

struct Node {
    Node *children [2];
    bool is_full = false;
    int cnt = 0;

public:
    Node() {
        children[0] = nullptr;
        children[1] = nullptr;
    }
};

inline Node *nonnull(Node *&n) {
    if (!n) n = new Node();
    return n;
}

void add(int val, int level, Node *n) {
    if (level == -1) {
        n->cnt++;
        n->is_full = true;
        return;
    }

    int cur_bit = (val & (1 << level)) >> level;
    add(val, level-1, nonnull(n->children[cur_bit]));

    n->is_full = nonnull(n->children[0])->is_full && nonnull(n->children[1])->is_full;
    n->cnt = nonnull(n->children[0])->cnt + nonnull(n->children[1])->cnt;
}

void remove(int val, int level, Node *n) {
    if (level == -1) {
        n->cnt--;
        n->is_full = n->cnt > 0;
        return;
    }

    int cur_bit = (val & (1 << level)) >> level;
    remove(val, level-1, nonnull(n->children[cur_bit]));

    n->is_full = nonnull(n->children[0])->is_full && nonnull(n->children[1])->is_full;
    n->cnt = nonnull(n->children[0])->cnt + nonnull(n->children[1])->cnt;
}

int answer(int val, int level, Node *n) {
    if (n->is_full) {
        return -1;
    } else if (n->cnt == 0) {
        return val;
    }

    int cur_bit = (val & (1 << level)) >> level;
    int ans = answer(val, level-1, nonnull(n->children[cur_bit]));
    if (cur_bit == 0 && ans == -1) {
        ans = answer((val >> level << level) | (1 << level), level-1, nonnull(n->children[1]));
    }

    return ans;
}

Node root_node;

int main(void) {
    int n;
    scanf("%d", &n);

    add(0, 31, &root_node);

    for (int i = 0; i < n; i++) {
        int t, val;
        scanf("%d%d", &t, &val);

        if (t == 1) {
            add(val, 31, &root_node);
        } else if (t == 2) {
            remove(val, 31, &root_node);
        } else if (t == 3) {
            printf("%d\n", answer(val, 31, &root_node));
        }
    }

    return 0; 
}

→ Ответить

dgupta0812

4 года назад, # ^ |

-8

This method is really awesome.

→ Ответить

Erdenebayar

7 лет назад, # |

-9

you can use ordered set from library, it's enough for your problem

→ Ответить

tfg

4 года назад, # |

+99

You can solve it with 2 sets. Keep one set of values that are in the set, and one set of "next values that aren't in the set". So if x is in the set and x+1 isn't in the set, x+1 is in the second set. Query can be answered by checking if x is in the first set, if it is the answer is the lower bound of the second set otherwise the answer is x.

→ Ответить