Hi everyone! After a relatively long lull, I decided that my contribution was growing too slowly and that the hour had come to please you with another article in the blog :)
Two months ago, user Perlik wrote an article describing a very interesting data structure implemented in the STL, rope, which allows you to quickly perform various operations on substrings. Some time later I tested it on various tasks and, unfortunately, tended to get negative results: rope was too slow, especially when it came to working with individual elements.
For a while, I forgot about that article. Increasingly, however, I faced problems in which I needed a set with the ability to find the ordinal number of an item and also to get an item by its ordinal number (i.e., order statistics in a set). Then I remembered that in the comments to that article someone mentioned a mysterious data structure, the order statistics tree, which supports these two operations and is implemented in the STL (unfortunately only for GNU C++). This began my fascinating acquaintance with policy-based data structures, and I want to tell you about them :)
Let's get started. In this article I will talk about what is, IMO, the most interesting of the implemented structures: the tree. We need to include the following headers:
#include <ext/pb_ds/assoc_container.hpp> // Common file
#include <ext/pb_ds/tree_policy.hpp> // Including tree_order_statistics_node_update
On closer inspection, you may find that the last two files are contained in the library file
#include <ext/pb_ds/detail/standard_policies.hpp>
The namespace we will have to work in is called __gnu_pbds in newer versions of GNU C++; earlier it was called pb_ds.
Now let's look at the concrete structure.
The tree-based container has the following declaration:
template<
typename Key, // Key type
typename Mapped, // Mapped-policy
typename Cmp_Fn = std::less<Key>, // Key comparison functor
typename Tag = rb_tree_tag, // Specifies which underlying data structure to use
template<
typename Const_Node_Iterator,
typename Node_Iterator,
typename Cmp_Fn_,
typename Allocator_>
class Node_Update = null_node_update, // A policy for updating node invariants
typename Allocator = std::allocator<char> > // An allocator type
class tree;
Experienced participants may have already noticed that if we specify only the first two template arguments, we obtain an almost exact copy of the map container. Note that this container can also be a set: for this, just pass null_type as the second template argument (in older versions it is null_mapped_type).
By the way, Tag and Node_Update are absent from map. Let us examine them in more detail.
Tag: the class denoting the tree structure we will use. The STL provides three base classes for this: rb_tree_tag (red-black tree), splay_tree_tag (splay tree) and ov_tree_tag (ordered-vector tree). Sadly, in competitions we can use only red-black trees, because the splay tree and the OV-tree use a linear-time split operation, which prevents us from using them.
Node_Update: the class denoting the policy for updating node invariants. By default it is null_node_update, i.e., no additional information is stored in the vertices. In addition, the library implements the update policy tree_order_statistics_node_update, which provides exactly the operations we need. Let's consider them. The best way to declare the tree is most likely as follows:
typedef tree<
int,
null_type,
less<int>,
rb_tree_tag,
tree_order_statistics_node_update>
ordered_set;
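For reference, here is a minimal self-contained sketch that puts the headers, the namespace and the tree declaration together. The alias template ordered_set_of and the helper kth_word are hypothetical names introduced only for illustration (the article itself uses the plain typedef above); the sketch assumes GCC with libstdc++ and C++11 or later.

```cpp
#include <ext/pb_ds/assoc_container.hpp> // tree
#include <ext/pb_ds/tree_policy.hpp>     // tree_order_statistics_node_update
#include <cstddef>
#include <functional>                    // std::less
#include <string>
#include <vector>

using namespace __gnu_pbds;

// Hypothetical alias template (not the article's typedef): the same
// order-statistics set, parameterized by the key type T.
template <class T>
using ordered_set_of = tree<T, null_type, std::less<T>, rb_tree_tag,
                            tree_order_statistics_node_update>;

// Inserts all words, then returns the k-th smallest distinct word (0-based),
// or an empty string if k is out of range.
std::string kth_word(const std::vector<std::string>& words, std::size_t k) {
    ordered_set_of<std::string> s;
    for (const std::string& w : words) s.insert(w); // duplicates are ignored, as in std::set
    auto it = s.find_by_order(k);                   // equals end() when k >= size()
    return it == s.end() ? std::string() : *it;
}
```

For instance, kth_word({"pear", "apple", "fig"}, 0) should give "apple", because find_by_order counts from zero over the keys in sorted order.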
If we want a map rather than a set, we must pass the mapped type as the second template argument. The tree appears to support the same operations as set (at least I haven't had any problems with them), but there are also two new features: find_by_order() and order_of_key(). The first returns an iterator to the k-th smallest element (counting from zero); the second returns the number of items in the set that are strictly smaller than our item. Example of use:
ordered_set X;
X.insert(1);
X.insert(2);
X.insert(4);
X.insert(8);
X.insert(16);
cout<<*X.find_by_order(1)<<endl; // 2
cout<<*X.find_by_order(2)<<endl; // 4
cout<<*X.find_by_order(4)<<endl; // 16
cout<<(end(X)==X.find_by_order(6))<<endl; // 1 (true)
cout<<X.order_of_key(-5)<<endl; // 0
cout<<X.order_of_key(1)<<endl; // 0
cout<<X.order_of_key(3)<<endl; // 2
cout<<X.order_of_key(4)<<endl; // 2
cout<<X.order_of_key(400)<<endl; // 5
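As mentioned above, passing a mapped type instead of null_type turns the tree into a map that keeps the same two extra operations. Below is a sketch under that assumption; the typedef ordered_map and the helper value_at_rank are hypothetical names. Note that find_by_order then returns an iterator to a key/value pair, so you read ->first and ->second just as with std::map.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <cstddef>
#include <functional>
#include <string>

using namespace __gnu_pbds;

// Order-statistics "map": same tree, but the second template argument is the
// mapped type (std::string here) instead of null_type.
typedef tree<int, std::string, std::less<int>, rb_tree_tag,
             tree_order_statistics_node_update>
    ordered_map;

// Returns the value stored under the k-th smallest key (0-based),
// or an empty string if there is no such key.
std::string value_at_rank(ordered_map& m, std::size_t k) {
    auto it = m.find_by_order(k); // iterator to std::pair<const int, std::string>
    return it == m.end() ? std::string() : it->second;
}
```

Keys are ranked exactly as in ordered_set; the mapped values simply travel along with them.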
Finally, I would like to say a few words about the performance of order_statistics_tree in the STL. For this, I provide the following table (running times in seconds; the column headers are problem numbers).
Solution\Problem | 1028 | 1090 | 1521 | 1439 |
order_statistics_tree, STL | 0.062 | 0.218 | 0.296 | 0.468 |
Segment tree | 0.031 | 0.078 | 0.171 | 0.078 0.859* |
Binary Indexed Tree | 0.031 | 0.062 | 0.062 | |
* The last problem requires direct access to the tree nodes to implement an O(m log n) solution. Without it, the solution works in O(m log^2 n).
As you can see, order_statistics_tree lags only a little behind handwritten structures in execution time, and at times is even ahead of them. At the same time, the code size is reduced considerably. Hence we can conclude that order_statistics_tree is good and can be used in contests.
Besides the tree, I also wanted to describe the trie here. However, I was confused by some aspects of its implementation that greatly limit its usefulness in programming olympiads, so I decided not to talk about it. Anyone who wants to is encouraged to learn more about this structure on their own.
Useful links:
— Documentation of pb_ds
— Testing of pb_ds
— Using of pb_ds
— Demonstration of order_statistics_tree
— Demonstration of trie with prefix search
— Operations with intervals with handwritten update policy class
— More examples from that site
P.S. Sorry for my poor English :)
Example of a trie with prefix-range search.
Problem: 1414
Solution: http://ideone.com/6VFNZl
Is there a way of counting the number of strings in the trie with a certain prefix without iterating through them all?
Augment each trie node with a counter. Every time you insert a string into the trie, increment the counter of every node on its path. To get the number of strings that share a prefix, just walk down the prefix and output the counter stored in the final node.
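The counter idea can be sketched with a small handwritten trie (this is not the pb_ds trie). CountingTrie and count_prefix are illustrative names, and the sketch assumes the strings consist only of lowercase Latin letters.

```cpp
#include <array>
#include <string>
#include <vector>

// Handwritten trie where every node stores how many inserted strings pass
// through it. Assumes input characters are in 'a'..'z'.
class CountingTrie {
    struct Node {
        std::array<int, 26> next; // child index per letter, -1 if absent
        int pass = 0;             // number of inserted strings through this node
        Node() { next.fill(-1); }
    };
    std::vector<Node> nodes_ = std::vector<Node>(1); // nodes_[0] is the root

public:
    void insert(const std::string& s) {
        int v = 0;
        ++nodes_[v].pass; // the root is a prefix of every string
        for (char ch : s) {
            int c = ch - 'a';
            if (nodes_[v].next[c] == -1) {
                nodes_[v].next[c] = static_cast<int>(nodes_.size());
                nodes_.emplace_back(); // indices, not pointers, survive reallocation
            }
            v = nodes_[v].next[c];
            ++nodes_[v].pass;
        }
    }

    // Number of inserted strings (with multiplicity) that start with `prefix`.
    int count_prefix(const std::string& prefix) const {
        int v = 0;
        for (char ch : prefix) {
            v = nodes_[v].next[ch - 'a'];
            if (v == -1) return 0; // the prefix leaves the trie
        }
        return nodes_[v].pass;
    }
};
```

Each insertion costs O(|s|) and each query O(|prefix|), regardless of how many strings share the prefix.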
You may find my segment tree and binary indexed tree solutions slightly non-trivial, especially for problems 1521 and 1439. Most likely, I will later write an entry describing some interesting and rarely seen ways of using these structures.
Here it is :)
This is really useful. Thanks a lot!
Very useful article! I need order statistics on a multiset. How should I define the tree?
As far as I know, there is no tree multiset implemented in the STL. However, you can use pair<T, int> as the key, where the second element of the pair is the time when the item was added.
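A minimal sketch of this pair trick, assuming int values; the class OrderedMultiset and its method names are hypothetical. Using -1 as the second component in lookups works because all insertion ids are non-negative, so {x, -1} compares below every stored copy of x.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <cstddef>
#include <functional>
#include <utility>

using namespace __gnu_pbds;

// Order-statistics multiset: each value is stored as {value, insertion id},
// so equal values become distinct keys and the strict comparator std::less
// keeps working.
class OrderedMultiset {
    typedef std::pair<int, int> Key;
    tree<Key, null_type, std::less<Key>, rb_tree_tag,
         tree_order_statistics_node_update> t_;
    int next_id_ = 0; // ids are non-negative, so {x, -1} sorts before every {x, id}

public:
    void insert(int x) { t_.insert(Key(x, next_id_++)); }

    // Removes one occurrence of x; returns false if x is absent.
    bool erase_one(int x) {
        auto it = t_.lower_bound(Key(x, -1)); // first stored copy of x, if any
        if (it == t_.end() || it->first != x) return false;
        t_.erase(it);
        return true;
    }

    // Number of stored elements strictly less than x.
    std::size_t count_less(int x) const { return t_.order_of_key(Key(x, -1)); }

    // k-th smallest element (0-based); assumes k < size().
    int kth(std::size_t k) const { return t_.find_by_order(k)->first; }

    std::size_t size() const { return t_.size(); }
};
```

Unlike the less_equal trick discussed further down, this keeps a strict comparator, so lower_bound, find and erase behave as they do for an ordinary set.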
Apparently, you can. Once I tried writing less_equal instead of less, and it started to work as a multiset; I even got AC using it at a regional olympiad)
I can't erase elements with the less_equal comparator; e.g., this code outputs "1".
So I guess it's not very useful (or I'm doing something wrong).
UPD: I can delete an iterator obtained with lower_bound, but it works incorrectly. This code erases 1, not 0.
Wow, then it really sucks. It seems I only used it with insert operations, and strangely enough it worked)
5 years later, I can say it is useful in some problems; it helped me today in this problem.
glad it helped:)
You can erase elements from the multiset.
Well, actually it works fine and does exactly what you asked for! The issue is that you're passing less_equal<int> as the tree comparator, so the same function is used for lower_bound(). By definition (according to cplusplus.com), lower_bound finds the first element for which the comparison with val is not true. Thus it returns the first element greater than val, which is 1 in your example. To make sure, you may even test set<int, less_equal<int> >, which gives the same result.
What if I want to calculate the index of upper_bound of a particular element? Suppose we have 1 1 2 3 4; how do I find index(upper_bound(2))?
UPDATE: Maybe it is order_of_key(number + 1)?
So, to erase an element x from an ordered_set with less_equal, I called lower_bound(x - 1) and erased the returned iterator. This works only if x is present in the set.
5 years later, vol. 2: lower_bound doesn't work, but you can do d.erase(d.find_by_order(d.order_of_key(0))); this erases the iterator correctly, but it's a little slow.
I don't actually know how it works, but if you use upper_bound instead of lower_bound, it works correctly.
Another drawback of using less_equal instead of less is that lower_bound works as upper_bound and vice versa. Code
An excellent deduction.
Thanks this was helpful
In today's LeetCode contest?
yes bro
https://leetcode.com/contest/weekly-contest-342/submissions/detail/938214410/
here is my solution.
typedef tree<long long, null_type, less_equal<>, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;
Using less_equal<> makes it a multiset.
7 years later: I was using this incorrect template for a long time, until it recently messed me up in a live contest here:
https://codeforces.net/contest/1998/submission/275629343
where the find() method always returned set.end(), so erase was wrong. Even lower_bound behaved weirdly.
Upd: It seems you have to use lower_bound in place of upper_bound and vice versa.
Now I was able to erase as follows:
set.erase(set.upper_bound(x));
And passed https://codeforces.net/contest/1998/submission/275639066
yesss
What about the comparator i.e. less<int>
typedef tree<int, null_type, less_equal, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;
Can we use this in this question, or not? I am not able to implement the multiset part.
just use the fucking binary search on this problem
The 3rd template argument must be less_equal<int>. But adamant, is this the correct way to do it? As far as I know, most STL containers require a comparator that provides a strict weak ordering (I'm not sure of the exact reasons, though). So, will there be drawbacks to constructing a multiset this way?
To use order statistics on a multiset, use: tree<int, null_type, less_equal<int>, rb_tree_tag, tree_order_statistics_node_update> s;
find_by_order and order_of_key work the same as for a set.
However, for searching, lower_bound and upper_bound work the opposite way. Also, to erase x, use s.erase(s.upper_bound(x)) (since upper_bound behaves as lower_bound).
This is actually great. Whenever I needed to handle duplicates, I used pair<val, index>. This is much simpler.
(same comment as above)
What is the complexity of the erase operation: O(log n) or O(n)?
O(logn)
Really??
Yes.
Is there an efficient function to merge two trees?
You can do it in O(log n) if the greatest element of one tree is smaller than the smallest element of the other. Otherwise, I don't think you have a better option. Tell me as well if you find something interesting.
How do you merge two non-overlapping rb-trees (as in the article) in O(log n) time? I find that the default join() function takes linear time...
Have you found anything interesting about merging? I'm trying .join, but it throws an error.
How can I use it like a multiset?
Main idea is to keep pairs like {elem, id}.
thanks a lot :)
Like me.order_of_key({x, 0}) and me.find_by_order({x, 0}): it does not work... why??
*me.find_by_order({x,0})
still it does not work.
sure it does work, but you cannot print a pair so you have to do it like this
cout << me.find_by_order(1)->first ;
wtf, find_by_order takes number
How do I use find_by_order if I'm using an ordered_set with pairs?
typedef tree< pair<int, int>, null_type, less<pair<int, int>>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;
nice technique. worked fine! thanks a lot.
typedef tree<int, null_type, less_equal, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;
Hi, adamant, the code files in Useful Links don't seem to work. Could you fix them?
Thanks for this great post. I am looking forward to your next and next next posts.
Can you elaborate please?
For example, the code in "Demonstration of trie with prefix search" doesn't run on my computer. I saw that it uses some old syntax, like the namespace pb_ds. I changed it, and then it reported a new error in another place. The truth is I am not good enough to fix things further. I hope that you can update it. (I know that I can use the trie code from one of your comments, but this post would be even better if the code in Useful Links were also updated.)
Thank you.
I can't edit the original files — they're not mine. But here is the correct version: http://ideone.com/BpZlYO
Thank you.
https://www.e-olymp.com/ru/problems/2961 Is it possible to solve this problem using this algorithm?
Can anybody share an equivalent Java class for this type of set, or code that behaves like the data structure above?
It does not exist.
You may use instead:
I thought of coordinate compression + Fenwick tree, but this solution works only for offline queries. I want to handle online queries. The best I can think of now is Treap + Segment Tree or Treap + Fenwick Tree. But here again there is the problem of implementing a combined data structure, and I can't figure out how to do that. Can you please help me?
Any idea on how to use this pre C++11?
How can I use a custom compare function in the "Key comparison functor" section for custom data types?
Just like for regular set..
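For example, a comparator can be passed as a functor type in the third template argument, exactly as with std::set. The struct BySecondDesc below is a hypothetical comparator for illustration; like any comparator for a set, it must be a strict weak ordering.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <utility>

using namespace __gnu_pbds;

// Hypothetical comparator: orders pairs by second component descending,
// breaking ties by first component ascending. It never returns true for
// equal elements, as a strict weak ordering must.
struct BySecondDesc {
    bool operator()(const std::pair<int, int>& a, const std::pair<int, int>& b) const {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;
    }
};

typedef tree<std::pair<int, int>, null_type, BySecondDesc, rb_tree_tag,
             tree_order_statistics_node_update>
    ordered_by_second;
```

A functor is used here because it is default-constructible, so the tree can be declared without passing a comparator object to the constructor. find_by_order and order_of_key then follow the custom order.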
I can use a custom compare function for a set via operator overloading. I want to know whether there is any other way to do this for both set and ordered_set, using a lambda expression or just a bool compare function.
Thank you very much, adamant, for your nice post.
I suppose you can overload the operator and still use less<T>. Also, you can use functors and lambdas in a similar way as for sets:
Why does it only work with lambdas and not functions? When I do the same with comp being a comparator function, I get an error.
It does not work for me. I tried using a custom comparator but got an error. Please resolve my issue. Code link: https://leetcode.com/submissions/detail/693650301/
I have written a cmp struct, but when I pass it as a tree template argument, it gives me an error.
Sorry, what doesn't work?
works just fine on my side.
P.S. I would really recommend against using <= as a comparator, because a comparator is supposed to be irreflexive.
Is it possible to search for the order of a non-key value by passing a comparator/function? I sometimes find myself having to use my own tree templates instead, because I need to write a search function that can handle this task.
What is the constant factor of order_statistics_tree, even though it runs in logarithmic time? I think its constant factor is very high.
Could anyone write the exact code to use this data structure as a map, please? I'm not able to do so.
And what exactly do you want from it? You can use something like that.
It was mentioned in the article that we can use it as a map by specifying the mapped type. So I tried to do that but couldn't. That's all ;)
Is this thing an order statistics tree?
Can you write a multiset implementation of ordered_set? When I use less_equal, I'm not able to erase from the ordered_set. And when I use pairs to include duplicates, I'm not able to use find_by_order({x, 0}).
What's wrong with find_by_order({x, 0})?
It gives an error. It says no known conversion. error: no matching function for call to '__gnu_pbds::tree<std::pair<int, int>, __gnu_pbds::null_type, std::less<std::pair<int, int> >, __gnu_pbds::rb_tree_tag, __gnu_pbds::tree_order_statistics_node_update>::find_by_order()
Wait, find_by_order takes number $$$k$$$ and returns $$$k$$$-th element. What exactly do you expect it to return?..
A number .
I think you should use order_of_key then
Thanks adamant I did that.
Actually, find_by_order(x) takes an integer, because it returns the element at position x, whereas find_by_order({x, 0}) is a compile error and won't work.
Can we use it as a multiset?
You can use pair<int, int> to manage duplicate values.
typedef tree<int, null_type, less_equal, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;
It is:
typedef tree<int, null_type, less_equal<int>, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;
thank you bro :) it worked for duplicate elements too..
What should I do if I want to merge two ordered multisets?
How can I erase an element from an ordered_set by its value?
just do
ordered_set<T> st; st.erase(key);
This doesn't work .
ordered_set::iterator it = st.upper_bound(key); st.erase(it); works.
Thanks bro
You can try this to erase by value from an ordered multiset (or whatever it's called in technical terms):
How to use Ordered Multiset
To erase by value from Ordered Multiset:
os.erase(os.find_by_order(os.order_of_key(val)));
This was very helpful. I did not want to change the data structure to store <key, value> pairs; I wanted to use the functions the library already provides.
(Sorry for necroposting) Does anyone know how to compile with the pbds header files on Mac OS X ? I think the g++ command is configured to use clang by default, and so it is not directly available. I've tried adding ext/pb_ds into the include folder (the same way you would enable bits/stdc++.h) but instead new dependencies come up.
For me, I installed the latest version of gcc (gcc 9.3.0_1) and compiled with gcc-9. It works on Mojave and Catalina and it should work on High Sierra (but I haven't tested it).
To install gcc-9, I used brew and the command
brew install gcc@9
(Again, sorry for necroposting) There is an important step that I've so-far seen all the mac g++ setup instructions ignore/skip. I also installed gcc through brew, but neither bits/stdc++.h header nor pbds data structures seemed to work. Putting the bits/stdc++.h file manually in the /usr/local/include folder allowed me to at least solve the header problem, but trying the same method for pbds spawned a lot of dependency issues.
The problem was that brew sets up the g++ command as g++-10 so that it doesn't clash with the default g++ command macOS provides. So aliasing g++ to g++-10 in the .bashrc/.zshrc file would be enough to solve the issue if you compile from the terminal. But if you compile from an editor that directly uses the /usr/bin/g++ binary, aliasing g++ doesn't work anymore. For example, most of the popular VSCode extensions for CP I've seen use /usr/bin/g++ to compile. I wasn't aware that this was the root of the issue and had been missing pbds structures for a long time. The way to solve it is to symlink g++ to g++-10 and prepend its directory to the PATH variable, so that the symlink takes priority over the default g++.
(Replace the version names with your g++ versions)
cd /usr/local/Cellar/gcc/10.2.0_4/bin
ln -s ./c++-10 ./c++
ln -s ./g++-10 ./g++
Finally, put in your .bashrc/.zshrc file:
export PATH=/usr/local/Cellar/gcc/10.2.0_4/bin:$PATH
I might even find it useful myself if I ever need to set up a Mac again in the future.
(Sorry for necroposting.) But I have to update the details, as from M1 onwards (possibly Big Sur) the configuration has changed. The Cellar directory is no longer available at /usr/local. You can do the following instead:
cd /opt/homebrew/Cellar/gcc/12.2.0/bin
ln -s ./c++-12 ./c++
ln -s ./g++-12 ./g++
export PATH="/opt/homebrew/Cellar/gcc/12.2.0/bin:$PATH"
Did you manage to solve the PBDS issue on your Mac? Doing all of the above doesn't solve it for me.
Thanks for the updates.
Yes, PBDS seems to work for me. I'm running macOS Ventura on M1 (Btw I had to reinstall the command-line tools when updating from Big Sur to Ventura). Can you elaborate on the exact issue you're facing, or possibly provide some error logs?
Also, did you try compiling a source code that uses PBDS from the terminal? Because I've fixed similar issues for others who were compiling/running from an editor/IDE, but the main culprit was actually the editor. The editor might fail if you didn't properly point it toward the Homebrew g++.
I am also using ventura 13.2 on M1. I tried both on sublime build and manually on terminal to run a sample code. It shows the following error.
The .zshrc profile looks as described in the commands above. I tried to add the missing file, then got an error about another missing file, and so on. In fact, I put the ext/pb_ds directory there myself.
I think it might have to do with the configs in .zshrc. I checked my .zshrc file, and there seem to be some additional environment variables I defined, though I forgot why. I have these at the very top of the .zshrc file:
Check if you have similar paths and modify them accordingly before putting them in .zshrc. Also, the last path there (/Users/drswad/Desktop/CP/Setup/include) is just for custom header files. I put my debugging template and testlib.h in there, so that they are always available to every program on my PC and the editor linter doesn't complain with red squiggly lines.
Also, try reinstalling the command-line tools with xcode-select --install. Let me know if this resolves your issue.
Note on using less_equal as the comparison function to get a multiset:
1) _GLIBCXX_DEBUG must not be defined, otherwise some internal check will fail.
2) find will always return end.
3) lower_bound works like upper_bound in a normal set (returns the first element > x).
4) upper_bound works like lower_bound in a normal set (returns the first element >= x).
5) find_by_order and order_of_key work properly (unlike the functions above).
Some code to verify the points above: Try it online!
adamant, during a discussion someone suggested that the time complexity of using it as a set is amortized log(n), and this post says that this means it can be O(n) in some cases. I wonder if that is true? If yes, is there an alternative to policy-based data structures? Here is one solution
It shouldn't be. And even if it were, what's the deal? It will always be O(n log n + q log n) if you use a set of n numbers and run q queries.
But line 4 of this link says:
And if that's the case, won't the complexity be q*n instead of q log(n)? I suspect this might be why my solution using the policy-based data structure got TLE, while the editorial solution, which uses a treap with the same time complexity, got accepted.
Please guide me through it as I use this data structure very frequently.
It can't be. By definition, amortized complexity means the total running time, divided by the number of queries, is guaranteed to stay within this bound. When they say "few", they mean it.
So should I treat it as the worst-case time complexity of this data structure?
If you don't revert operations and don't need persistence, then basically yes. In your case, the problem is likely a large constant factor. But I'll look into it later.
Thanks,it just got me an AC.
If someone is having trouble using these on Windows with the MinGW compiler, try to find hash_standard_resize_policy_imp.hpp0000644 in MinGW\lib\gcc\mingw32\6.3.0\include\c++\ext\pb_ds\detail\resize_policy and rename it to hash_standard_resize_policy_imp.hpp. I don't know why it is named like this.
Thanks bro... do you know why it's named like this? Why was it given the wrong extension?
I have defined bool cmp(pair<int,int> a, pair<int, int> b) for comparing pairs. Is it possible to use that as the comparator for the ordered_set?
Thank you, this really helped me.
Is this actually STL? I only see files with gcc's implementation of the c++ standard library. Actual STL is quite old (the linked post references SGI's docs, and SGI doesn't even exist any more)
Nope, this has nothing to do with the C++ standard library.
This is a GNU extension, so it has nothing to do with the STL which (incorrectly) refers to the C++ standard library.
STL doesn't even refer to the C++ standard library. STL is the Standard Template Library from long ago which heavily influenced the C++ standard library but is not the same thing. https://stackoverflow.com/a/5205571
I know, but here and in many other places "STL" is incorrectly used to refer to the C++ standard library.
The first returns an iterator to the k-th largest element (counting from zero)
Shouldn't it be the k-th smallest element? adamant
How to merge two ordered sets?
Iterate through every element in the smaller set and append it to the bigger one
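A sketch of this small-to-large approach, assuming the ordered_set typedef from the article; merge_into is a hypothetical helper. It uses the container's member swap so that the loop always runs over the smaller tree.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <functional>

using namespace __gnu_pbds;

typedef tree<int, null_type, std::less<int>, rb_tree_tag,
             tree_order_statistics_node_update>
    ordered_set;

// Small-to-large merge: moves every element of the smaller set into the
// larger one. Afterwards `a` holds the union and `b` is empty.
// `a` and `b` must be different objects.
void merge_into(ordered_set& a, ordered_set& b) {
    if (a.size() < b.size()) a.swap(b); // make `a` the larger set
    for (int x : b) a.insert(x);        // O(min(|a|, |b|) * log) insertions
    b.clear();
}
```

Since an element only moves when its set at least doubles in size, each element moves O(log n) times, so a whole sequence of merges costs O(n log^2 n) in total.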
thanks brother
When should one use a BIT over PBDS, other than because of the memory limit?
https://codeforces.net/contest/459/problem/D Can we solve this problem using pbds? It's giving me a runtime error: https://codeforces.net/contest/459/submission/81689503
Can we use a pair in place of int?
Sure, why not. We may use any type with operator < defined.
Is insert not too slow? I tried 10^7 insertions and it took over a minute.
Doesn't find_by_order(k) return the k-th smallest element in the set? In the article, the values given look like the k-th smallest ones, not the largest ones.
Just saw this post and I am wondering can https://codeforces.net/contest/237/submission/2804338 be the first submission using pbds? (I did not actually used pbds in this submission just included it as part of my template.) IIRC, Gerald used to see my usage of this trick in Dec, 2013 and asked about its usage.
Can I make the ordered set get the first element greater than x or even greater than or equal?
*find_by_order(order_of_key(x)) gives the first element greater than or equal to x. If you want only strictly greater, write *find_by_order(order_of_key(x + 1)) (this works for integer keys).
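The same rank arithmetic, wrapped with an end() check, since dereferencing the result of find_by_order when it equals end() is not valid. first_not_less and first_greater are hypothetical helpers, and the x + 1 trick assumes integer keys where x + 1 does not overflow.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <functional>

using namespace __gnu_pbds;

typedef tree<int, null_type, std::less<int>, rb_tree_tag,
             tree_order_statistics_node_update>
    ordered_set;

// First element >= x, or `fallback` if there is none.
// order_of_key(x) elements are < x, so the element at that rank is the answer.
int first_not_less(const ordered_set& s, int x, int fallback) {
    auto it = s.find_by_order(s.order_of_key(x));
    return it == s.end() ? fallback : *it;
}

// First element > x, or `fallback` if there is none.
// For integers, "> x" is the same as ">= x + 1".
int first_greater(const ordered_set& s, int x, int fallback) {
    auto it = s.find_by_order(s.order_of_key(x + 1));
    return it == s.end() ? fallback : *it;
}
```

With the default less comparator, first_not_less gives the same element as lower_bound(x); the rank-based form is mainly handy when you also need the position of that element.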
Please note that erase(key) does not work if you are using ordered_map:
typedef tree< int, map_value_type, less<int>, rb_tree_tag, tree_order_statistics_node_update> ordered_map;
Thanks a lot bro
Can we use this at IOI?
Regarding the trie implementation, is there an easy way to iterate over it? I'd like to be able to give a word one letter at a time, and get back the node I'm currently on, without redoing any work.
Could not implement it in time for today's E :(
I also used this for the E today, but I found this blog was too confusing so I used this tutorial instead: https://www.geeksforgeeks.org/ordered-set-gnu-c-pbds/ lmao.
12 Years Later...
1) While going through the official documentation of Policy-Based Data Structures, I found an internal test claiming that `ov_tree_tag` is better for the split and join methods.
The test can be found here.
2) Moreover, we can't use tree_order_statistics_node_update with ov_tree_tag. I can't find any proof for this :(
3) After some years, the links to some of the codes seem to be down. They can be found in this git repository.
Thank you for this amazing blog !