I have been trying to find a efficient Linear Container Data Structure which can support:
1. Fast access to any index
2. Fast insert after any index
For example:
Initial structure:
Index: 0 1 2 3 4 5
Data: 7 19 1 3 12 2
Accessing index 4 gives 12
Accessing index 5 gives 2
After I insert 5 after index 2:
Index: 0 1 2 3 4 5 6
Data: 7 19 1 5 3 12 2
Accessing index 4 now gives 3
Accessing index 5 gives 12
Vectors:
Vectors perform well for operation 1 (Access index i).
Linked lists:
Linked lists can perform well for operation 2 (Insert after index i) if we have a pointer to the node at index i.
Balanced binary trees:
- We can assign indices to every value inserted
- If the index is a b-bit number, the first value inserted gets index 2**(b-1)-1
- Every new value inserted is assigned an index that is an average of indices of previous and next elements.
- In this scheme, we can quickly reach a point where two values have neighboring indices. (Worst case O(b) operations.)
Does anyone have a better solution to handle these operations?
You can also share any known bounds on these operations (supported together) that you are aware of.
Auto comment: topic has been updated by NiKS001 (previous revision, new revision, compare).
Auto comment: topic has been updated by NiKS001 (previous revision, new revision, compare).
Hello! I'm not really sure if this is exactly what you're looking for, but there's a technique called "Square Root Decompostion" that works really well for these kind of problems! Let's create a vector of linked lists! In position i (counting from 1) of your vector, you'd have a linked list that contained the elements [(i — 1) * sqrt(n), i * sqrt(n) ). In other words, you'd have sqrt(N) positions, each containing sqrt(N) elements and end up with something like this:
V[1] = {0, 1, 2} V[2] = {3, 4, 5} V[3] = {6, 7, 8}.
Once you get this, it's easy to see the following: To find an element, you would only see in which position of the vector it's stored, and follow the linked list until you get to the desired element, taking O(1) for the first lookup and O(sqrt(N)) for the second one, and similarly, to insert an element, take the index, and just add it to the linked list (same time complexity). The problem with this second type of query is that a single list can eventually grow to be more than O(sqrt(N)), so every once in a while, recalculate the vector of linked lists. What's cool about this is that it usually works every time you have two types of queries that work linearly by themselves, but get ruined when combined. Just decompose your list into pieces of sqrt(N) size... Better than this, I can only think of a Self Balancing Binary Search Tree (such as a Treap), but I don't know of anything at most O(n).
Hi joseacaz,
Thanks for the suggestion. I did, indeed, try a similar train of thought. I tried to divide the sequence first into n**(1/k). Each section of the sub-level into (current-block-size)**(1/(k-1)). Each section of the new sub-level into (current-block-size)**(1/(k-2)) ... so on until (current-block-size)**(1/2). If only I could figure out the resizing of each component. The optimal result would then have been log(n) for lookup by choosing k as log(n). (My hunch is that insert would also be log(n) if only the resizing part was handled — could we employ something like table doubling?). The idea here leaves a lot of questions open.
Also the O(1) for first lookup, you mentioned, shouldn't work if we have blocks of varying sizes. I believe it would be O(no. of blocks).
I hope there would be a simpler solution (probably with trees) that might achieve log(n).
Then you can look at Treap. Both of operations can be complete for O(logn) in average.
ignore
Hi balalaika. Seems like I misunderstood the data structure earlier. How can we query Treaps using indices instead of the value? I couldn't figure it out from Wikipedia. Based on my understanding, Treaps are randomized binary search trees and should have the same shortcomings as other binary search trees?
Learn treaps from here — Part 1, Part 2, for competitive programming.
What you need is the "Implicit Treap", described in the Part 2.
Thanks Rezwan.Arefin01 for the useful link.
Yes it is. But you can store in each node size of subtree. Then you can just replace key by size in split method. And spliting into two tries by key becomes splitting into two tries by size of left subtree.
But in both split/merge operations you need to restore size of subtree at the end.
Thanks for the note. Now that I understand Implicit Treap, it makes sense.
It's possible to have O(1) lookup if you use deque instead of linked list for each block (but insertion are still O(block), so any "implicit" BBST (like Treap) will do it better :)
I see what you mean. Indeed, Treap is an excellent tool. The code is much simpler than other BBST implementations.