adamant's blog

By adamant, history, 20 months ago, In English

Hi everyone!

Today we will talk about something that might be useful in programming competitions. Yay! Also great thanks to antontrygubO_o for sharing this with me and others in a Discord server.

Let's implement a data structure that maintains a big number $$$N$$$ in base $$$b$$$ and supports the following:

  • Given (possibly negative) integers $$$|x|, |y| \leq n$$$, add $$$x b^y$$$ to $$$N$$$.
  • Given $$$k$$$ and assuming $$$N \geq 0$$$, print the $$$k$$$-th digit of $$$N$$$.
  • Check if $$$N$$$ is positive, negative or equals to $$$0$$$.

Each operation should take at most $$$O(\log n)$$$ amortized time and $$$O(q)$$$ memory. While some approach you may think of immediately would imply using segment tree, or a similar structure, the solution proposed here only requires std::map, so it's much shorter and easier to implement (at the slight expense of increased constant factor). It may be used in the following problems:

If you implement the big integers in these numbers the standard way (i.e. keeping digits in the $$$[0, b)$$$ segment, carefully executing carries, etc), you will quickly learn that you may get in trouble because you may be forced to do and undo a lot of carry operations which chain up and so you need to frequently change large segments between the values of $$$0$$$ and $$$b-1$$$.

Now, stop being fooled by the non-negative propaganda! You don't have to do it! Let's give ourselves some slack and allow negative digits. Well, just a bit of them. Instead of maintaining digits in $$$[0,b)$$$, let's now maintain them in the interval $$$(-b, b)$$$. It seems like a tiny change, but the effect is tremendous. On one hand, the representation of any number is not unique anymore. On the other hand, when we actually reach the value $$$b$$$ or $$$-b$$$, we wrap them back to $$$0$$$, and carry $$$1$$$ or $$$-1$$$ to the next digit correspondingly.

Noticed anything? The carry now wraps us from the endpoints of the interval to its middle instead of from one endpoint to another! It would be easy to add $$$1$$$ to a particular bit, turn it into $$$b$$$ and cause a chain of carries by it. But! If after that we add $$$-1$$$ to the same bit, it will not wrap all the bits back to $$$b-1$$$! It will just change this specific bit to $$$-1$$$! So, we give up the uniqueness of the representation, but we gain a whole lot of stability in exchange.

The C++ implementation for the first two queries is also quite concise:

code

I tested it on #2302. 「NOI2017」整数 and it works!

P.S. Applying it to 1817E - Half-sum and 1810F - M-tree is left to the curious reader as an exercise :)

P.P.S. Is this trick well-known? Does it have a name?

  • Vote: I like it
  • +263
  • Vote: I do not like it

»
20 months ago, # |
  Vote: I like it +58 Vote: I do not like it

At the request of jeroenodb and in honor of great antontrygubO_o I also suggest calling the structure "Trygub num".

»
20 months ago, # |
  Vote: I like it +20 Vote: I do not like it

»
20 months ago, # |
  Vote: I like it +77 Vote: I do not like it

Finally adamant posted something that we can understand:P

»
20 months ago, # |
  Vote: I like it +30 Vote: I do not like it

Allowing negative digits in number representations has already appeared in Concrete Mathematics, page 15-16, but this is the first time I see the "wrapping around" part.

»
20 months ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

Now that I think about it, the intended solution to 102354E - Decimal Expansion uses pentagonal number theorem to represent the constant $$$\frac{9}{10}\frac{99}{100}\frac{999}{1000} \dots$$$ as a Trygub number in base $$$10$$$ and recover the corresponding digit. So many applications!

»
20 months ago, # |
  Vote: I like it 0 Vote: I do not like it
»
20 months ago, # |
Rev. 6   Vote: I like it +24 Vote: I do not like it

I really wanted to know why the amortized complexity is good. So I decided to prove it for myself:

We will be using a potential function, which represents how bad we messed up the data structure. If the potential function is high it means we need to clean up a lot of trash.

Let's take as potential function:

$$$\phi(\text{trygub num DS}) = c\log(n) \sum | \text{digs}[i]|/b$$$

The absolute value of the digits, times a constant $$$c \log(n)$$$, which isn't too important for now. It's intuitively clear that the higher the absolute values of the digits, the more carries can happen in future operations, so such a state of the data structure is worse. Good to note is that this potential function starts at $$$0$$$ and its value is always non-negative.

To prove the amortized bound, let's see how the potential changes with the add operation. (It is the only operation that changes the potential function).

In the $$$\texttt{add}$$$ function, if there are $$$k$$$ carries, the function needs to do $$$O( (1+ k) \cdot \log(n))$$$ work. When a carry is happening, it means we subtract $$$\text{carry} \cdot b$$$ from $$$\text{digs}[i]$$$ and add back $$$\text{carry}$$$ to $$$\text{digs}[i+1]$$$. The carrying always decreases $$$|\text{digs}[i]|$$$ and either increases or decreases $$$|\text{digs}[i+1]|$$$. In the worst-case, the potential function decreases by at least $$$ c \log(n) \cdot (1 - \frac{1}{b})$$$. So

$$$T_\text{amortized} = \Delta \phi + \text{actual work} \leq -k \cdot c \log(n) (1 - \frac{1}{b} ) + O( (k+1) \log(n))$$$

If the constant $$$c$$$ is chosen big enough, such that the $$$-k$$$ term is bigger than the $$$+k$$$ term inside the big Oh notation, the $$$+k$$$ and $$$-k$$$ terms can cancel. So $$$T_\text{amortized} = O(\log(n))$$$.

We didn't yet account for the extra potential that is due to the addition of $$$x$$$ to the digit. If we can assume $$$x \leq b$$$, then this adds potential $$$\leq c \log(n) \cdot b / b = c \log(n)$$$, so it is still amortized fast.

If $$$b \ll x $$$, the amortized analysis does not work (you can prove some slightly worse bounds though). Luckily, this can be easily fixed. It doesn't bother us if we change the base used in the data structure to $$$b^\prime = b^k$$$, for $$$k>0$$$, so we just choose a large enough $$$k$$$, such that $$$b^k \geq \max X$$$.

You can even prove if $$$\max X = O( \text{base}^c)$$$, for some constant $$$c$$$, the amortized analysis still works.

  • »
    »
    20 months ago, # ^ |
      Vote: I like it +24 Vote: I do not like it

    Btw, a good way to think about how a potential function actually proves an amortized bound is this:

    The potential function can be seen as a bank account, where you can put money into to save it for later. Whenever you're decreasing the potential you are taking away money. Everyone knows money = time. So in this analogy, to pay for the time of a slow operation, you take some money out of the bank account. With the money you saved in earlier operations by paying extra and putting it in the bank account. By ensuring the potential function is non-negative you are ensuring you're never spending more money than you have in the bank.

»
20 months ago, # |
  Vote: I like it +16 Vote: I do not like it

Now, stop being fooled by the non-negative propaganda!

Thank you adamant for keeping Codeforces from propaganda!

»
7 months ago, # |
  Vote: I like it +8 Vote: I do not like it

We can utilize the fact that additions will be done on some segment [b,b+x], so we can insert and update values in map using an iterator. I did some benchmarking and optimized version gets reasonable faster.

code

I also added function to check if the number is <= 0, and pop function which clears digs map so that last non-zero digit is at last map element or last-1 (implemented for positive trygub num).

benchmark code

Further, we can utilize array and set if we know that additions will be on values <= maxN. But this actually doesn't get faster than optimized map, only in certain cases.

set code