Epilogue's blog

By Epilogue, history, 13 months ago, In English

Hi!. I'm solving a problem. Given a tree with N nodes with colors and q queries. For each query given two integers (u, v), answer if the path(u, v) on the tree is dominant. A path is called dominant when the most frequently occuring element appears strictly more than pathlength / 2.

Are there any offline, online solutions for this? If you have any ideas please comment bellow.

  • Vote: I like it
  • +10
  • Vote: I do not like it

»
13 months ago, # |
  Vote: I like it -28 Vote: I do not like it

Did you try to google before asking this? If you did, you would atleast know that it can be solved easily offline with Mo's.

  • »
    »
    13 months ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    I know Mo for the subtree one but this one is path query. Can you using MO with it or Did you even read the problem properly?

    • »
      »
      »
      13 months ago, # ^ |
        Vote: I like it -16 Vote: I do not like it

      Did you even read the problem properly?

      Yes, I read your problem properly. You however, still didn't read my comment properly. A simple google search yields this.

      • »
        »
        »
        »
        13 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Oh I saw that one before but the constraint is like 5*10^5 so.

        Anyways tks for reminding me that.

»
13 months ago, # |
Rev. 6   Vote: I like it +4 Vote: I do not like it

Arbitrarily root the tree. Let $$$cnt(i, j)$$$ denote the frequency of color $$$j$$$ on the path from root to node $$$i$$$. Now the number of occurrences of a color $$$x$$$ in $$$path(u, v)$$$ is $$$cnt(u, x) + cnt(v, x) - 2 * cnt(lca(u, v), x) + (color[lca(u, v)] == x)$$$. A path is dominant if the number of occurrences of the median color in the path is strictly greater than $$$\frac{pathlength}{2}$$$. $$$cnt(i, j)$$$ can be calculated online in $$$\mathcal{O}(\log N)$$$ with the help of persistent segment tree and the median color in a path can be calculated by binary search with range sum queries on the same tree. Overall time complexity will be $$$\mathcal{O}(N \log N + Q \log^2 N)$$$ and memory complexity $$$\mathcal{O}(N \log N)$$$.

The problem can also be solved in $$$\mathcal{O}((N + Q) \log N)$$$ by an approach similar to walking on a segment tree once you have the persistent segment trees containing information on the frequency of colors on the path to root for each node. It can also be solved by a binary lifting approach using the fact that the majority of a sequence must be the majority of one of the sequences when it is partitioned into two.

  • »
    »
    13 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Wow that's nice. I have thought about Persistent Segment Tree but I didn't think about finding the median on the path. Thanks.

  • »
    »
    13 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Actually you can get rid of the binary search and just search the segtree going to the child node with the maximum sum on each step, so it'll become $$$O(NlogN + QlogN)$$$.

    • »
      »
      »
      13 months ago, # ^ |
      Rev. 3   Vote: I like it +3 Vote: I do not like it

      You cannot do that. The segment tree of node $$$i$$$ contains the information of the frequencies of colors on the path from root to node $$$i$$$. To get the frequency of colors on $$$path(u, v)$$$ you need to combine the information of three segment trees. On a step in binary search you need to check the sum of frequencies of colors less than mid. The median will be the largest element for which such sum is less than $$$\frac{pathlength}{2}$$$.

      • »
        »
        »
        »
        13 months ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        You can keep the 3 segtree roots with you and go to their right or left children on each step of the search.

»
13 months ago, # |
Rev. 5   Vote: I like it +2 Vote: I do not like it

Two approaches I can think of.

Approach 1
Approach 2
  • »
    »
    13 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    You don’t need HLD, you can just apply Approach 2 as a ‘binary lifting’-like dp and check the candidate in $$$O(\log n)$$$, making it $$$O(q \log n)$$$

    • »
      »
      »
      13 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      But how would you adapt the Boyer-Moore Majority Vote on the binary-lifting like method? Can we just try two possible candidates for the majority using the fact (which I have not proven, but does seem trivial) that if an element is a majority of $$$A + B$$$ it must either be a majority of $$$A$$$ or a majority of $$$B$$$?

      • »
        »
        »
        »
        13 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        The fact that you mentioned is indeed true, it’s not too hard to prove. However, you can just aggregate the results on the two independent paths with the same combining function.

      • »
        »
        »
        »
        13 months ago, # ^ |
          Vote: I like it +5 Vote: I do not like it
        • Proof for "if an element is a majority of $$$A + B$$$ it must either be a majority of $$$A$$$ or a majority of $$$B$$$" : $$$\newline$$$ Suppose both $$$A$$$ and $$$B$$$ have length of $$$2n + 1$$$. Then let $$$X$$$ and $$$Y$$$ be the number of majority for $$$A$$$ and $$$B$$$ respectively each with count of $$$n + 1$$$. For the majority of $$$A + B$$$ to be some element other than $$$X$$$ or $$$Y$$$, such element will need to have a count greater than that of $$$X + Y$$$. The maximum count we can get for any element other than $$$X$$$ or $$$Y$$$ is $$$|A + B| - cnt(X) - cnt(Y) = 4n + 2 - 2(n + 1) = 2n$$$. Since $$$2n < 2(n + 1)$$$ then it is impossible for any number other than $$$X$$$ or $$$Y$$$ to be the majority of $$$A + B$$$. $$$\newline$$$

        • Explanation for "how would you adapt the Boyer-Moore Majority Vote on the binary-lifting like method" : $$$\newline$$$ let $$$lft(x, y)$$$ be the resulting node after lifting $$$x$$$ by $$$y$$$, and let $$$dp_{i,j}$$$ be a pair $$$(cnt, val)$$$ that stores the answer after applying Boyer-Moore on all nodes on the path from $$$lft(i, 2^j)$$$ to $$$i$$$. updating the $$$dp$$$ is as follows : $$$\newline$$$ $$$dp_{i,0} = (1, val_i) \newline$$$ $$$(dp_{i,j} \neq dp_{lft(i, 2^j),j}) \implies dp_{i,j+1} = max(dp_{i,j}, dp_{lft(i, 2^j),j}) \newline$$$ $$$otherwise \implies dp_{i, j+1} = (dp_{i,j}.first + dp_{lft(i, 2^j),j}.first, dp_{i,j}.second) \newline$$$ This $$$dp$$$ can be preprocessed in a similar manner to $$$lft$$$. Now lets consider each query in the form : "is $$$path(x, y)$$$ dominant?", you can get Boyer-Moore for $$$path(x, lca(x, y))$$$ in using binary lifting on $$$dp$$$ in a similar manner to getting $$$lca(x, y)$$$. The answer is simply to check if the value of Boyer-Moore on $$$path(x, y)$$$ is the majority or not.

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

I solved it using Persistent Segment Tree, I think will try some others approach too.

»
13 months ago, # |
  Vote: I like it +3 Vote: I do not like it

Maybe you can randomly choose some vertexs and count the occurance of it. If you randomly choose $$$x$$$ times, the error rate will be $$$\dfrac{1}{2^x}$$$, so $$$x$$$ be about $$$O(\log V)$$$ is enough.

»
13 months ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

First find a candidate for the most common color by calculating a pair (color, score) with a HLD path query. To combine pairs its easy. $$$(c1,s1)+(c2,s2)$$$ is just $$$(c1,s1+s2)$$$ if $$$c1=c2$$$ and $$$(c1,s1-s2)$$$ if $$$c1!=c2$$$ and $$$s1>s2$$$. Otherwise its $$$(c2, s2-s1)$$$

(The cool property of this operation is that, if there is a dominant color then the result of the operation has that color.

Then use another path query to compute real amount of appearances of that color along path (also HLD).

This way you can do online queries with updates.

It's probably overkill but maybe it gives some ideas

»
13 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

Another idea

If the path is short (L < 2sqrt(N)) we can just check it in linear time using a hashmap or something.

If the path is long (L > 2sqrt(N)) the color must have at least L/2 appearances on the path, so it must have at least sqrt(N) appearances on the tree. This means that there will not be many colors that have this property and we can try and check all of them using something like binary lifting idk.

This works for online queries but no update s

  • »
    »
    13 months ago, # ^ |
      Vote: I like it -8 Vote: I do not like it

    I don't see that updates would be mentioned in post, so I think your solution can be further simplified:

    We can have at most sqrt colours with freq >= sqrt.

    For each we can build array d[c][u] = how many nodes have colour 'c' on path from u to root. Then calculating frequency of c on path from u to v is trivial in O(1), path_len >= 2sqrt can be solved in O(sqrt) per query. The other case with small length can be checked in O(length) as you said.

    *(Note, it works only if ML allows NsqrtN memory)

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Epilogue problem link plz..

»
10 months ago, # |
  Vote: I like it 0 Vote: I do not like it

An O(q.logn.C) solution is you randomize choosing value in path (u, v) Than break the query(u, v) into queries(u, v, w): count number of value w in path (u, v) you can iterate the value of a and than build fenwick tree => we count in O(logn)