In trying to solve the problem involving tree isomorphism, I came across a paper explaining an algorithm for checking if two trees are isomorphic.
However, I cannot understand the complexity analysis of the AHU algorithm described in the paper. Why is the worst case O(N^2)
. It seems to me that when we go down in the tree the lengths of all the string we need to sort is decreasing as well, and hence we are dealing with something O(NlogN)
.
My second question is about the improvement part (section 4.3 AHU algorithm improvement). I don't understand what the author means by sorting by level, and how can we convert those strings (canonical names) to integers.
Hope that someone could help me. Thanks in advance.
Auto comment: topic has been updated by cote (previous revision, new revision, compare).
This is mentioned in the paper you linked:
“Observation 6. Consider a tree of n vertices in one long strand. Time needed to compute the root name of this tree is proportional to 1 + 2 + · · · + n, which is Ω(n^2).”
The reason is that in the unoptimized version of the algorithm, the length of the tree's name corresponds with its size, and in the given example, there are O(N) subtrees of size O(N) (for example, the top N/2 nodes all have size greater than or equal to N/2), giving the O(N^2) result.
This is also mentioned in the paper you linked:
“But... There is a hole in our discussion — we have forgotten about sorting in AssignCanonicalNames. So, in this case AHU-Tree-Isomorphism is just O(n log n)... But there is a way to sort canonical names in linear time — you can read about it in [1].”
where [1] refers to “The Design and Analysis of Computer Algorithms” which I don't have, so I don't know the solution here.
It simply means processing the tree level-by-level, starting from the bottom. At each level, the number of nodes must (obviously) match, and so must the names of the nodes. But it doesn't matter what those node names are, so you can just relabel them as integers between 1 and N.
This works because at each level, you only care about which nodes at the previous level have the same shape; you don't care what the shape is exactly. So there is no need to encode the shape into the node name, and there is no need for node names to be distinct across levels.
If it's still unclear, please check the slides 19-20 here: https://docplayer.net/25854793-Tree-isomorphism-alexander-smal-joint-advanced-student-school-st-petersburg-state-university-of-information-technologies-mechanics-and-optics.html