disclaimer: yet again i do not pretend to claim that this algorithm is useful. bin lifting is probably better in all aspects.
======================================================================================
intro
if you havent alr, consider reading this since most of my terminology will be from there.
cube root decomp
so why is my sqrt lca code so slow? its since sqrt is a really bad factor to have on your query. well now i present to you an algorithm that handles it all in O((n + q)cuberoot(n)) where n is the number of nodes on the tree, and q is the number of queries. (this algorithm is completely useless, but it will make explaining everything else easier). so imagine if i picked a constant CUBE where CUBE^3 >= n and CUBE^3 is minimized. so now let's define super[u] as the parent CUBE steps up and duper[u] (i know amazing naming skills) the node CUBE^2 steps up (for both of these arrays the value is -1 if going up that many steps makes the node go above the root). we can find super by iterating trivially CUBE times. but we can find duper[u] by iterating CUBE times over super so we can precompute both in O(n cbrt(n)). cbrt is cube root, i cant think of a better shorthand. here is an implementation of the idea.
in practice, this is extremely scuffed and not that much better than sqrt lca. but...can we generalize it past sqrt and cbrt? well the answer is that we can with a tiny tweak.
nth root decomp
so the above code actually runs in O(3 * cbrt(n)) but the 3 doesnt really matter, but as we go to the fourth root, we'll need 4 arrays and 4 loops, and 5 for the fifth root. **so it generalizes to O((n + q) * (r + n^(1/r))) time for a given root r (where root here is referring to exponent type things, not tree type things). and it takes O(n * r) memory. here is the implementation for such an idea. you can toy around with values of RT and LOG, and if you set RT to be log2(n) then you end up with bin lifting. just note i did not bother benchmarking the times for this idea since i did not deem it as better than bin lifting. but if you just do the math, setting RT as 6 will give you around the same time complexity as bin lifting with 3 times less memory.
conclusion
I am sorry abt how rushed this was. if there are any questions ill be happy to answer them in the comments. yet again i reiterate, i do not see a reason to use this over bin lifting, but someone else like me might this very interesting so this blog would have been worth it.
Thank you for reading and i apologize for any typoes, mistakes, or rushedness.