I was trying to solve this problem, which requires computing probabilities of winning/losing in a turn-based two-player game. It's not hard to come up with a recurrent solution that pretty much resembles a DP, except for the fact that the states can repeat down the recurrence, producing cycles. Not knowing how to solve it, I resorted to this comment which describes the expected solution. In short, the idea is to have two arrays lo[x][y][z] and hi[x][y][z] that will maintain the lower and upper bounds respectively of the actual probability dp[x][y][z], and basically by initializing lo[x][y][z] to 0, hi[x][y][z] to 1.0 and then updating the arrays in function of each other in a smart way upon multiple iterations, the values "magically" converge to the actual probabilities with enough accuracy and fast enough given the time constraint. I just confirmed this approach works by getting accepted in the problem. However, I still don't understand why it works. So these are my questions:
- Why does this algorithm work? Is there a broader theory that explains it or is this just an adhoc solution?
- Why does this converge fast enough?
- Are there similar problems out there to practice this approach?
Thank you very much.