Hi all! As I've done several times in the past, I've written up an unofficial set of solutions to Codeforces Round 575 so that some solutions will be available before system testing completes. If my code gets hacked, I'll update the appropriate problem's solution as soon as I can.
A — Three Piles of Candies
We claim that the answer is always $$$\lfloor \frac{a+b+c}{2} \rfloor$$$. We clearly can't do any better: if Alice and Bob could each get more candies than this, then in total, they would have more than $$$a+b+c$$$ candies, which is impossible. To prove that this is always doable, let Alice take the smallest pile and Bob take the second-smallest pile. Then, distribute enough candies from the largest pile so that Alice has as many candies as Bob (which is always possible because the largest pile must contain more candies than the difference between the other two piles). Then, distribute the remaining candies from the large pile as evenly as possible.
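To make this concrete, here's a minimal sketch in C++ (an illustration for this post, not my actual submission; `maxCandies` is a name I made up):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Each of Alice and Bob ends up with floor((a+b+c)/2) candies.
long long maxCandies(long long a, long long b, long long c) {
    return (a + b + c) / 2;
}
```

For example, with piles $$$1, 3, 4$$$: Alice takes the pile of $$$1$$$, Bob the pile of $$$3$$$, and splitting the pile of $$$4$$$ appropriately leaves both with $$$\lfloor 8/2 \rfloor = 4$$$.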
Runtime: $$$O(1)$$$ per query. Click here for my submission.
B — Odd Sum Segments
Let's first figure out when there is an answer. Notice that the K subarrays must each have an odd sum, so the sum of the entire array must be congruent to K mod 2. Moreover, because every subarray with an odd sum must contain at least one odd element, there must be at least K odd elements in the array. If these conditions aren't satisfied, the answer is No.
We claim that the answer is Yes in all other cases. To construct a solution, we output the indices of the first K-1 odd elements, followed by N. Each of the first K-1 subarrays contains exactly one odd element, so its sum is odd, and because the sum of the entire array is congruent to K mod 2, the sum of the final subarray is congruent to K-(K-1) = 1 mod 2, so it also has an odd sum, as desired.
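A sketch of this construction (a hypothetical helper written for this post, not my contest code):

```cpp
#include <bits/stdc++.h>
using namespace std;

// For one query: return the k right endpoints (1-indexed) of a valid split,
// or an empty vector if no split exists.
vector<int> splitOddSum(const vector<int>& a, int k) {
    int n = a.size();
    vector<int> oddPos;                      // 1-indexed positions of odd elements
    for (int i = 0; i < n; i++)
        if (a[i] % 2 != 0) oddPos.push_back(i + 1);
    // Need at least k odd elements, and the count of odd elements must have
    // the same parity as k (so that the total sum is congruent to k mod 2).
    if ((int)oddPos.size() < k || (oddPos.size() - k) % 2 != 0) return {};
    vector<int> ends(oddPos.begin(), oddPos.begin() + (k - 1));
    ends.push_back(n);                       // the last subarray ends at n
    return ends;
}
```

For instance, on the array $$$[1,2,3,4,5]$$$ with $$$K=3$$$, this yields endpoints $$$1, 3, 5$$$, i.e. subarrays with sums $$$1$$$, $$$5$$$, and $$$9$$$.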
Runtime: $$$O(N)$$$ per query. Click here for my submission.
C — Robot Breakout
Notice that for each robot individually, disabling action one means that the resulting $$$X$$$ cannot be lower than the robot's $$$X$$$, because the robot cannot move towards a lower value of $$$X$$$. Disabling action two means that the resulting $$$Y$$$ cannot be higher than the robot's $$$Y$$$, for similar reasons. Actions three and four have similar effects.
We thus maintain the maximum and minimum possible values for $$$X$$$ and $$$Y$$$. Initially, both maxima are $$$100,000$$$ and both minima are $$$-100,000$$$, in order to satisfy the output constraints. Then, as we process each robot, for each disabled action, we update the appropriate maximum or minimum. If the maximum $$$X$$$ is less than the minimum $$$X$$$ or the maximum $$$Y$$$ is less than the minimum $$$Y$$$, the answer is $$$0$$$. If not, we can output any valid point in the given range.
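Here's a sketch of this bookkeeping, assuming actions 1 through 4 move left, up, right, and down respectively (as the paragraph above implies); `meetingPoint` is a hypothetical helper, not my submission:

```cpp
#include <bits/stdc++.h>
using namespace std;

const int LIM = 100000;

// Robots given as (x, y, f1, f2, f3, f4) with f_i = 1 if action i works.
// Returns {found, X, Y}.
tuple<bool,int,int> meetingPoint(const vector<array<int,6>>& robots) {
    int loX = -LIM, hiX = LIM, loY = -LIM, hiY = LIM;
    for (auto& r : robots) {
        int x = r[0], y = r[1];
        if (!r[2]) loX = max(loX, x);  // can't move left  -> X >= x
        if (!r[3]) hiY = min(hiY, y);  // can't move up    -> Y <= y
        if (!r[4]) hiX = min(hiX, x);  // can't move right -> X <= x
        if (!r[5]) loY = max(loY, y);  // can't move down  -> Y >= y
    }
    if (loX > hiX || loY > hiY) return {false, 0, 0};
    return {true, loX, loY};         // any point in the box works
}
```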
Runtime: $$$O(N)$$$ per query. Click here for my submission.
D — RGB Substring
For the smaller version, D1, we can consider every length-K substring of S individually. For each substring, we take three cases: the substring could start with an R, a G, or a B. Then, the rest of the values of the substring are fixed: any R must be followed by a G, any G must be followed by a B, and any B must be followed by an R. To compute the cost of editing this substring into one of these three cases, we can simply count the number of values in this substring that differ from the intended value at that position. Then, the best answer for this substring is the minimum cost over our three cases, and the best answer overall is the minimum cost over all substrings.
For the larger version, we can apply a very similar technique, using a sliding window to optimize the solution. We can modify our cases as follows: essentially, we need our string to have a length-K substring equal to the substring at the same position in RGBRGBRGB..., GBRGBRGBR..., or BRGBRGBRG.... We consider each of these cases individually. Start by computing the number of "errors" (i.e. positions at which the original string differs from the goal string) among the first K characters. Then, to shift the window one position to the right, subtract one from the error count if the position leaving the window was an error, and add one if the position entering at the end of the window is an error. We then simply take the minimum number of errors over all windows.
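The sliding window might look like this (a sketch for this post rather than my submission; `minChanges` is a made-up name):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Minimum number of changes so that s contains a length-k substring of
// the infinite string "RGBRGBRGB...".
int minChanges(const string& s, int k) {
    const string pat = "RGB";
    int n = s.size(), best = INT_MAX;
    for (int start = 0; start < 3; start++) {     // goal starts with R, G, or B
        int errs = 0;
        for (int i = 0; i < k; i++)               // errors in the first window
            errs += (s[i] != pat[(start + i) % 3]);
        best = min(best, errs);
        for (int i = k; i < n; i++) {             // slide the window by one
            errs -= (s[i - k] != pat[(start + i - k) % 3]);
            errs += (s[i] != pat[(start + i) % 3]);
            best = min(best, errs);
        }
    }
    return best;
}
```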
Runtime: $$$O(N)$$$ per query (or $$$O(K(N-K))$$$ for the easy version). Click here for my submission.
E — Connected Component on a Chessboard
Without loss of generality, assume $$$B \geq W$$$. (The other case proceeds similarly; in fact, we can simply add one to $$$x$$$ at every point in a solution to this case to get a solution to the case in which $$$B$$$ and $$$W$$$ are swapped.)
The answer is Yes if and only if $$$B \leq 3W+1$$$. We prove this by induction on $$$W$$$. If $$$W = 1$$$, $$$B$$$ can be at most $$$4$$$ by inspection. Then, every new white cell we add can give us at most three new black cells, as the white cell must be adjacent to at least one preexisting black cell. This construction is possible if we simply make a large column, alternating black and white cells, and then add black cells above or below the end white cells and to the left and right of every white cell. To get fewer black cells, we can omit some of the black cells on the sides and/or above and below the ending white cells.
This proof is essentially the key to solving the problem. We can simply implement this procedure to construct a solution. Start by building a column of $$$W$$$ white cells and $$$W-1$$$ black cells. Then, we have space for $$$2W+2$$$ more black cells at the top and bottom and on the sides, so we should simply use as many of those cells as necessary to add the remaining black cells to our component.
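Here's one way this construction might look in code (illustrative only: `buildComponent` is a hypothetical helper assuming $$$W \leq B \leq 3W+1$$$, and it treats cells with $$$x+y$$$ odd as black; if the problem's coloring convention is the opposite, shift every $$$x$$$ by one, as noted above):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Build a connected component with exactly b black and w white cells,
// assuming w <= b <= 3*w + 1. Cells with x+y odd are treated as black.
// Returns the list of cells, whites first.
vector<pair<int,int>> buildComponent(int b, int w) {
    vector<pair<int,int>> cells;
    for (int i = 0; i < w; i++) cells.push_back({0, 2 * i});  // white column
    int need = b;
    // Blacks inside the column (all w-1 are used, since b >= w > w-1).
    for (int i = 0; i < w - 1 && need > 0; i++, need--)
        cells.push_back({0, 2 * i + 1});
    // Candidate extra blacks: below, above, and beside each white cell
    // (2w+2 candidates in total, enough because b <= 3w+1).
    vector<pair<int,int>> extra = {{0, -1}, {0, 2 * w - 1}};
    for (int i = 0; i < w; i++) {
        extra.push_back({-1, 2 * i});
        extra.push_back({1, 2 * i});
    }
    for (int i = 0; i < (int)extra.size() && need > 0; i++, need--)
        cells.push_back(extra[i]);
    return cells;
}
```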
Runtime: $$$O(B+W)$$$ per query. Click here for my submission.
F — K-th Path
I'll first present the solution I used in contest. Then, I'll discuss a simpler solution, courtesy of dorijanlendvaj.
Recall that Dijkstra's algorithm visits points in increasing order of their distance from the source node. Therefore, if we wanted to find the K'th shortest path from a single node, we could do so more quickly by running Dijkstra's and terminating as soon as we've visited K other nodes. We exploit a similar property to solve the problem while considering paths from all nodes.
First, we read in the data. For each vertex, we sort the edges coming from that vertex in increasing order of weight. (Ties can be broken arbitrarily.) Then, we define a "state" variable. A state consists of a shortest path from a starting vertex to some other vertex in which we log four variables: the starting node, the second-to-last node on the path, the index of the final edge in the second-to-last node's adjacency list, and the length of the path.
We maintain a priority queue of these states, pulling the least-weight ones first. Start by adding a state for the single-edge path using each node's shortest edge. Then, to process a state, we first check whether the shortest path from our starting node to the ending node has already been found. If it hasn't, we add this length to our list and output it if it's the K'th one we've found. Then, we add a new state to the priority queue representing the path from our starting node to the ending node, plus the shortest edge of the ending node.
Then, we add a new state in which we replace the last edge of the current state with the next-shortest edge from our second-to-last node. In other words, we increase the index variable in our state by one.
The key reason this method works is that it guarantees that we process all possible states in increasing order of weight. Moreover, we start with $$$N$$$ states, and each state only creates a second extra state if it creates one of our $$$K$$$ paths. Therefore, we'll end up with $$$O(N+K)$$$ states in total.
Due to the priority queue operations and sorting, our runtime is $$$O((N+K) \log (N+K) + M \log M).$$$
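To make the state machine concrete, here's a compact sketch (written for this post, not taken from my submission, so treat it as illustrative). States are tuples (length, start, second-to-last node, edge index), and each unordered pair is counted twice, hence the doubling of $$$k$$$:

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

// k-th smallest shortest-path length over unordered pairs of distinct
// vertices. Graph is 0-indexed; edges are {u, v, weight}.
ll kthShortestPath(int n, const vector<array<ll,3>>& edges, ll k) {
    vector<vector<pair<ll,int>>> adj(n);     // (weight, neighbor), sorted
    for (auto& e : edges) {
        adj[e[0]].push_back({e[2], (int)e[1]});
        adj[e[1]].push_back({e[2], (int)e[0]});
    }
    for (auto& a : adj) sort(a.begin(), a.end());
    k *= 2;                                  // each pair is found from both ends
    // State: (length, start, second-to-last node, edge index in its list).
    priority_queue<tuple<ll,int,int,int>,
                   vector<tuple<ll,int,int,int>>, greater<>> pq;
    for (int u = 0; u < n; u++)
        if (!adj[u].empty()) pq.push({adj[u][0].first, u, u, 0});
    set<pair<int,int>> done;                 // (start, end) pairs finalized
    while (!pq.empty()) {
        auto [len, s, p, idx] = pq.top(); pq.pop();
        int v = adj[p][idx].second;
        // Sibling state: replace the last edge with p's next-cheapest edge.
        if (idx + 1 < (int)adj[p].size())
            pq.push({len - adj[p][idx].first + adj[p][idx + 1].first, s, p, idx + 1});
        if (v != s && !done.count({s, v})) {
            done.insert({s, v});
            if (--k == 0) return len;
            // Extension state: append v's cheapest edge to this path.
            if (!adj[v].empty()) pq.push({len + adj[v][0].first, s, v, 0});
        }
    }
    return -1;                               // fewer than k reachable pairs
}
```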
Here's a second solution. I haven't submitted this one myself, so please feel free to comment if you spot any errors. (Although I received this solution from Dorijan, I wrote it up myself, so all blame for any errors should go to me.)
Eliminate all edges except the K cheapest ones, with ties broken arbitrarily. Then, run N Dijkstra's iterations or an efficient all-pairs shortest paths algorithm to find all possible lengths of paths. From here, sort the paths and pick the K'th least expensive one. This is correct because none of the paths we're looking for will include an edge other than one of the K cheapest ones. Moreover, it's fairly easy to see that this will run in time: there will be, at most, $$$O(K^2)$$$ paths resulting from this process.
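Since I haven't submitted this version, here's only a sketch of how it might look (`kthShortestPathSimple` is a hypothetical name; as a small tweak, it runs Dijkstra only from endpoints of retained edges, since no other vertex can lie on a relevant path):

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

// Keep only the k cheapest edges, run Dijkstra from every vertex they touch,
// and take the k-th smallest pairwise distance. Assumes at least k pairs exist.
ll kthShortestPathSimple(int n, vector<array<ll,3>> edges, int k) {
    sort(edges.begin(), edges.end(),
         [](const array<ll,3>& a, const array<ll,3>& b){ return a[2] < b[2]; });
    if ((int)edges.size() > k) edges.resize(k);
    vector<vector<pair<int,ll>>> adj(n);
    set<int> used;                                  // endpoints of kept edges
    for (auto& e : edges) {
        adj[e[0]].push_back({(int)e[1], e[2]});
        adj[e[1]].push_back({(int)e[0], e[2]});
        used.insert((int)e[0]); used.insert((int)e[1]);
    }
    vector<ll> dists;
    for (int s : used) {                            // Dijkstra from each source
        vector<ll> d(n, LLONG_MAX);
        priority_queue<pair<ll,int>, vector<pair<ll,int>>, greater<>> pq;
        d[s] = 0; pq.push({0, s});
        while (!pq.empty()) {
            auto [du, u] = pq.top(); pq.pop();
            if (du > d[u]) continue;                // stale entry
            for (auto [v, w] : adj[u])
                if (du + w < d[v]) { d[v] = du + w; pq.push({d[v], v}); }
        }
        for (int v : used)
            if (v > s && d[v] < LLONG_MAX) dists.push_back(d[v]); // each pair once
    }
    sort(dists.begin(), dists.end());
    return dists[k - 1];
}
```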
Feel free to post below if you have any questions!
Thanks :)
In F, I don't understand why the paths we're looking for can't contain any edge other than one of the K cheapest ones.
Let X be the maximum length among the K cheapest edges. Any other edge must have length at least X, so any path containing another edge must have length at least X. Meanwhile, we can construct at least K paths with length at most X (simply take the K paths corresponding to the K edges), including all paths with length less than X, so the answer can be found among these paths.
Thanks :)
In the first solution of F, aren't you finding each path twice?
Yes, each path will be considered twice. There is a line $$$k*=2$$$ in his submission.
I missed that when reading the code; thanks! Also thanks to Geothermal; I hadn't seen this use of Dijkstra's before.
In the solution to F there is a typo. Instead of: "K'th most expensive one", it should say: "K'th least expensive one".
I'll go fix it; thanks!
In the first solution to F, how do you bound the number of extractions from the priority_queue that don't give you a new shortest distance?
I agree that the size of the priority_queue is $$$O(N+K)$$$.
Also, the bound for sorting could be $$$M \log \min(N,M)$$$ instead of $$$M \log M$$$, because every vertex has at most $$$N$$$ neighbours, but this doesn't help much: $$$M$$$ and $$$N$$$ have the same upper bound, and the difference is only inside the logarithm.
I came up with the same solution independently and it got accepted, but I couldn't and still can't figure out its complexity. Here is my commented code, in case you'd like to take a look at it:
https://codeforces.net/contest/1196/submission/57786369
One simple proof is to simply use the second solution to show that the first solution will never use any edges except the $$$K$$$ shortest ones. From here, we can use a similar complexity bound. That doesn't give the complexity given in the solution above, but it does prove that this algorithm will pass in time.
However, this isn't completely satisfying for a few reasons: first of all, most people who implemented the first solution in-contest probably did so because they did not come up with the second solution (this applied to me, at least), and second, the complexity given in the solution extends to larger $$$K$$$.
Some intuition behind the given approach is that each new shortest path will only be able to create a limited number of repeat paths. In particular, the more edges we process, the more repeated visits we might make, meaning that the worst case occurs when we take only paths starting at a single node. In this case, the time complexity is the same as a typical run of Dijkstra's.
This isn't a formal proof, but to confirm my logic (and to ensure that my complexity was correct), I ran my program on some randomly generated test data with $$$N = 10^5, M = 2 \cdot 10^5$$$, and $$$K = 10^5$$$. It terminated in 358 ms on Codeforces' custom invocation tool. (Doubling $$$N, M,$$$ and $$$K$$$ gave a runtime of 842 ms, which seems to support the time complexity given in the solution.) Here's the code I used for this test: https://pastebin.com/9k3NBE0V
Geothermal, regarding the first paragraph of this comment: I'm not sure I understand what you mean, because our solution might still use edges other than the K shortest ones for queue pops that don't represent new shortest paths. I do, however, understand mohamedeltair's explanation of why our algorithm will pass in time. It's great that it doesn't use the observation from the second solution. It's a shame it still doesn't prove that this algorithm will work for K ~ 10^5.
I have written a comment about this: Comment link.
Please feel free to correct me for any mistakes.
Auto comment: topic has been updated by Geothermal (previous revision, new revision, compare).
Just in case you missed it, my question still has not been answered.
If anyone was wondering (like me) about the upper bound on the number of priority-queue pops that don't represent new shortest paths (in the $$$1^{st}$$$ solution), I think this upper bound can be derived as follows:
Let's first consider single-source Dijkstra. Assume that we have already found $$$x$$$ shortest paths in the first $$$x$$$ priority-queue pops. The number of priority-queue pops that don't represent new shortest paths before the $$$x+1^{th}$$$ shortest path is found is at most $$$O(x^2)$$$ (a complete subgraph on the $$$x$$$ nodes to which shortest paths have already been found). So if $$$x$$$ is $$$K$$$, the upper bound is $$$O(K^2)$$$.
Now consider multi-source Dijkstra. If the number of shortest paths found from source $$$i$$$ is $$$k_i$$$, where $$$\sum_{i=1}^n k_i = K$$$, then the upper bound is $$$\sum_{i=1}^n k_i^2 \leq K^2$$$. So the bound from the single-source case ($$$O(K^2)$$$) still holds. And since we can never pop more than $$$n \cdot m$$$ entries from the priority queue (going through all $$$m$$$ edges from all $$$n$$$ sources, in which the $$$(n \cdot m)^{th}$$$ pop is the $$$K^{th}$$$ shortest path), a tighter upper bound is $$$O(\min(K^2,\, n \cdot m-K))$$$.
A test-case that may clarify my comment:
$$$N\, =\, 100\\ M\, =\, 4950\, \rightarrow \, (\dfrac{N*(N-1)}{2})\\ K\, =\, 4852\, \rightarrow \, (\dfrac{(N-1)*(N-2)}{2} + 1)$$$
Where there is an edge between every pair of nodes. All edges that connect node $$$N$$$ to any other node have weight $$$10^9$$$; all other edges have weight $$$1$$$. I have modified Geothermal's code to run this test case and printed the total number of priority-queue pops, which turned out to be $$$960500\, \rightarrow \, (N-1)^2 \cdot (N-2)+2$$$.
Link of the modified code (with output): Code with test-case.
Thank you for proving that our solution works because it has total complexity $$$O((N+K^2)\log(N+K)+M\log M)$$$ without using the observation from the second solution. It is a shame that it still doesn't prove that this solution is better than the second solution (using a Dijkstra from every node, not Floyd-Warshall), which has complexity $$$O(K^2\log K+M\log M)$$$.