pabloskimg's blog

By pabloskimg, history, 14 months ago, In English

I'm trying to understand David Pisinger's balanced algorithm for the subset-sum problem with bounded weights, which can be found on page 5 of his paper Linear Time Algorithms for Knapsack Problems with Bounded Weights (link). Before asking this question, I read the following sources:

I even found this code that implements it for a specific problem: https://atcoder.jp/contests/abc221/submissions/26323758

However, I wasn't able to fully understand the algorithm and its correctness from those links, so I decided to read the paper itself. Yet I'm not fully understanding the paper's explanation either. Below is a screenshot of the algorithm's pseudocode from the paper, followed by specific questions about some of its lines:

[Screenshot: the algorithm's pseudocode from page 5 of the paper]

  • Line 3 doesn't make sense to me. A priori, we do not know whether there exists a subset of items $$$1$$$ to $$$b-1$$$ that adds up exactly to the weight $$$\mu$$$, for $$$\mu \in [C+1, C+W]$$$. I understand that assigning 1 means we are hoping there is at least one feasible assignment without fixing any prefix (1 means no restrictions on the weights), but, as I said, for some $$$\mu$$$ this might be impossible, and then $$$s_{b-1}(\mu)$$$ should be 0 instead of 1 (just like in line 2). So why is this line okay?

  • Line 7 doesn't seem correct. If we do a balanced insertion of $$$w_t$$$, one possible case is $$$s_{t-1}(\mu) = t$$$, that is, all weights from $$$1$$$ to $$$t-1$$$ are in the subset, and therefore by inserting $$$w_t$$$ we extend this to $$$t+1$$$. In other words, the update in line 7 should be something like:

$$$s_t(\mu') = \max\left\{ s_t(\mu'),\ \begin{cases} t+1 & \text{if } s_{t-1}(\mu) = t \\ s_{t-1}(\mu) & \text{otherwise} \end{cases} \right\}$$$

Am I missing or misunderstanding something? Why is it okay not to do this?

  • I would appreciate further explanation of lines 8 and 9. I don't clearly see why they cover all cases of balanced deletions. How do we know that this amortization optimization doesn't miss any cases? Can this be proven by induction or some other technique?

Lastly, I have a question about figure 1 on page 6 of the paper:

[Screenshot: Figure 1 from page 6 of the paper, with $$$s_4(10)$$$ highlighted]

I highlighted $$$s_4(10)$$$. As I see it, $$$s_4(10)$$$ should be 3, not 1, because we can achieve the weight $$$\mu = 10$$$ by adding $$$w_1 = 6$$$ and $$$w_2 = 4$$$ (total weight 10) and then removing weights $$$w_3$$$ and $$$w_4$$$. Hence the rightmost index $$$\leq 4$$$ such that we fix everything before that index and make binary choices from that index onwards would be 3, not 1. Or am I missing or misunderstanding something?
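For readers who want something concrete to trace while thinking about these questions: below is a compact implementation in the style of KACTL's FastKnapsack routine, which is based on this same paper. It differs from the pseudocode in ways that are easy to trip over: items are 0-indexed, and a -1 sentinel replaces the 0/1 initialization of lines 2-3, so "unreachable" is explicit. Variable names are mine.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Returns the largest subset sum S <= c (so c is reachable iff S == c).
// v[x] plays the role of the paper's s(mu) with mu = x + c - m: x in [0, 2m)
// covers mu in (c - m, c + m]. Items are 0-indexed, -1 means unreachable.
int balancedSubsetSum(vector<int> w, int c) {
    int n = w.size(), wbar = 0, b = 0;
    while (b < n && wbar + w[b] <= c) wbar += w[b++];  // b = break item
    if (b == n) return wbar;                           // every item fits
    int m = *max_element(w.begin(), w.end());
    vector<int> u, v(2 * m, -1);
    v[wbar + m - c] = b;                      // line 4: s(wbar) = b
    for (int t = b; t < n; t++) {
        u = v;                                // u = s_{t-1}, v = s_t
        for (int x = 0; x < m; x++)           // balanced insertions (lines 6-7)
            v[x + w[t]] = max(v[x + w[t]], u[x]);
        for (int x = 2 * m - 1; x > m; x--)   // balanced deletions (lines 8-9)
            for (int j = max(0, u[x]); j < v[x]; j++)
                v[x - w[j]] = max(v[x - w[j]], j);
    }
    int s = c;
    while (v[s + m - c] < 0) s--;             // best reachable mu <= c
    return s;
}
```

Note how the deletion loop runs j only over the gap between the old value u[x] and the new value v[x]; that gap argument is exactly the amortization my third question is about.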


By pabloskimg, history, 3 years ago, In English

The problem: given a set of building-block strings, find the length of the shortest string that can be formed as a concatenation of building blocks in more than one way, or report -1 if no such string exists. For example, if the set of building-block strings is {"10", "01", "101"}, the smallest string that can be formed in multiple ways is "10101", since it can be constructed by concatenating "10" + "101" or "101" + "01". Therefore, the answer is 5. However, for the set {"AB", "BA", "ABB"}, no string can be constructed in multiple ways from the building blocks, so the answer in this case is -1.

How can we solve this problem?
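In case it helps the discussion, here is a tiny exponential brute force that at least makes the statement precise (the length cap is a demo device, not part of a real solution; a proper algorithm also needs a sound -1 criterion, perhaps something in the spirit of the Sardinas-Patterson test for unique decodability):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Number of ways to factor s into a sequence of blocks (simple prefix DP).
long long countFactorizations(const string& s, const vector<string>& blocks) {
    int n = s.size();
    vector<long long> dp(n + 1, 0);
    dp[0] = 1;                                  // empty prefix: one way
    for (int i = 1; i <= n; i++)
        for (const string& b : blocks)
            if ((int)b.size() <= i && s.compare(i - b.size(), b.size(), b) == 0)
                dp[i] += dp[i - b.size()];
    return dp[n];
}

int main() {
    vector<string> blocks = {"10", "01", "101"};
    const int CAP = 12;                         // demo-only length cap
    set<string> seen = {""};
    queue<string> q; q.push("");
    int best = -1;
    while (!q.empty()) {
        string cur = q.front(); q.pop();
        // The BFS appends whole blocks, so it doesn't visit strings in strict
        // length order; track the minimum instead of stopping at the first hit.
        if (!cur.empty() && countFactorizations(cur, blocks) >= 2)
            if (best == -1 || (int)cur.size() < best) best = cur.size();
        for (const string& b : blocks) {
            string nxt = cur + b;
            if ((int)nxt.size() <= CAP && seen.insert(nxt).second) q.push(nxt);
        }
    }
    printf("%d\n", best);                       // prints 5 for this example
}
```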


By pabloskimg, history, 4 years ago, In English

I'm trying to solve problem Jukebox from ICPC Latin America Regional 2006. You can read the problem statement and submit solutions here. You can also find the official inputs and outputs here and here. Please read the problem statement to understand all the details, but here is an attempt at a summary:

Basically, you are given N songs, each with a title and an artist. There are at most 30 songs and 6 artists, and all strings are at most 30 characters long. For each song, you can decide whether to keep or hide the artist. Your task is then to find the optimal way of keeping/hiding artists such that the total sum of the lengths of the songs' golden strings is minimized. A song's golden string is a shortest substring of either the song's title or its artist (if the artist is not hidden) that is not a substring of any other song. In other words, when you type a song's golden string into a search box, only that song is matched, and no shorter substring accomplishes the same. There is always at least one valid solution.

My approach. You can find my code below.

Code

The idea: since showing the same artist two or more times doesn't help, we basically have two options for each artist: either hide it completely or reveal it in exactly one song. Given that there are only 6 artists in the worst case, I thought backtracking would be good enough to explore all these possibilities.

Then, to find the golden string for each song, I came up with the following idea: we can concatenate all titles and artists into a single string, build a suffix array from the concatenation, and then for each song simulate a search by iteratively typing each suffix of the title and each suffix of the artist. As we type a suffix, we can progressively narrow the range of matches in the suffix array using two binary searches (lower bound and upper bound). We can check whether the current range matches a single song by doing a range query on a sparse table that computes the bitwise OR over a range (we can encode each song as 1 << song_id, since there are at most 30 songs).

This solution is correct; you can check that it gets all the outputs right (see the official inputs and outputs at the beginning of this post). Unfortunately, it is too slow :( because the backtracking checks up to 6^6 = 46,656 cases in the worst case, and for each case I concatenate strings, build a suffix array, build a sparse table, try all suffixes of each song, and do binary searches and sparse-table queries. So it's just way too slow.
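By the range bitwise-OR sparse table I mean roughly the following sketch (names are mine; mask[i] would be 1 << song_id of the suffix at rank i in the suffix array):

```cpp
#include <bits/stdc++.h>
using namespace std;

// t[k][i] = OR of mask[i, i + 2^k). A suffix-array range matches exactly one
// song iff the OR over the range has a single bit set.
struct SparseTableOr {
    vector<vector<unsigned>> t;
    SparseTableOr(const vector<unsigned>& mask) {
        int n = mask.size(), K = 32 - __builtin_clz(max(n, 1));
        t.assign(K, mask);                     // level 0 is mask itself
        for (int k = 1; k < K; k++)
            for (int i = 0; i + (1 << k) <= n; i++)
                t[k][i] = t[k - 1][i] | t[k - 1][i + (1 << (k - 1))];
    }
    unsigned query(int l, int r) {             // OR over [l, r], inclusive
        int k = 31 - __builtin_clz(r - l + 1);
        return t[k][l] | t[k][r - (1 << k) + 1];
    }
};

// The range [l, r] is matched by a unique song iff exactly one bit survives.
bool uniqueSong(SparseTableOr& st, int l, int r) {
    unsigned m = st.query(l, r);
    return m && !(m & (m - 1));
}
```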

Does anyone have any idea how to do better?


By pabloskimg, 4 years ago, In English

There is a line of length $$$T$$$ ($$$1 \leq T \leq 10^8$$$), divided into $$$T$$$ consecutive empty slots of length 1. There are N blocks ($$$1 \leq N \leq 10^5$$$). The i-th block has length $$$x_i$$$ ($$$1 \leq x_i \leq 10^3$$$). The sum of all block lengths is not greater than $$$T$$$. The $$$N$$$ blocks have to be placed over the line. This is done sequentially and randomly, and there is a chance that the process may fail: for the i-th block, we choose one of the $$$T$$$ slots uniformly, and then we find the first empty slot from there to the right. If no empty slot is found, we fail. If we find an empty slot, then we check if we can place the block (we need at least $$$x_i$$$ consecutive empty slots starting from the empty slot we just found). If there is not enough space to place the block, we fail. Otherwise, we place the block, move on to the next and repeat. If we never fail and manage to place all the blocks, we succeed. What is the probability of succeeding? (link to original problem from NAC 2020: https://open.kattis.com/problems/allkill)
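For small instances, the process is easy to simulate, which at least gives a way to sanity-check a candidate exact solution. The sketch below transcribes the rules literally (far too slow and imprecise for the real limits, of course):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Monte Carlo estimate of the success probability, following the statement:
// pick a slot uniformly, walk right to the first empty slot, and require
// x_i consecutive empty slots starting there.
double simulate(int T, const vector<int>& x, int trials = 200000) {
    mt19937 rng(12345);
    int ok = 0;
    for (int tr = 0; tr < trials; tr++) {
        vector<bool> used(T, false);
        bool fail = false;
        for (int xi : x) {
            int p = rng() % T;                        // uniform starting slot
            while (p < T && used[p]) p++;             // first empty to the right
            if (p + xi > T) { fail = true; break; }   // no empty slot / no room
            for (int q = p; q < p + xi; q++)
                if (used[q]) { fail = true; break; }  // occupied slot inside
            if (fail) break;
            fill(used.begin() + p, used.begin() + p + xi, true);
        }
        ok += !fail;
    }
    return (double)ok / trials;
}
```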


By pabloskimg, history, 4 years ago, In English

I was trying to solve this problem, which requires computing probabilities of winning/losing in a turn-based two-player game. It's not hard to come up with a recursive solution that pretty much resembles a DP, except that states can repeat down the recursion, producing cycles. Not knowing how to solve that, I resorted to this comment, which describes the expected solution. In short, the idea is to have two arrays lo[x][y][z] and hi[x][y][z] that maintain lower and upper bounds, respectively, on the actual probability dp[x][y][z]. By initializing lo[x][y][z] to 0 and hi[x][y][z] to 1.0 and then updating the arrays in terms of each other in a smart way over multiple iterations, the values "magically" converge to the actual probabilities, accurately and quickly enough for the time constraint. I just confirmed this approach works by getting accepted on the problem. However, I still don't understand why it works. So these are my questions:

  1. Why does this algorithm work? Is there a broader theory that explains it, or is this just an ad hoc solution? (See the toy example below.)
  2. Why does this converge fast enough?
  3. Are there similar problems out there to practice this approach?
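To illustrate question 1, here is a toy version of the scheme on a made-up two-state cyclic recurrence (equations invented for the example, not from the problem): p = 0.5q + 0.3 and q = 0.4p + 0.1 depend on each other, so there is no valid DP order, yet iterating from lo = 0 and hi = 1 squeezes both bounds onto the unique fixed point p = 0.4375, q = 0.275.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Two mutually recursive states; lo/hi start at the trivial bounds [0, 1]
// and are updated with the same equations as the true values.
int main() {
    double loP = 0, hiP = 1, loQ = 0, hiQ = 1;
    for (int it = 0; it < 60; it++) {
        loP = 0.5 * loQ + 0.3; hiP = 0.5 * hiQ + 0.3;
        loQ = 0.4 * loP + 0.1; hiQ = 0.4 * hiP + 0.1;
    }
    printf("p in [%.9f, %.9f], q in [%.9f, %.9f]\n", loP, hiP, loQ, hiQ);
}
```

In this toy case the update is monotone and a contraction (the coefficients are below 1), so both bounds converge geometrically. My question is essentially whether something like this is what guarantees convergence in the actual game.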

Thank you very much.


By pabloskimg, history, 4 years ago, In English

I was trying to solve problem Fygon 2.0 from the ICPC 2017–2018 NEERC Northern Subregional Contest, but I couldn't come up with a solution, so I decided to check out this tutorial. Unfortunately, the tutorial is in Russian, and the explanation is quite short and probably omits a lot of details. Using Google Translate, I managed to understand that we should build a DAG whose nodes represent the variables of the nested for loops and whose edges represent inequalities between variables, but I don't understand how to compute $$$C$$$ and $$$k$$$ from that. Any help is appreciated. Thanks in advance.
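My current guess, which I'd like to confirm: $$$k$$$ is the number of loop variables and $$$C = L / k!$$$, where $$$L$$$ is the number of linear extensions (topological orderings) of that DAG, since the number of $$$k$$$-tuples satisfying the partial order is roughly $$$L \cdot n^k / k!$$$. Counting linear extensions of a small DAG is a standard subset DP; a sketch (names mine):

```cpp
#include <bits/stdc++.h>
using namespace std;

// dp[S] = number of topological orderings of the vertex subset S.
// mustPrecede[v][u] is true when variable v must come before variable u.
long long countLinearExtensions(int k, const vector<vector<bool>>& mustPrecede) {
    vector<long long> dp(1 << k, 0);
    dp[0] = 1;
    for (int S = 1; S < (1 << k); S++)
        for (int v = 0; v < k; v++) {
            if (!(S >> v & 1)) continue;
            bool canBeLast = true;            // v may close S iff nothing in S
            for (int u = 0; u < k; u++)       // is required to come after v
                if ((S >> u & 1) && mustPrecede[v][u]) canBeLast = false;
            if (canBeLast) dp[S] += dp[S ^ (1 << v)];
        }
    return dp[(1 << k) - 1];
}
```

If someone can confirm (or correct) this reading of the editorial, I'd appreciate it.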


By pabloskimg, history, 5 years ago, In English

Hi, I'm trying to solve the problem Jeopardized Election, problem J of the 2018 ICPC Latin America Regional Contest. No team in Latin America was able to solve it during the contest; check out the final scoreboard. So far I can only think of the brute-force approach, which is factorial-time and can't work. I've got a feeling that there must be a greedy strategy, but I'm not sure. It would be awesome if some of the genius minds here on Codeforces could shed some light on the solution.

Thank you very much!


By pabloskimg, history, 5 years ago, In English

Hi, I'm trying to solve this problem. It turned out to be way harder than I expected; my current solution has many lines of code, and although I have debugged it a lot with many hand-made test cases, I'm still getting WA. You can find my code here. I looked for a proper implementation of the area of the intersection between a triangle and a convex polygon, but I couldn't find any. I would appreciate it if someone with expertise in geometry could share their knowledge, or if someone could share a link to a proper implementation of this problem; that way I can compare my solution against it and figure out what I'm doing wrong.
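For comparison, this is the textbook approach I would expect: clip the convex polygon successively by the three half-planes of the triangle's counter-clockwise edges (Sutherland-Hodgman) and take the shoelace area of what remains. A sketch of my understanding, not tested against this problem's data; epsilon handling around touching/collinear cases is the usual suspect:

```cpp
#include <bits/stdc++.h>
using namespace std;

struct P { double x, y; };

double cross(P o, P a, P b) { return (a.x-o.x)*(b.y-o.y) - (a.y-o.y)*(b.x-o.x); }

// Keep the part of poly on the left of the directed line a -> b
// (one Sutherland-Hodgman clipping step; valid because a triangle is convex).
vector<P> clipLeft(const vector<P>& poly, P a, P b) {
    vector<P> res;
    int n = poly.size();
    for (int i = 0; i < n; i++) {
        P cur = poly[i], nxt = poly[(i + 1) % n];
        double c1 = cross(a, b, cur), c2 = cross(a, b, nxt);
        if (c1 >= 0) res.push_back(cur);
        if ((c1 > 0 && c2 < 0) || (c1 < 0 && c2 > 0)) {
            double t = c1 / (c1 - c2);        // where the edge crosses the line
            res.push_back({cur.x + t*(nxt.x-cur.x), cur.y + t*(nxt.y-cur.y)});
        }
    }
    return res;
}

double area(const vector<P>& p) {             // shoelace formula
    double s = 0;
    for (int i = 0, n = p.size(); i < n; i++)
        s += p[i].x * p[(i+1)%n].y - p[(i+1)%n].x * p[i].y;
    return fabs(s) / 2;
}

// Area of intersection of a CCW triangle tri and a CCW convex polygon poly.
double triPolyInterArea(const vector<P>& tri, vector<P> poly) {
    for (int i = 0; i < 3; i++) {
        poly = clipLeft(poly, tri[i], tri[(i + 1) % 3]);
        if (poly.empty()) return 0;
    }
    return area(poly);
}
```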

Thank you very much.


By pabloskimg, history, 5 years ago, In English

Hi, I'm trying to solve Gym 102263E Longest Path Problem, and my code can be found here. My strategy is to think of each edge as a bridge that splits the tree into 2 subtrees, where the subtree I'm interested in depends on which direction (forward or backward) I'm considering. Given edge e=(u,v), if I consider the forward direction, then I'm interested in the subtree rooted at v (with u as the parent of v), whereas in the backward direction I'm interested in the subtree rooted at u (with v as the parent of u). Thus, I can use dynamic programming to compute many things for the forward and backward subtrees defined by each edge; the signature of the DP is basically DP(edge, direction), where edge $$$\in [1,N-1]$$$ and direction $$$\in$$$ { forward, backward }.

Using this strategy, I can recursively compute the diameter, the end nodes of a diameter, the maximum depth, and the deepest leaf for both subtrees (forward and backward) associated with each edge in O(N). Once I have that, the answer for each edge is the maximum of the forward diameter, the backward diameter, and the longest path obtained by connecting the center of the backward subtree to the center of the forward subtree through the current edge. To find the center of each subtree, I binary search between the end nodes of a diameter (found in the DP explained above), using LCA with binary lifting in the predicate.

I debugged my solution a lot: it works on the provided example, and I created some additional examples by hand and it works on them too. However, I'm getting Wrong Answer on test 3. I would really appreciate it if somebody could help me out by providing some tricky test cases to test my solution with. Unfortunately, Gym doesn't show the test cases, so I had to create test cases myself, and my solution seems to work on all of them. Maybe there is something wrong in my strategy somewhere, some tricky corner case I'm overlooking.

Thank you very much in advance.


Update: I debugged my solution a little more and realized I had a bug in the function that finds the k-th node on the path from u to v. I fixed that bug, but now I'm getting TLE on test case 60 :( Any tips on how to optimize my code further? You can find my latest code here.
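In case anyone hits the same bug: here is a self-contained sketch of that k-th-node-on-path helper via binary lifting (names and 0-based indexing are mine, not from my submission):

```cpp
#include <bits/stdc++.h>
using namespace std;

const int LOG = 17, MAXN = 1 << LOG;
vector<int> adj[MAXN];
int up[MAXN][LOG], dep[MAXN];   // filled by dfs(root, root) with dep[root] = 0

void dfs(int u, int p) {
    up[u][0] = p;
    for (int j = 1; j < LOG; j++) up[u][j] = up[up[u][j-1]][j-1];
    for (int v : adj[u]) if (v != p) { dep[v] = dep[u] + 1; dfs(v, u); }
}
int jump(int u, int k) {                      // k-th ancestor of u
    for (int j = 0; j < LOG; j++) if (k >> j & 1) u = up[u][j];
    return u;
}
int lca(int u, int v) {
    if (dep[u] < dep[v]) swap(u, v);
    u = jump(u, dep[u] - dep[v]);
    if (u == v) return u;
    for (int j = LOG - 1; j >= 0; j--)
        if (up[u][j] != up[v][j]) { u = up[u][j]; v = up[v][j]; }
    return up[u][0];
}
// k-th node on the path from u to v, with k = 0 giving u.
int kthOnPath(int u, int v, int k) {
    int w = lca(u, v), du = dep[u] - dep[w], dv = dep[v] - dep[w];
    assert(0 <= k && k <= du + dv);
    return k <= du ? jump(u, k)               // still on the u side of w
                   : jump(v, du + dv - k);    // count down from the v end
}
```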


By pabloskimg, history, 5 years ago, In English

Given a tree, I need to find all the edges that are shared by all diameters of that tree. In other words, let D be the length of the longest path in the tree and P be the set of all paths in the tree with length D. I need to find the set of edges shared by all paths in P. How can I do that?

I know how to do it in O(N^3): iterate over all pairs of leaf nodes, check whether their distance equals the diameter (this can be done using LCA), and if so, traverse the path connecting them, adding 1 to a counter associated with each edge along the way; finally, count how many edges have their counter equal to the number of diameters. This should work, but it's quite expensive, and there must surely be a more efficient approach.
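For stress testing, that brute force transcribes directly to code (I iterate over all pairs rather than only leaf pairs; any pair at diameter distance necessarily consists of leaves, so it's equivalent):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Edges shared by all diameters, brute force. Nodes 0..n-1, adjacency list adj.
vector<pair<int,int>> edgesOnAllDiameters(int n, const vector<vector<int>>& adj) {
    vector<vector<int>> d(n, vector<int>(n, -1)), par(n, vector<int>(n, -1));
    for (int s = 0; s < n; s++) {                 // BFS from every node
        queue<int> q; q.push(s); d[s][s] = 0;
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v : adj[u]) if (d[s][v] < 0) {
                d[s][v] = d[s][u] + 1; par[s][v] = u; q.push(v);
            }
        }
    }
    int D = 0;
    for (int u = 0; u < n; u++)
        D = max(D, *max_element(d[u].begin(), d[u].end()));
    map<pair<int,int>, int> cnt;                  // edge -> #diameters using it
    int diameters = 0;
    for (int u = 0; u < n; u++)
        for (int v = u + 1; v < n; v++) {
            if (d[u][v] != D) continue;
            diameters++;
            for (int w = v; w != u; w = par[u][w]) {   // walk the v -> u path
                int p = par[u][w];
                cnt[{min(w, p), max(w, p)}]++;
            }
        }
    vector<pair<int,int>> shared;                 // edges on every diameter
    for (auto& [e, c] : cnt) if (c == diameters) shared.push_back(e);
    return shared;
}
```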

Thank you very much.

Update: I need to know how to do this in order to solve this problem: https://www.codechef.com/MARCH13/problems/SUBTREE


By pabloskimg, history, 5 years ago, In English

I have a tree, and removing an edge disconnects it into 2 subtrees. For each of these 2 subtrees, I would like to quickly find the 2 end nodes of a diameter; that is, I want the end nodes of a diameter of subtree 1 and the end nodes of a diameter of subtree 2 (4 nodes in total). I need to do this for many candidate edges (edge removals don't accumulate). Is it possible to do this efficiently? I need it in order to solve this problem; otherwise I will need to figure out another approach. Thank you in advance.


By pabloskimg, history, 5 years ago, In English

Problem: there are N players, each starting with some initial score, so they appear sorted non-increasingly by score in a dynamic leaderboard. From time to time some player changes their score, and the leaderboard gets updated to reflect the player's new ranking (for simplicity, let's assume ties are broken by comparing the players' unique IDs). At the same time, we need to answer queries of the form "What is the accumulated sum of the scores of all players with rank between L and R in the leaderboard?". Score updates and range-sum queries are interleaved.

Does anybody know how to implement this efficiently? I know I can implement a dynamic ranking using STL policy-based data structures: I can query the current ranking of a player in O(log(N)), remove him in O(log(N)), insert him again with his new score in O(log(N)), and finally query his updated ranking in O(log(N)). However, I don't know how to keep track of accumulated sums of ranked scores efficiently. Maybe some clever adaptation of a segment tree or Fenwick tree might help, but I'm not sure how.
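One construction I'm considering (untested, so treat it as a sketch): if the updates can be read in advance (offline), collect every (score, id) pair that will ever exist, sort them once in leaderboard order (score descending, id ascending; the tie direction is my assumption), and keep two Fenwick trees over those fixed positions, one counting active pairs and one summing active scores:

```cpp
#include <bits/stdc++.h>
using namespace std;

struct BIT {
    int n; vector<long long> t;
    BIT(int n) : n(n), t(n + 1, 0) {}
    void add(int i, long long d) { for (i++; i <= n; i += i & -i) t[i] += d; }
    long long pref(int i) {                    // sum over positions [0, i)
        long long s = 0; for (; i > 0; i -= i & -i) s += t[i]; return s;
    }
};

struct Leaderboard {
    BIT cnt, sum;        // indexed by the precomputed (score, id) positions
    Leaderboard(int keys) : cnt(keys), sum(keys) {}
    void activate(int pos, long long sc)   { cnt.add(pos, 1);  sum.add(pos, sc);  }
    void deactivate(int pos, long long sc) { cnt.add(pos, -1); sum.add(pos, -sc); }
    int kth(long long k) {   // position of the k-th active pair (1-based rank)
        int lo = 0, hi = cnt.n;              // invariant: pref(lo) < k <= pref(hi)
        while (lo + 1 < hi) {
            int mid = (lo + hi) / 2;
            if (cnt.pref(mid) >= k) hi = mid; else lo = mid;
        }
        return hi - 1;
    }
    long long rankSum(long long l, long long r) {  // ranks l..r, 1-based
        int pl = kth(l), pr = kth(r);
        return sum.pref(pr + 1) - sum.pref(pl);
    }
};
```

An update becomes deactivate(old position, old score) + activate(new position, new score). As written, kth is O(log^2 N) via binary search over prefix counts; a binary descent over the Fenwick tree would bring it down to O(log N). But I'm not sure whether offline compression like this is acceptable here, hence the question.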

Any help is very appreciated.


Edit: motivating problem: link


By pabloskimg, history, 5 years ago, In English

There are $$$N$$$ jobs; each job $$$i$$$ has a single prerequisite job $$$P_i$$$ that must be done before it, except for a global root job which has no prerequisite. Each job takes $$$T_i$$$ time to finish, and if a job finishes at time $$$t$$$ it contributes a penalty of $$$t \cdot U_i$$$, where $$$U_i$$$ is the i-th job's penalty coefficient. What is the minimum total penalty for finishing all jobs?

Constraints: $$$N \leq 2 \cdot 10^5$$$; everything is integer and non-negative.

Notes:

  • only a single task can be performed at a time (there is no concurrent work / task parallelism)
  • the tasks DAG looks like a rooted tree

Any ideas on how to solve this? I was thinking of performing some kind of backtracking over all possible topological orderings of the DAG plus some kind of extremely heavy pruning, but I haven't yet figured out a pruning strategy that avoids exponential time. I also thought of using DP, but then I need bitmasks to keep track of unvisited nodes, which also leads to exponential time.


By pabloskimg, history, 6 years ago, In English

As of now, in the root directory of ICPC Live Archive there is no folder named "Regionals 2018". The latest folder is "Regionals 2017". Does anybody know why 2018 regionals have not been uploaded yet and when they will be uploaded? Alternatively, is there any other online judge hosting all 2018 regional problems?


By pabloskimg, history, 6 years ago, In English

Problem Statement. I read somewhere else (here) that the linear version of the problem is much easier if you manage to find a split in the circle. I have no idea how to find such a split, nor why the linear version should be "easy". Any insights will be appreciated.


By pabloskimg, history, 6 years ago, In English

Basically the title. The problem statement can be found here. I have no idea how to solve it efficiently.

UPDATE 1: why the downvotes?

UPDATE 2: based on the answers so far, the solution would be to exhaustively explore all combinations of exponents $$$k_1, k_2, \ldots, k_r$$$ ($$$k_i \geq k_j$$$ for all $$$i < j$$$) such that $$$k = 2^{k_1} \cdot 3^{k_2} \cdot 5^{k_3} \cdot 7^{k_4} \cdot 11^{k_5} \cdots < 2^{63}$$$. During the exhaustive search one can map each n to the minimum k found for it (for instance, using an unordered_map), and then the queries can be answered in O(1). The problem is: how can you estimate the complexity of the exhaustive search? Is there an easy way to estimate an upper bound for it?

UPDATE 3: pshishod2645 says there are around $$$2 \cdot 10^6$$$ valid combinations of exponents. Where did he get that number from?
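To check that count empirically: the whole search is just a DFS over primes with non-increasing exponents, and counting its recursion nodes gives the number of valid combinations. A sketch (note it also counts $$$k = 1$$$, the empty combination):

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;

// Enumerate every k = 2^k1 * 3^k2 * ... * p_r^kr with k1 >= k2 >= ... >= kr >= 1
// and k < 2^63. Only the first 15 primes can appear: multiplying the primorial
// 2*3*5*...*47 by the next prime, 53, already exceeds 2^63.
const ull LIMIT = 1ULL << 63;                 // exclusive upper bound
const int PR[15] = {2,3,5,7,11,13,17,19,23,29,31,37,41,43,47};
long long cnt = 0;

void dfs(int i, ull value, int maxExp) {
    cnt++;                                    // each node is one valid k
    if (i == 15) return;
    ull v = value;
    for (int e = 1; e <= maxExp; e++) {
        if (v > (LIMIT - 1) / PR[i]) return;  // next multiplication overflows
        v *= PR[i];
        dfs(i + 1, v, e);                     // exponents stay non-increasing
    }
}

int main() {
    dfs(0, 1, 62);                            // 2^62 is the largest power of 2
    printf("%lld\n", cnt);
}
```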


By pabloskimg, history, 6 years ago, In English

There are at most N = 10^4 strings, each at most MAXLEN = 1000 characters long, and the length of the concatenation of all strings is at most 10^6. What would be the most efficient way to build a DAG as described in the title? The naive way would be to compare each pair of strings (X, Y), which leads to O(N^2) comparisons, each checking whether X is a substring of Y in O(MAXLEN^2). The naive solution can be improved by first sorting the strings by length, so that each string X can only be a substring of strings to its right, and by using rolling hashing to reduce each substring check to O(MAXLEN). Is it possible to do even better? I've got the feeling that a suffix array could help, but I'm not sure exactly how. The motivating problem is this one.
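For completeness, the rolling-hash ingredient of that improved naive approach would look something like this (base and modulus are arbitrary choices; with N = 10^4 strings, collisions should be double-checked or a second hash added):

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;

const ull MOD = (1ULL << 61) - 1;             // large Mersenne prime modulus
ull mulmod(ull a, ull b) { return (ull)((__uint128_t)a * b % MOD); }

struct Hasher {                               // prefix hashes of one string
    vector<ull> h, p;
    Hasher(const string& s, ull base = 131) : h(s.size()+1,0), p(s.size()+1,1) {
        for (size_t i = 0; i < s.size(); i++) {
            h[i+1] = (mulmod(h[i], base) + s[i]) % MOD;
            p[i+1] = mulmod(p[i], base);
        }
    }
    ull get(int l, int r) {                   // hash of s[l, r)
        return (h[r] + MOD - mulmod(h[l], p[r-l])) % MOD;
    }
};

bool isSubstring(const string& x, const string& y) {   // does x occur in y?
    if (x.size() > y.size()) return false;
    Hasher hx(x), hy(y);
    ull target = hx.get(0, x.size());
    for (size_t i = 0; i + x.size() <= y.size(); i++)
        if (hy.get(i, i + x.size()) == target) return true;  // probable match
    return false;
}
```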


By pabloskimg, history, 6 years ago, In English

Based on some spoilers, I "know" that the problem Game of Tiles from the 2012 ICPC Latin America Regional Contest should be solved using maxflow / bipartite matching. However, I still haven't been able to figure out a correct solution based on bipartite matching. Does anybody know how to apply bipartite matching, or any algorithm for that matter, to solve this problem? If so, would you kindly share the logic behind your solution?


By pabloskimg, history, 6 years ago, In English

I'm trying to solve the problem Routing from the ACM ICPC World Finals 2006, and I don't have any ideas on how to approach it. Since the number of nodes in the graph is at most 100, I guess Floyd-Warshall preprocessing should be no problem and could probably help, but beyond that I really don't know what to do next.

Any tips will be appreciated.


By pabloskimg, history, 6 years ago, In English

Given a lot of points and circles, you have to find all points that are either inside a circle or surrounded by a closed loop of circles (a sequence of N circles such that circle i intersects circle (i+1)%N). For example, in the picture below, the red points are exactly the points satisfying this condition:

[Image: points and circles, with the enclosed points shown in red]

I need to figure this out in order to solve this problem. UPDATE: solutions can be submitted here: http://poj.org/problem?id=2863. If you solve it, please let me know which strategy you used :D

I guess I need to build a graph where circles are nodes and there are edges between circles that intersect each other. Then I would have to somehow find maximal cycles in this graph; for each maximal cycle, find the corresponding maximal polygon formed by connecting the centers of its circles with segments in counter-clockwise order; and finally, for each point, check whether it's contained in one of these maximal polygons. I've got no idea how to do any of this :(
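At least the first step is mechanical. A sketch of building the intersection graph, where I treat two circles as connected when the discs overlap, i.e. $$$d \leq r_1 + r_2$$$, with tangency counting (both of those are assumptions on my part):

```cpp
#include <bits/stdc++.h>
using namespace std;

struct Circle { long long x, y, r; };         // assuming integer inputs

// adj[i] lists the circles whose discs overlap circle i's disc. Squared
// distances keep everything in exact integer arithmetic. If the problem
// needs the boundaries to cross, also require d^2 >= (r1 - r2)^2.
vector<vector<int>> intersectionGraph(const vector<Circle>& c) {
    int n = c.size();
    vector<vector<int>> adj(n);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            long long dx = c[i].x - c[j].x, dy = c[i].y - c[j].y;
            long long d2 = dx*dx + dy*dy, sr = c[i].r + c[j].r;
            if (d2 <= sr * sr) {              // centers within r1 + r2
                adj[i].push_back(j);
                adj[j].push_back(i);
            }
        }
    return adj;
}
```

The part I'm stuck on is everything after this: extracting the "maximal cycles" and the polygons they bound.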


By pabloskimg, history, 6 years ago, In English

There are N (<= 2*10^5) 2D points, each with coordinates x, y and a cost c. There are M (<= 2*10^4) queries; each query is a point (x, y) with a cost c. For each query, you need to return the point with cost <= c that is closest to the query point in Euclidean distance. In case of ties, return the point that appears first in the input. The full problem statement can be found here: https://icpcarchive.ecs.baylor.edu/external/77/p7744.pdf.

Any ideas on how to solve it? I've got the feeling that some efficient spatial-partitioning data structure might help, but I'm not sure which one. One idea I have in mind is to sort both the points and the queries by cost and then use 2 pointers, one advancing through the queries and the other through the points; as the point pointer advances, each point is inserted into some dynamic data structure that allows me to quickly find the nearest neighbor to the current query (somehow breaking ties by point index). With this strategy, a static data structure such as a kd-tree would not work, because the structure needs to be dynamic (support insertions). So I googled dynamic spatial-partitioning data structures and found, for instance, R* trees, but I'm afraid learning R* trees might be overkill for competitive programming (?)

Any ideas/hints/suggestions will be appreciated.
