[#strings #ICPC #LATAM #2006] Jukebox — can we do better than backtracking + suffix array + sparse table + binsearch?

Revision en4, by pabloskimg, 2021-04-21 15:42:36

I'm trying solve problem Jukebox from ICPC Latin America Regional 2006. You can read the problem statement and submit solutions here. Please, read the problem statement to understand all the details, but here is a summary:

Basically you are given N songs, each song with a title and an artist. There is at most 30 songs, 6 artists and the strings are at most 30 characters long. For each song, you can decide if you keep or hide the artist. The task is to find the optimal way of keeping/hiding artists such that the sum total of the lengths of the songs' golden strings is minimized. A song's golden string is any shortest substring of either the song's title or artist (if the artist is not hidden) such that it is not substring of any other song. In other words, when you type a song's golden string, only that song is matched. There is always at least one valid solution.

My approach. You can find my code below.

Code

The idea: Since showing two or more times the same artist doesn't help, basically we have two options for each artist: hide it completely or reveal it just once. So I thought that backtracking would be good enough to explore all these possibilities since there is at most 6 artists. Then, to find out the golden string for each song, I came up with the following idea: we can concatenate all songs and titles in a single string, build a suffix array from the concatenation, and then for each song we can simulate that we do a search by iteratively typing each suffix of the title and each suffix of the artist. As we type a suffix, we can progressively narrow the scope of matches in the suffix array using two binary searches (lower bound and upper bound). We can check if in the current range only a single song is matched by doing a range query with a sparse table that computes the bitwise or over a range (we can encode each song as 1 << index, since there is at most 30 songs). This solution is correct, you can check that it gets all the outputs right: official input, official output. Unfortunately, it is too slow :(, because it does backtracking, which in the worst case involves checking like 6^6 = 46,656 cases, and for each I concatenate strings, build a suffix array, build a sparse table, try all suffixes for each song and do binary searches and queries to the sparse stable. So it's just way too slow.

Does anyone have any idea how to do better?

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en11 English pabloskimg 2021-04-22 07:38:46 1 Tiny change: 'at the begginning of' -> 'at the beginning of'
en10 English pabloskimg 2021-04-21 21:43:47 3 Tiny change: ''m trying solve pro' -> ''m trying to solve pro'
en9 English pabloskimg 2021-04-21 16:25:38 5 Tiny change: 'hat there is only 6 ar' -> 'hat there are only 6 ar'
en8 English pabloskimg 2021-04-21 15:59:18 3 Tiny change: 'bstring the accomplis' -> 'bstring that accomplis'
en7 English pabloskimg 2021-04-21 15:57:04 0 (published)
en6 English pabloskimg 2021-04-21 15:53:30 562
en5 English pabloskimg 2021-04-21 15:47:21 464
en4 English pabloskimg 2021-04-21 15:42:36 1123
en3 English pabloskimg 2021-04-21 15:32:26 578 Tiny change: 'a summary: basically y' -> 'a summary:\n\nBasically y'
en2 English pabloskimg 2021-04-21 15:13:08 940
en1 English pabloskimg 2021-04-21 14:53:44 6123 Initial revision (saved to drafts)