johnkim3104's blog

By johnkim3104, history, 2 years ago, In English

I got this problem from a friend: return the minimum number of states needed to produce a DFA that recognizes a provided language L (link; http://poj.org/problem?id=3576). So I see online there are some algorithms for it like Hopcroft's, but I was trying to think about it a different way that I haven't gotten to work.

So I was thinking about it by starting with an automaton with a start and a sink node and (|w|-1)-long chains for each word w. And so in the first pass we repeatedly merge nodes with the same first character until we can't anymore. And then we repeat the same process from the suffix end of the words. I believe this should be correct because at every step we've done everything we can by a greedy argument. I've been trying to implement this for a while but it doesn't seem to work. This is what I've written so far:

t = int(input())

def remove_prefixes(group, completed):
    compressed = 0
    group.sort()
    i = 0
    while i < len(group):
        j = i - 1
        subgroup = []
        while j + 1 < len(group) and group[j + 1][0] == group[i][0]:
            if len(group[j + 1]) >= 2:
                subgroup.append(group[j + 1][1:])  # remove 1st char
            j += 1
        if j - i == 0:
            completed.append(group[i][1:])
        else:
            compressed += len(subgroup) - 1  # When merging n nodes, we decrease by n-1
            compressed += remove_prefixes(subgroup, completed)  # Continue recursively removing from this group 
        i = j + 1
    return compressed

for test in range(t):
    n = int(input())
    language = [input() for i in range(n)]
    nodes = 2 + sum(len(word) - 1 for word in language)
    completed = []
    nodes -= remove_prefixes(language, completed)
    # Remove the prefixes from the reversed strings
    remaining = [word[::-1] for word in completed if len(word) > 0]  # Exclude empty strings
    nodes -= remove_prefixes(remaining, [])
    print(nodes)

But a cleaner solution my friend showed me was the following:

    n = int(input())
    language = [input() for i in range(n)]
    suffix_sets = set()
    prefix_suffixes = {}
    for word in language:
        for i in range(0, len(word) + 1):
            prefix = word[:i]
            suffix = word[i:]
            if prefix in prefix_suffixes:
                prefix_suffixes[prefix].add(suffix)
            else:
                prefix_suffixes[prefix] = {suffix}
    suffix_sets = set(str(val) for val in prefix_suffixes.values())
    print(len(suffix_sets))

Which I see the idea for, but I'm wondering if my approach could be correct as well. Can anyone suggest their own approaches to the problem or explain if there's a flaw in my reasoning?

Full text and comments »

  • Vote: I like it
  • +8
  • Vote: I do not like it

By johnkim3104, history, 4 years ago, In English

Congrats to Errichto for LGM once again!! Well-deserved for such a valuable contributor to the cf community!

Full text and comments »

  • Vote: I like it
  • -27
  • Vote: I do not like it

By johnkim3104, history, 4 years ago, In English

I’m wondering if it’s possible to solve the problem https://codeforces.net/problemset/problem/1383/A?locale=en with the additional constraint that, instead of being able to select subsets of the same letter, that you can only select substrings (letters that are all adjacent) all of the same letter. What time constraints could this be solved in?

Full text and comments »

  • Vote: I like it
  • +4
  • Vote: I do not like it

By johnkim3104, history, 5 years ago, In English

Hey everyone,

I wanted to share a very general technique that I recently noticed (it's more of a strategy, I'm not sure what to call it). Maybe someone can help me clarify this and offer other problems related to it. Here are the two problems I'll be discussing: problem 1 problem 2.

In the first problem, we have a 2D problem where N = 100,000, so this means we can't do a simple O(N^2) DP. The correct solution is to sweep over springboards sorted by x coordinates in linear time and maintain a monotonic set of optimum values for different y-values (basically, we must observe that, if we have two springboards i, j, where y[i] < y[j], then we will never use springboard j if the distance we can skip over up to point i is less than that for point j). You can check out the solution for more details, but the main point I'm trying to explore is that this solution required you to break the symmetry.

What I mean by this is that, rather than doing a DP[x][y] which would be a sort of shortest-path solution, we needed to just sweep over x in linear time, and then turn our attention to dealing with the y-component. It's almost like viewing this in 2D is a red herring, since the x and y aspects aren't used in the same way. In summary, we made the problem simple in one aspect (we swept over the x-axis), and then we found a way to work around the difficulty in the second aspect with some type of optimization (a set with an invariant).

In the second problem, which is quite different, we might think about transitions from [A, B] to [C, D] which seems difficult to do and likely O(N^2) in order to find all [A, B] initial ranges from which to transition via some rope. However, we can break the symmetry again here by simplifying the [A, B] part (the state we're transitioning from).

What we can do is just make DP[i][l] store the minimum cost to cover point l using the first i ropes. By doing this, we now sweep over l in linear time, and then we turn our attention to dealing with the [C, D] part. Now, it's relatively clear that we should use a segment tree, since this corresponds to just performing a range update for each l position and the ith rope. Again, what we did was to try to make one part of this problem easy (finding a state to transition from), then figure out how to do the rest of the problem with some optimization.

Please let me know if you have ideas on when this technique can be applied (what conditions of the above problems made this possible?) and other example problems if you have seen something similar before. Let me know if I've made any mistakes. Thanks!

Full text and comments »

  • Vote: I like it
  • +8
  • Vote: I do not like it