String Matching Question Help

Правка en2, от chreh, 2025-02-12 19:56:15

Hey everyone,

I don't know if this is the right place to put this, but I made up this problem and am not sure how to solve it. I'm hoping someone with more experience might be able to help.

Setup: I have some code in a coding language. My code consists of N tokens, each associated with a length. I want to minimize my code by adding a macro definition. A macro definition allows me to define a sequence of tokens as a single token with length K, K being the smallest valid unclaimed identifier length. Then, I can replace all occurrences of said sequence of tokens with the defined token.

Now, let's arbitrarily define each distinct token with a distinct nonnegative number. We can then create a mapping W that maps the number associated with each distinct token to the length of that token. Finally, by replacing each token in our source code with its associated number, we get an array A of N nonnegative numbers.

Problem:

Given an array A of N nonnegative numbers (each distinct number corresponds to a type of token in the original code), a map of positive weights W mapping each number in A to some positive weight (this represents the length of the token in the original code), and some positive number K, find a subarray S1, that maximizes its worth. Its worth is defined as (C-1)*(sum_{i=0}^{i=len(S1)}{W[S1[i]]}-K) ((C-1) * (the weighted sum of S1 — K)) where C is the count of how many times S1 appears in A, without overlaps.

Constraints:

1 <= K

1 <= N <= 1e6

Examples: solution([1, 2, 1, 2, 1, 2], {1:1, 2:2}, 1) -> [1, 2]

[1, 2] appears 3 times without overlaps, so C = 3. (3-1)*((1*1+1*2)-1) = 4. It can be shown that for all other subarrays, (C-1)*(sum_{i=0}^{i=len(S1)}{W[S1[i]]}-K) is less

Follow up: How would you optimize your solution if you had to replace S1 with a new nonnegative number with a weight K and then repeat until the most valuable subarray had a worth <= 0? Would this even be possible in a couple of seconds?

Note: I said that this was a string matching problem because I think it could be a string matching problem if you consider the nonnegative numbers to just be characters in an alphabet. After all, there's no inherent meaning to the number associated with a token.

Thanks in advance to anyone that has any idea on how to solve this.

Теги string match, substring search, subarray

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en4 Английский chreh 2025-02-12 20:00:01 11 Forgot to specify that subarrays must be contiguous
en3 Английский chreh 2025-02-12 19:56:32 0 (published)
en2 Английский chreh 2025-02-12 19:56:15 102 fixed formatting (saved to drafts)
en1 Английский chreh 2025-02-12 19:52:03 2330 Initial revision (published)