ComboGirl's blog

By ComboGirl, history, 3 years ago, In English

Here's an interesting string problem I have been thinking about related to the game of Kilordle, a variant of Wordle. I'm curious to see if there is a good solution.

Kilordle is a word game played like Wordle, but instead of having one game, you play several games of wordle simultaneously, and each guess is applied to each individual game. You win after correctly solving every game of wordle. In kilordle, you don't have to guess the exact word for any of the individual games. As long as you guess words that have letters in the correct places that will suffice. You can guess as many times as you want. For example, if the word for one of the individual games is "CARTS", then if you guess "CAtch" and then "foRTS" that individual game will be solved, as you have guessed a word that contains C as the first letter, you've guessed a word that contains A as the second letter, you've guessed a word that contains R as the third letter, etc.

One simple strategy for kilordle is just to type in a sequence of words that covers all possible words in the dictionary.

Formally, suppose the dictionary contains a set $$$S$$$ of $$$n$$$ strings, each of length $$$k,$$$ and there are $$$m$$$ distinct letters in the language. Give an efficient algorithm to find the size of the smallest subset $$$S' \subset S$$$ such that for all $$$s \in S,$$$ and all $$$1 \leq i \leq k,$$$ there exists some $$$s' \in S'$$$ such that $$$s[i] = s'[i].$$$

  • Vote: I like it
  • +16
  • Vote: I do not like it

»
3 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it
  • misread the question
  • »
    »
    3 years ago, # ^ |
    Rev. 4   Vote: I like it 0 Vote: I do not like it

    I'm not sure if that works. The answer you've given is definitely a lower bound. For example, let $$$S$$$ be the set of the following words. $$${CAA, ABC, BCA, BAB}.$$$

    Then it's easy to see that no three of these words can satisfy the conditions for all four words. You need the first word to get the C in CAA (no other word has C as the first letter), you need the second to get the B in ABC, the third to get the C in BCA, and the fourth to get the second B in BAB. So in this case the answer is four and you need all the words, ie. $$$S' = S.$$$

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      OHH you need a subset of S, didn't read that correctly, my bad :)

»
3 years ago, # |
  Vote: I like it +10 Vote: I do not like it

We can show this problem is NP-hard via a reducing Set Cover to this problem. Let's say we have a Set Cover problem with $$$m$$$ sets in a universe of size $$$n$$$.

Then we can generate the following kilordle problem with $$$m + 1$$$ strings of length $$$n + 1$$$ and 2 distinct letters.

First string 0000...0 Other strings For each set $$$S$$$, make the ith bit 1 if $$$i \in S$$$ (and also make the last bit 1)

We must pick the first string as it is the only string ending in 0. We then must pick the minimum number of sets to cover each 1 bit, which is just exactly Set Cover. The minimum answer to the Set Cover problem is then just one less than the answer to the kilordle problem.

»
3 years ago, # |
Rev. 2   Vote: I like it +15 Vote: I do not like it

(edit: better format)

We can formulate the problem as an integer linear programming problem (which is NP-hard, but can be efficiently solved with some solver software.) The list of possible $$$\mathrm{GUESS}$$$'es and possible $$$\mathrm{ANSWER}$$$'s can be seen at this source code, which are called WORDS and WORDLES respectively.

We define an index-letter pair $$$(i,c)$$$ to be present if there is some word $$$s$$$ in the ANSWER list where $$$s[i]=c$$$.

Variables. For each words $$$\mathrm{GUESS}_j$$$ in the possible guess list, we define a binary (0/1) variable $$$x_j$$$.
Objective. We want to minimize $$$\sum_j x_j$$$.
Constraints. For each present index-letter pair $$$(i,c)$$$, we have a constraint $$$\sum_{j\in P(i,c)} x_j \geq 1$$$, where $$$P(i,c)=\{j\mid \mathrm{GUESS}_j [i]=c\}$$$ denotes the set of GUESS words that contains the character $$$c$$$ at its $$$i$$$-th letter.

With the official word list, the ILP has 10,638 variables and 125 constraints. (There are 5 index-letter pairs that are not present in the ANSWER list: {(0, 'x'), (3, 'q'), (4, 'j'), (4, 'q'), (4, 'v')}.) The relaxed LP optima can be calculated, and the optimal value is 29.04, meaning that the ILP has a lower bound of 30. With an MILP solver and some luck, I found the following 30-word solution:

['amnic', 'cwtch', 'enzym', 'fiqhs', 'hajji', 'jambu', 'oflag', 'adsum', 'bravi', 'djinn', 'flexo', 'gyppo', 'ibrik', 'kvell', 'loggy', 'mixte', 'ngwee', 'oxbow', 'pzazz', 'quaff', 'refix', 'schwa', 'skyer', 'squab', 'tsked', 'updry', 'vivda', 'whump', 'ytost', 'zacks']

Fun fact: The spellcheck of my browser marks 14 of these 16 words to be misspelled. I'm surprised to find that "cwtch" and "ytost" are actual words.

Someone on reddit had done this almost a month ago, and they proposed another 30-word solutions.