Faster string comparison with a trie

Thursday, June 13, 2024

javascripttypescript

Lately I’ve been playing a game in the Apple News app called Quartiles. The game presents 5 rows of 4 scrambled word parts, and the goal is to arrange one to four of the parts to assemble words, scoring points for longer words.

Each puzzle includes around 25-30 words, and once you reach 100 points the game congratulates you. But I found myself obsessively trying to find every single word, and then I got the idea to build a solver.

First, build a form.

Then figure out how many possible combinations there are.

To find all the possible combinations of those word parts, in groups of 1 to 4, I came up with a findCombinations function, with a little help from Copilot. Try typing some unique word parts into the form and see the number of possible words that can be made from those parts. Or with today’s puzzle. Go on, I’ll wait.

The rest of this article won’t really make sense until you with 20 unique word parts.

Next, just compare those combinations with a word list.

Simple! Or not.

First, I downloaded the scrabble-dictionary.txt from some Github repo. It contains 178,692 words. So I naively thought I could just loop through my combinations and filter them using the dictionary:

const words = combinations.filter(
  combination => dictionary.includes(word)
);

I was hoping to show the matching words in real-time as I fill out the form, but when I ran that filter with more than just a few word parts it basically crashed my browser.

So I did a bit of research into running large javascript tasks without blocking the main thread. The most common suggestion is to use a web worker, but I also stumbled across (that is, found on Stack Overflow) an iterative solution that breaks the job up into separate tasks (e.g. one task for each combination) and calls each one with a setTimeout.

The rest of this article won’t really make sense until you with 20 unique word parts.

Iterator Demo

You can try different chunk sizes to see how it performs. Larger chunk sizes seem to make the overall job go faster, but the UI gets kinda janky.

This technique keeps the browser responsive but is incredibly slow. The use of so many setTimeout calls probably causes the task to take much longer than the thread-blocking browser-crashing version. Regardless, this was clearly no way to provide real-time feedback on the number of found words as the form is filled out. A web worker implementation might be faster, but setting that up is a bit involved so I decided to explore other options first.

I thought maybe I should find a shorter word list. The Scrabble dictionary contains a lot of words, many of which nobody ever uses, let’s be honest. So I found an npm package called is-word that has a few different dictionaries, one of which is a list of American English words. Filtering out all the proper nouns in that list yielded 60,504 words. Still way too many to make a dent in the processing time.

While poking around in the source of that package, I noticed it was using a Trie construct for checking words. I never properly studied computer science, so while I had heard of tries (trees? try’s? triés?), I had never used them, or had a need to. So as a learning exercise I cribbed from there and ported the Trie idea into typescript.

To use it, you first loop through your word list and insert each word into the trie:

const trie = new Trie();
americanEnglishDictionary.forEach(word => trie.insert(word));

Then you can call trie.check(possibleWord) to see if there’s a match. So let’s try that with our 0 combinations.

The rest of this article won’t really make sense until you with 20 unique word parts.

Trie Demo

pascal’s diary · copyright about now · rss