Word Chain
Word chain is a game where players take turns coming up with a thematic word that starts with the last letter of the previous word in the game. While playing a country-themed round, I wondered how long a game could take, so I wrote a program to try to solve it.
While I wrote the code to allow for any list of characters as long as the first and last letters are a-z, I was primarily working with a list of 251 countries.
Modes
I took 3 approaches to try to find the longest chain of words given an input set.
Follow the Max (--mode max)
The first approach is a greedy approach to always choose the word that has the most words following it. For example, the decision between choosing ‘cat’ and ‘cap’ is determined by the number of available words that start with ’t' and ‘p’. While this solution is very fast, it only gets the optimal solution for specific inputs. With this method, I got a solution of 51 countries.
Random Traversal (--mode random)
This approach is randomly inserting valid words until there is a dead end and repeating the process a certain number of iterations. While it is random, each run is repeatable because the seed is printed at the start of the program. This method is the best for a quick solution that gets a relatively optimal solution. After running this method for a billion iterations, which took about a day on my machine, I was able to get a solution of 63 words, which is a huge improvement over the max method for this set of words.
Complete Traversal (--mode brute)
The last and most direct solution is the brute force solution, which traverses every possible chain of words and records the maximum. While this approach is guaranteed to give the maximum list of words, it’s not feasible to run completely on large inputs. As of the time of writing this, the maximum list of words is 64 words, which was found after about 2 days of running the brute force method. Surprisingly, the brute force method finds a list of 62 words almost immediately after starting the program, probably because of the characteristics of this word set.
Lessons
- I learned how to catch an interrupt signal in C++. However, I was unable to find an elegant solution for my purposes that does not require the use of a global variable.
- I wanted to run the brute force mode without using up 100% of one of the cores of my dual core server, so I used cpulimit to throttle the process when it goes above a certain limit.
Technical Limitations
There are a few things that I plan to address in future versions of wordchain:
- Support for non a-z sets — Currently, the words must start and end with a-z and it otherwise segfaults. Because of this, I plan to improve the program by building the chain based on the available starting and ending letters. At the very least, I should add an informative error message rather than a “Segmentation Fault”.
- Recover from state — Since the complete traversal method takes so long, I plan to add a feature to allow recovery from crash or even allowing the brute force method to start from a specific sequence. The input to this method would simply be a list of words which the program would start from.
- Unified code base — Over the course of writing wordchain, I was continuously changing the design. As such, I have several redundant data structures being used concurrently as well as disorganized public functions. To make this program ready for public use, the code base needs to be refactored.
- Efficiency — Because of the several overlapping data structures in use, there are many points where wordchain is much slower than it should be. While the algorithmic complexity shouldn’t change, there are many constants that can be greatly reduced.