I couldn’t leave it alone. I just couldn’t. I’ve tried. Believe me, I’ve tried. However, my love of words won’t let me. How can the inhabitants of that most literate of worlds, WordWorld, live knowing that at any moment their collections of letters could detonate into a terrible fate? Knowing of the horrors that are possible, and having a copy of the dictionary in electronic format, I’ve modified my code to check each randomly drawn word (of 5+ letters) against it to see what ultimately comes up and how long it takes.
I tend to do most of my development in scripting languages on Linux. Unixes ship with a command-line spell-checker that comes with a dictionary of English words. This is usually located in /usr/dict/words, although it is sometimes in /usr/share/dict/words. I copied this file out and stripped it of any “’s” words using the following sed command:
sed "/'/d" words > newwords
This command deletes every line that contains a single-quote character and writes what is left to newwords. Fast, efficient, and built-in. Take that, Windows!
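If you would rather stay in Python (or you are stuck on a box without sed), here is a rough equivalent of that filter. It is only a sketch: the function names `strip_apostrophe_lines` and `filter_word_file` are my own for illustration, and like the sed command it only looks for the plain ASCII single quote.

```python
def strip_apostrophe_lines(lines):
    """Keep only the lines that contain no single-quote character."""
    return [line for line in lines if "'" not in line]


def filter_word_file(src_path, dst_path):
    """File-to-file version: copy src_path to dst_path minus the quoted lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_apostrophe_lines(src))
```

Calling `filter_word_file('words', 'newwords')` should leave you with the same kind of file the sed one-liner produces.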
Obviously, I enjoy the show tremendously, even though I spend most of it on the edge of my seat. Waiting. Waiting for the moment when the terrible accident of misspelling will happen. In order to ease my statistical mind, I whipped up some Python to test my theory. How long before something horrible happens? I tried a few words in part 1 and part 2 that would be dangerous and destructive.
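import random

random.seed()
letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
wordnum = 1

# Load the dictionary, upper-cased, for fast lookups.
all_words = dict()
dic = open('words', 'r')
for line in dic:
    w = line.strip().upper()
    all_words[w] = 1

# Draw random strings of 5 to 24 letters forever; print any real word.
while True:
    word = ''
    for v in range(5 + random.randrange(20)):
        word += letters[random.randrange(len(letters))]
    if word in all_words:
        print('%d : %s ' % (wordnum, word))
    # Count every draw, so the printed number shows how long a hit took.
    wordnum += 1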
The results were sometimes fascinating, and sometimes horrible. I tried to imagine what the impact on the world would be, but I will save that for another post. For now, take a look at what got randomly drawn…
Also, please comment if you have suggestions to improve the algorithm. It looks like five-letter words match most frequently, since that is the minimum size I am looking for and a random string gets less and less likely to hit a real word as it gets longer. I suppose I could drop it to four? Also, nothing really worries me in the beginning of the list. Spook, maybe? A little CIA action? It isn’t until Mimes get drawn that I worry. Mimes? A lot of them? A few? Who knows?
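To put a rough number on why the short words win, here is a back-of-the-envelope sketch. It assumes the same draw as the script (length picked uniformly from 5 to 24, each letter uniform over 26) and only counts dictionary entries made up purely of A to Z. The names `hit_chance_by_length` and `expected_draws_per_hit` are mine, not anything from the script.

```python
from collections import Counter

LETTERS = set('ABCDEFGHIJKLMNOPQRSTUVWXYZ')


def hit_chance_by_length(words, min_len=5, num_lengths=20, alphabet_size=26):
    """Chance that one random draw lands on a dictionary word, per length."""
    max_len = min_len + num_lengths - 1
    # Upper-case and de-duplicate, the same way the script loads its dictionary.
    unique = {w.strip().upper() for w in words}
    counts = Counter(
        len(w) for w in unique
        if min_len <= len(w) <= max_len and set(w) <= LETTERS
    )
    # P(length n is picked) * P(the n random letters spell one of the words).
    return {
        n: (1.0 / num_lengths) * counts[n] / alphabet_size ** n
        for n in sorted(counts)
    }


def expected_draws_per_hit(chances):
    """Average number of draws per hit, assuming independent draws."""
    total = sum(chances.values())
    return float('inf') if total == 0 else 1.0 / total
```

Feeding it the word file, something like `hit_chance_by_length(open('words'))`, gives one probability per word length. Each extra letter divides the odds by 26, which the dictionary cannot make up for by having more long words, so the five-letter bucket should dominate. Passing `min_len=4` is one way to see what dropping to four would do.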