WordWhirled (Part 3)

Lexical Terrorist?

I couldn’t leave it alone. I just couldn’t. I’ve tried. Believe me, I’ve tried. However, my love of words won’t let me. How can the inhabitants of that most literate of worlds, WordWorld, live knowing that at any moment their collections of letters could detonate into a terrible fate? Knowing of the horrors that are possible, and having a copy of the dictionary in electronic format, I’ve modified my code to check each randomly drawn word (of 5+ letters) against it to see what ultimately comes up and how long it takes.

I tend to do most of my development in scripting languages on Linux. Most Unixes ship a command-line spell-checker that comes with a dictionary of English words. On most current systems the file is /usr/share/dict/words, although some older ones keep it in /usr/dict/words. I copied this file out and stripped it of any possessive “’s” words using the following sed command:

sed "/'/d" words > newwords

This command just says if there are any lines that contain the single-quote character, delete them. Fast, efficient, and built-in. Take that, Windows!
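For anyone without sed handy, here is a rough Python equivalent of that filter. It is only a sketch: the function name strip_apostrophe_lines and the sample lines are made up for illustration, and like the sed command it only looks for the plain ASCII single quote.

```python
def strip_apostrophe_lines(lines):
    """Keep only the lines that contain no single-quote character."""
    return [line for line in lines if "'" not in line]

# Hypothetical sample standing in for a few lines of the words file.
sample = ["cat\n", "cat's\n", "dog\n", "dog's\n"]
print(strip_apostrophe_lines(sample))
```

To filter the real file, you would read its lines with open() and write the surviving ones to a new file, just as the sed redirect does.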

Obviously, I enjoy the show tremendously, even though I spend most of it on the edge of my seat. Waiting. Waiting for the moment when the terrible accident of misspelling will happen. In order to ease my statistical mind, I whipped up some Python to test my theory: how long before something horrible happens? I tried a few words in part 1 and part 2 that would be dangerous and destructive.

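import random

random.seed()

letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

wordnum = 1

# Load the dictionary into a set for fast membership tests.
# 'newwords' is the apostrophe-free list written by the sed command above.
with open('newwords', 'r') as dic:
    all_words = {line.strip().upper() for line in dic}

while True:
    # Draw a random word of 5 to 24 letters.
    word = ''
    for v in range(5 + random.randrange(20)):
        word += letters[random.randrange(len(letters))]

    # Report the draw number whenever the random word is a real one.
    if word in all_words:
        print('%d : %s ' % (wordnum, word))

    wordnum += 1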
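
The loop above runs until you interrupt it. If you would rather experiment with a fixed number of draws and repeatable results, something like the following sketch might help. The names draw_word and find_matches, the max_draws limit, and the seed parameter are my own additions for illustration, not part of the script above.

```python
import random
import string

LETTERS = string.ascii_uppercase

def draw_word(rng, min_len=5, extra=20):
    """Draw one random word of min_len to min_len + extra - 1 letters."""
    length = min_len + rng.randrange(extra)
    return ''.join(rng.choice(LETTERS) for _ in range(length))

def find_matches(dictionary, max_draws, seed=None):
    """Return (draw number, word) pairs for draws that hit a dictionary word."""
    rng = random.Random(seed)  # a fixed seed makes a run repeatable
    matches = []
    for draw_num in range(1, max_draws + 1):
        word = draw_word(rng)
        if word in dictionary:
            matches.append((draw_num, word))
    return matches
```

Calling find_matches(all_words, 1000000, seed=1) would then give the same list of hits every time, which makes it easier to compare changes to the algorithm.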

The results were sometimes fascinating, and sometimes horrible. I tried to imagine what the impact on the world would be, but I will save that for another post. For now, take a look at what got randomly drawn…

Draw # Word
19073 DOWRY
74519 RUSTY
94540 SPOOK
171583 MUSED
217064 PASTY
249513 PLANS
328137 EBERT
333947 THYME
393961 WATCH
430494 PROSE
496566 ACRID
512592 JAMEL
520863 IDEAL
553565 HARSH
592815 GHOUL
711715 SHOOK
771266 CARPI
775606 AHMED
855580 DRAWS
862455 WHACKY
864193 DECKS
973924 HORTHY
1052544 SMALL
1168031 LEANS
1174348 MAJOR
1275189 ROWDY
1302818 LODES
1309876 EATON
1420125 PAWNS
1422872 SOAPS
1433129 SKULK
1558703 ENFOLD
1586145 MIMES
1624994 UNFIT
1636833 ANGLE
1697635 MOOTS
1762345 MIKED
1768921 THEIR
1809887 AISHA
1899768 SENDS
1903990 AHMED
1942276 TONIA
1981814 GLUED
1984069 AROSE
1997689 GOODY
2012236 FINKS
2102217 TAPER
2117755 BUNCH
2118591 BURCH
2219317 PICKS
2254772 EULER
2320340 MINTY
2343601 BLURS
2363262 RASPY
2374850 LONER
2490701 SCALD
2493451 BEGET

Also, please comment if you have suggestions to improve the algorithm. Five-letter words match most often: five is the minimum length I am looking for, and every extra letter makes a random string 26 times less likely to hit any particular word. I suppose I could drop the minimum to four? Also, nothing really worries me in the beginning of the list. Spook, maybe? A little CIA action? It isn’t until Mimes get drawn that I worry. Mimes? A lot of them? A few? Who knows?
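
The five-letter pattern can be checked with a little arithmetic. Each draw picks one of 20 lengths (5 through 24) with equal chance and then one of 26^n equally likely strings of that length, so the chance of landing on an n-letter dictionary word is (1/20) times the number of n-letter words divided by 26^n. The sketch below works that out per length. The function name match_chance_by_length and the three-word toy dictionary are made up for illustration, and it assumes every dictionary entry uses only the letters A to Z.

```python
from collections import Counter

def match_chance_by_length(words, min_len=5, extra=20, alphabet_size=26):
    """Chance that one draw is a dictionary word, split out by word length.

    Assumes every entry in `words` is made up only of the letters A-Z.
    """
    # Count distinct dictionary words of each length.
    counts = Counter(len(w) for w in set(words))
    chances = {}
    for n in range(min_len, min_len + extra):
        # 1/extra picks this length; counts[n] of the alphabet_size**n strings are words.
        chances[n] = counts[n] / (extra * alphabet_size ** n)
    return chances

# Hypothetical three-word dictionary, just to show the shape of the result.
toy = match_chance_by_length(["SPOOK", "MIMES", "ENFOLD"])
print(toy[5] / toy[6])  # how much likelier a five-letter hit is than a six-letter one
```

Fed the real word list, the same function should show how quickly the odds fall off as words get longer, and changing min_len to 4 would show what dropping the minimum buys.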

If you want to buy a copy of the first WordWorld DVD, click the linky.
WordWorld: Welcome to WordWorld
