Finding Anagrams from a list of words in Python
I’m kind of obsessed with historic cryptography and puzzles. A week or so ago I had to find anagrams for a given word. Although you could use your favorite search engine to look up an existing list for a given language, or, even fancier, ask ChatGPT, I decided to cook up my own solution.
First, an anagram isn’t just a random permutation of letters; it must also be a proper word in the language. While one could simply do something like
In [20]: from random import shuffle
In [21]: iword = list("hello")
In [22]: shuffle(iword)
In [23]: ''.join(iword)
Out[23]: 'ollhe'
this isn’t exactly helpful.
So what I’m doing instead is reading a word list into a list of strings, sorting the characters of the word I want to find anagrams for by their ASCII values, and then looking for words in the list that match the same pattern. Example:
In [25]: [ord(c) for c in 'hello']
Out[25]: [104, 101, 108, 108, 111]
In [29]: o = [ord(c) for c in 'hello']
In [30]: o.sort(); o
Out[30]: [101, 104, 108, 108, 111]
If an anagram exists, there should be at least two words in the list that follow the same pattern. To account for upper- and lowercase characters, I convert all characters to lowercase first.
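The comparison described above can be sketched as a small helper. The names `signature` and `is_anagram` are made up for this illustration. Sorting the characters directly gives the same order as sorting their `ord()` values, so the detour via integers is optional.

```python
def signature(word):
    """Return the lowercase characters of word in sorted order."""
    return sorted(word.lower())


def is_anagram(a, b):
    """True if a and b consist of exactly the same characters."""
    return signature(a) == signature(b)


print(signature("hello"))              # ['e', 'h', 'l', 'l', 'o']
print(is_anagram("Listen", "silent"))  # True
print(is_anagram("hello", "world"))    # False
```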
So, first, read in a list of words - with one word per line - and put it into a list:
In [6]: en = []
In [7]: with open("/home/alex/share/wordlists/english.txt") as f:
   ...:     while True:
   ...:         line = f.readline()
   ...:         if not line:
   ...:             break
   ...:         else:
   ...:             en += [line.strip('\n')]
In [8]: en[0:5]
Out[8]: ['W', 'w', 'WW', 'WWW', 'WY']
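The same read can be written more compactly by iterating over the file object. This is only a sketch: `read_wordlist` is a hypothetical name, and the `encoding` default is an assumption about how the word list is stored.

```python
def read_wordlist(path, encoding="utf-8"):
    """Read one word per line from path and return them as a list.

    Hypothetical helper; the name and the encoding default are assumptions.
    """
    with open(path, encoding=encoding) as f:
        # Iterating over the file yields one line at a time;
        # rstrip("\n") drops only the trailing newline.
        return [line.rstrip("\n") for line in f]
```

Calling `read_wordlist("/home/alex/share/wordlists/english.txt")` should then give the same list as the loop above.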
Alrighty… Now the fun:
def findAnagram(word, wl):
    """Find an anagram for word in wordlist wl.

    wl must be a Python list of words (as strings).
    A wordlist can be generated by reading a flat text file containing words,
    e.g. by using the helper function gen_wordlist_list_from_file().
    """
    # The idea is to grab all words of the same length, then sort
    # the characters and get an ASCII representation; then find all
    # which have the same representation.
    word = word.lower()
    tmp_wl = [i for i in wl if len(i) == len(word)]
    # Sorted character codes of the word we are looking for.
    enc_word = [ord(i) for i in word]
    enc_word.sort()
    out = []
    for i in tmp_wl:
        i = i.lower()
        t = [ord(x) for x in i]
        t.sort()
        # Same sorted codes means same letters, i.e. an anagram.
        if enc_word == t:
            out += [i]
    return out
Let’s try this!
In [16]: [findAnagram(word, en) for word in "How does this even work".split(" ")]
Out[16]:
[['how', 'who', 'who'],
['odes', 'does', 'dose'],
['this', 'hist', 'hits', 'shit'],
['even'],
['work']]
Fun!
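When many words are looked up against the same list, findAnagram() rescans and re-sorts the whole list on every call. One possible variation, which is not what the function above does, is to build a dictionary once that maps each sorted-letter key to the words sharing it. `build_index` and `lookup` are hypothetical names, and the word list below is made up. Storing the words in a set also drops repeats like the second 'who' in the output above, which presumably comes from the list containing that word in more than one capitalization.

```python
from collections import defaultdict


def build_index(wl):
    """Map each sorted-letter key to the set of lowercase words sharing it."""
    index = defaultdict(set)
    for w in wl:
        w = w.lower()
        index["".join(sorted(w))].add(w)
    return index


def lookup(word, index):
    """Return the words in the index that use the same letters as word."""
    key = "".join(sorted(word.lower()))
    return sorted(index.get(key, ()))


# Tiny made-up word list, for illustration only.
words = ["how", "Who", "who", "does", "dose", "odes", "even"]
idx = build_index(words)
print(lookup("How", idx))   # ['how', 'who']
print(lookup("does", idx))  # ['does', 'dose', 'odes']
print(lookup("work", idx))  # []
```

Building the index costs one pass over the list; after that each lookup is a single dictionary access instead of a full scan.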