Finding Anagrams from a list of words in Python

I’m kind of obsessed with historic cryptography and puzzles. A week or so ago I had to find anagrams for a given word. You could use your favorite search engine to look up an existing list for a given language, or, even fancier, ask ChatGPT, but I decided to cook up my own.

First, an anagram isn’t just a random permutation of letters; it must also be a proper word that exists in the language. While one could simply do something like

In [20]: from random import shuffle
In [21]: iword = list("hello")
In [22]: shuffle(iword)
In [23]: ''.join(iword)
Out[23]: 'ollhe'

this isn’t exactly helpful.

So what I’m doing instead is reading a list of words into a list of strings, then sorting the characters of the word I want to find anagrams for by their ASCII values, and then looking for words in the list that produce the same sorted sequence. Example:

In [25]: [ord(c) for c in 'hello']
Out[25]: [104, 101, 108, 108, 111]
In [29]: o = [ord(c) for c in 'hello']
In [30]: o.sort(); o
Out[30]: [101, 104, 108, 108, 111]

If an anagram exists, then there should be at least two words in the list that follow the same pattern: the word itself and its anagram. To account for upper- and lowercase characters, I make all characters lowercase first.
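The comparison described above can be condensed into a small helper. This is only a minimal sketch: the names signature() and is_anagram() are made up for illustration, and it relies on the fact that sorting a string orders its characters by code point, so the explicit ord() step is optional.

```python
def signature(word):
    """Return the lowercased characters of word in sorted order."""
    return sorted(word.lower())

def is_anagram(a, b):
    """True if a and b consist of exactly the same letters, ignoring case."""
    return signature(a) == signature(b)

print(signature("hello"))              # ['e', 'h', 'l', 'l', 'o']
print(is_anagram("Listen", "silent"))  # True
print(is_anagram("hello", "world"))    # False
```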

So, first, read in a word list file, with one word per line, and put the words into a Python list:

In [6]: en = []
In [7]: with open("/home/alex/share/wordlists/english.txt") as f:
            for line in f:
                en.append(line.strip('\n'))
In [8]: en[0:5]
Out[8]: ['W', 'w', 'WW', 'WWW', 'WY']
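Wrapped into a reusable function, the loading step might look like the sketch below. The name gen_wordlist_list_from_file() matches the helper mentioned in the docstring of findAnagram(); its signature, the encoding parameter and the skipping of blank lines are assumptions made for illustration.

```python
def gen_wordlist_list_from_file(path, encoding="utf-8"):
    """Read a flat text file with one word per line into a list of strings.

    Assumption: blank lines are skipped and surrounding whitespace is removed.
    """
    words = []
    with open(path, encoding=encoding) as f:
        for line in f:
            word = line.strip()
            if word:
                words.append(word)
    return words

# Possible usage with the file from the session above:
# en = gen_wordlist_list_from_file("/home/alex/share/wordlists/english.txt")
```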

Alrighty… Now the fun:

def findAnagram(word, wl):
    """Find all anagrams of word in wordlist wl.
    wl must be a Python list of words (as strings).
    A wordlist can be generated by reading a flat text file containing words,
    e.g. by using the helper function gen_wordlist_list_from_file().
    """
    # The idea is to grab all words of the same length, then sort
    # the characters and get an ascii representation; then find all
    # which have the same representation.
    word = word.lower()
    tmp_wl = [i for i in wl if len(i) == len(word)]
    enc_word = [ord(i) for i in word]
    enc_word.sort()
    out = []
    for i in tmp_wl:
        i = i.lower()
        t = [ord(x) for x in i]
        t.sort()
        if enc_word == t:
            out.append(i)
    return out

Let’s try this!

In [16]: [findAnagram(word, en) for word in "How does this even work".split(" ")]
Out[16]:
[['how', 'who', 'who'],
 ['odes', 'does', 'dose'],
 ['this', 'hist', 'hits', 'shit'],
 ['even'],
 ['work']]

Fun!
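The duplicate 'who' in the output presumably comes from the word list containing the same word in different cases (for example 'Who' and 'who'), which collapse to one spelling after lower(). If many lookups are needed, one could also build an index once instead of scanning the whole list for every query. The following is a sketch of that variation, not the function used above; build_index() and find_anagrams() are hypothetical names, and the tiny word list is invented for the demo.

```python
from collections import defaultdict

def build_index(wl):
    """Map each sorted-letter signature to the set of lowercase words sharing it."""
    index = defaultdict(set)
    for w in wl:
        w = w.lower()
        index["".join(sorted(w))].add(w)
    return index

def find_anagrams(word, index):
    """Return the sorted, duplicate-free list of words with the same letters as word."""
    key = "".join(sorted(word.lower()))
    return sorted(index.get(key, ()))

# Tiny made-up word list, just for demonstration
demo = ["How", "who", "Who", "does", "dose", "odes", "even"]
idx = build_index(demo)
print(find_anagrams("how", idx))   # ['how', 'who']
print(find_anagrams("does", idx))  # ['does', 'dose', 'odes']
print(find_anagrams("work", idx))  # []
```

Using a set per signature removes the case duplicates, and each lookup becomes a single dictionary access after the one-time cost of building the index.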