When dealing with puzzles - or even with ciphertext - you are sometimes challenged to only know a word by it’s pattern of characters, even thought these are not the concrete characters.

Assume the text ‘why not try cryptography for fun` to be enciphered using a simple mono alphabetic substitution cipher, here with the keyword ‘HELO’.

from pycipher import SimpleSubstitution as SimpleSub
key = 'HELOABCDFGIJKMNPQRSTUVWXYZ'
ct = SimpleSub(key).encipher('why not try cryptography for fun')
print(ct)
'WDYMNTTRYLRYPTNCRHPDYBNRBUM'

Now assume we have some educated guesses, that the word CRYPTOGRAPHY appears somewhere in the text (this is called a ‘crib’) and that we know that this is a mono alphabetic substitution, we can try my collection of function to see if the word’s pattern shows up:

matchPatternInString(ct, 'cryptography')
[9]

So, yes, the pattern of the word ‘cryptography’ shows up at the 10th position in the ciphertext:

WDYMNTTRYLRYPTNCRHPDYBNRBUM
         ^

How do I do it? What I do is I encode every character to an integer representation, starting with 0 - and if I find a new character, it will get the next integer, 1. So HELLO becomes internally [0 1 2 2 3]. After encoding the string to a pattern, I slide over the ciphertext to be searched for the pattern, doing the encoding and comparing it.

The encoding function is simply:

def encodeString(Str):
    """Helper routine for findMatchedWords and matchPatternInString."""
    map = {}
    res = []
    i = 0
    for ch in Str:
        if ch not in map:
            map[ch] = i
            i += 1
        res += [str(map[ch])]
    return res

And the ‘search for pattern in my ciphertext’ is:

def matchPatternInString(ct, pattern, debug=False):
    """Finds a pattern in string ct. Returns list of indexes where found. Empty list, when not found."""
    plen = len(pattern)
    hash = encodeString(pattern)
    o = [ ]
    for index in range(0, len(ct)-plen):
        if encodeString(ct[index:index+plen]) == hash:
            o = o + [ index ]
            if debug:
                print(ct[index:index+plen])
    return o

I even went further; sometimes you are pretty sure that abbcfghh, as the literal ciphertext, is a word, but what could it possibly be? For that I’m reusing the encoding function and compare it against a list of known words.

Example:

findMatchedWords('abbcfghh', en)
['assignee',
 'bootless',
 'foothill',
 'Foosball',
 'football',
 'footless',
 'goofball',
 'coolness',
 'goodness',
 'Goodwill',
 'goodwill',
 'moonless',
 'peekaboo',
 'poorness',
 'roofless',
 'rootless',
 'doorbell',
 'Woodhull']

(en is simply a list of English words as a Python list, compare yesterday’s posting.)

Or, some other fun: Which words has the same pattern as ‘cryptography’?

findMatchedWords('cryptography', en)
['cryptography', 'manipulation']

Now what a coincidence… ;-)

Code:

def findMatchedWords(pattern, dictionary):
    """Finds a pattern in the dictionary. Returns a list of found results."""
    output=[]
    length = len(pattern)
    pattern_hash = encodeString(pattern)
    for word in dictionary:
        if(len(word) == length and encodeString(word) == pattern_hash):
            output += [ word ]
    return(output)