Finding word patterns
When dealing with puzzles - or even with ciphertext - you are sometimes challenged to only know a word by it’s pattern of characters, even thought these are not the concrete characters.
Assume the text ‘why not try cryptography for fun` to be enciphered using a simple mono alphabetic substitution cipher, here with the keyword ‘HELO’.
from pycipher import SimpleSubstitution as SimpleSub
key = 'HELOABCDFGIJKMNPQRSTUVWXYZ'
ct = SimpleSub(key).encipher('why not try cryptography for fun')
print(ct)
'WDYMNTTRYLRYPTNCRHPDYBNRBUM'
Now assume we have some educated guesses, that the word CRYPTOGRAPHY
appears
somewhere in the text (this is called a ‘crib’) and that we know that this is
a mono alphabetic substitution, we can try my collection of function to see if
the word’s pattern shows up:
matchPatternInString(ct, 'cryptography')
[9]
So, yes, the pattern of the word ‘cryptography’ shows up at the 10th position in the ciphertext:
WDYMNTTRYLRYPTNCRHPDYBNRBUM
^
How do I do it? What I do is I encode every character to an integer
representation, starting with 0
- and if I find a new character, it will get
the next integer, 1. So HELLO
becomes internally [0 1 2 2 3]
. After
encoding the string to a pattern, I slide over the ciphertext to be searched
for the pattern, doing the encoding and comparing it.
The encoding function is simply:
def encodeString(Str):
"""Helper routine for findMatchedWords and matchPatternInString."""
map = {}
res = []
i = 0
for ch in Str:
if ch not in map:
map[ch] = i
i += 1
res += [str(map[ch])]
return res
And the ‘search for pattern in my ciphertext’ is:
def matchPatternInString(ct, pattern, debug=False):
"""Finds a pattern in string ct. Returns list of indexes where found. Empty list, when not found."""
plen = len(pattern)
hash = encodeString(pattern)
o = [ ]
for index in range(0, len(ct)-plen):
if encodeString(ct[index:index+plen]) == hash:
o = o + [ index ]
if debug:
print(ct[index:index+plen])
return o
I even went further; sometimes you are pretty sure that abbcfghh
, as the literal ciphertext, is a word, but what could it possibly be? For that I’m reusing the encoding function and compare it against a list of known words.
Example:
findMatchedWords('abbcfghh', en)
['assignee',
'bootless',
'foothill',
'Foosball',
'football',
'footless',
'goofball',
'coolness',
'goodness',
'Goodwill',
'goodwill',
'moonless',
'peekaboo',
'poorness',
'roofless',
'rootless',
'doorbell',
'Woodhull']
(en
is simply a list of English words as a Python list, compare yesterday’s
posting.)
Or, some other fun: Which words has the same pattern as ‘cryptography’?
findMatchedWords('cryptography', en)
['cryptography', 'manipulation']
Now what a coincidence… ;-)
Code:
def findMatchedWords(pattern, dictionary):
"""Finds a pattern in the dictionary. Returns a list of found results."""
output=[]
length = len(pattern)
pattern_hash = encodeString(pattern)
for word in dictionary:
if(len(word) == length and encodeString(word) == pattern_hash):
output += [ word ]
return(output)