Computer analysis, online translator, intelligent guesses crack ancient German code
The encrypted initiation rite of an ancient German secret society preoccupied with ophthalmology has fallen to modern language analysis tools including an online automatic translator.
The 19th Century document known as the Copiale Cipher turned out to be just that - a substitution cipher - although a complex one.
The book, named for a plaintext word found in the otherwise enciphered work, was discovered in East Germany after the Cold War. Analysis of the 75,000-character document was carried out by a team of U.S. and Swedish information scientists and linguists led by Kevin Knight of the University of Southern California's Information Sciences Institute.
Using a machine-readable transcript of the first 16 pages of the 105-page handwritten book, they performed statistical analysis of the frequency with which each of the 90 different characters appears and of the characters that precede and follow each character.
They also analyzed the frequency of recurrent character pairs and groupings of three.
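The counting described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function names and the toy transcript are made up for the example, and each cipher character is assumed to have been transcribed as a short token.

```python
from collections import Counter

def ngram_counts(symbols, n):
    """Count every run of n consecutive symbols in the transcript."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))

def neighbor_counts(symbols):
    """For each symbol, count which symbols precede it and which follow it."""
    before, after = {}, {}
    for prev, cur in zip(symbols, symbols[1:]):
        after.setdefault(prev, Counter())[cur] += 1
        before.setdefault(cur, Counter())[prev] += 1
    return before, after

# Toy transcript (hypothetical tokens, not the real Copiale text).
transcript = "lam uh lam zs uh lam".split()

unigrams = ngram_counts(transcript, 1)   # single-character frequencies
bigrams = ngram_counts(transcript, 2)    # recurrent pairs
trigrams = ngram_counts(transcript, 3)   # groupings of three
before, after = neighbor_counts(transcript)

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

On a real transcript, the same tables would be built over all 90 characters and sorted by count to find the most common symbols, pairs and triples.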
The characters comprised Roman letters, some Greek letters and a number of unique symbols. Some of the Roman letters had variants with a dot over them, umlauts (two dots) or circumflexes (^).
By clustering characters that were preceded and followed by similar groups of characters, the researchers found that the Roman letters fell into a single large cluster. On a hunch, they assumed that because the Roman alphabet formed one group, it carried the meaning of the text, and that all the other characters were there for show, to mislead attempts to decipher it.
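One way to picture "preceded and followed by similar groups" is to give each character a vector of its left and right neighbors and compare those vectors. The sketch below uses cosine similarity as an assumed measure; the article does not say which measure or clustering algorithm the team used, and the toy data and names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def context_vectors(symbols):
    """Map each symbol to counts of its left ("L") and right ("R") neighbors."""
    vecs = defaultdict(Counter)
    for prev, cur in zip(symbols, symbols[1:]):
        vecs[cur][("L", prev)] += 1   # prev appears to the left of cur
        vecs[prev][("R", cur)] += 1   # cur appears to the right of prev
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy data: "a" and "b" both sit between "x" and "y"; "c" sits elsewhere.
toy = "x a y x b y q c r".split()
vecs = context_vectors(toy)

sim_ab = cosine(vecs["a"], vecs["b"])   # same neighbors, high similarity
sim_ac = cosine(vecs["a"], vecs["c"])   # no shared neighbors, zero similarity
print(sim_ab, sim_ac)
```

Characters whose pairwise similarity is high would then be grouped together by whatever clustering method is chosen; that step is left out here.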
They ran automated attacks on the transcript that sought to make sense of the jumble in 42 different languages, giving preference to German, English and Latin. It didn't pan out.
So they revised their hunch to consider a homophonic substitution, in which a single letter of the plaintext message can be replaced by more than one ciphertext character. For example, the letter "e" might be replaced by any of the characters "s", "2" or "@".
This helps hide the frequency with which plaintext characters appear, making it more difficult to decipher them. So imagine a letter that accounts for 12% of the characters in a message. If it is represented in the cipher by either of two replacement characters, seeking a single ciphertext character that accounts for 12% of the total characters won't reveal the plaintext letter.
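A toy example makes the flattening effect concrete. The key below is hypothetical: it borrows the article's three stand-ins for "e" and invents single symbols for the other letters. It also rotates through the stand-ins in order so the output is repeatable, whereas a real scribe could pick among them freely.

```python
from collections import Counter
from itertools import cycle

# Hypothetical key: "e" has three stand-ins, every other letter has one.
key = {"e": ["s", "2", "@"], "t": ["#"], "h": ["%"], "r": ["&"], " ": [" "]}

def encipher(plaintext, key):
    """Replace each plaintext letter with the next of its stand-ins in turn."""
    pickers = {letter: cycle(options) for letter, options in key.items()}
    return "".join(next(pickers[ch]) for ch in plaintext)

plaintext = "the tree there"
ciphertext = encipher(plaintext, key)

plain_freq = Counter(plaintext)
cipher_freq = Counter(ciphertext)

print(ciphertext)
print("plaintext 'e':", plain_freq["e"])
print("its stand-ins:", {c: cipher_freq[c] for c in "s2@"})
```

In the plaintext, "e" is the most frequent letter by a clear margin. In the ciphertext its count is split across three symbols, none of which stands out, so a simple search for the most common symbol no longer points to "e".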
An automated attack that assumed a homophonic cipher failed to decrypt the text, but it did indicate numerically that German might be the underlying language. Given that the text was found in Germany and that it ends with the plaintext "Philipp 1866", using the German spelling with two p's, they focused on German as the most likely language.
