Mini-Challenge #6: Answer

Here are the decrypted messages:

1. This is a Caesar cipher. Now use a frequency analysis to decode the well-known first paragraph, and look for a secret message at the end.

2. It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of a wife. However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered as the rightful property of some one or other of their daughters. Use the digits of pi to shift the letters in the next message forwards.

(NB. Aside from the last sentence, these are the opening paragraphs of Pride and Prejudice by Jane Austen.)

3. theansweris...
  
  (a)  theanswerisalanmathisonturing
       Alan Turing was a British mathematician, logician, cryptanalyst, and computer scientist, who during the Second World War worked in code-breaking at Bletchley Park.

  (b)  theanswerismavisbatey
       Mavis Batey and her husband Keith were codebreakers at Bletchley Park. She has written various books about this eventful period.

  (c)  thenewanswerisedwardwilliamelgar
       Edward Elgar was a British composer who wrote the Enigma Variations at the end of the 19th century. An Enigma machine is any of a family of related electro-mechanical rotor cipher machines used for the encryption and decryption of secret messages in the 20th century, particularly during WWII.


Message 1


The first message is encrypted using a Caesar cipher (or shift cipher). To decrypt, we need to "shift" all the letters along in the alphabet by the same amount, "wrapping around" when we reach the end of the alphabet. For example, a forward shift by one unit would mean a -> b, b -> c, ..., z -> a.

 We can find the shift value by trial and error, or by looking closely at the encrypted text:

Uijt jt b Dbftbs djqifs. Opx vtf b gsfrvfodz bobmztjt up efdpef uif xfmm-lopxo gjstu qbsbhsbqi, boe mppl gps b tfdsfu nfttbhf bu uif foe.

Note that the only one-letter word that appears is "b".  In the English language, there are only two one-letter words: "a" and "I". Since this letter is not capitalized, it is likely to be an "a". So it makes sense to try shifting all the letters in the message backwards by one.  And this is what we get:

This is a Caesar cipher. Now use a frequency analysis to decode the well-known first paragraph, and look for a secret message at the end.

Here is some Python code to carry out a Caesar shift:


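text = "Uijt jt b Dbftbs djqifs. Opx vtf b gsfrvfodz bobmztjt up efdpef uif xfmm-lopxo gjstu qbsbhsbqi, boe mppl gps b tfdsfu nfttbhf bu uif foe."
alphabet = "abcdefghijklmnopqrstuvwxyz"

def apply_shift(shift, text):
    """Shift every letter of text by `shift` places, wrapping around the alphabet."""
    decrypt = []
    for letter in text:
        position = alphabet.find(letter.lower())
        if position == -1:
            decrypt.append(letter)  # leave spaces and punctuation unchanged
        else:
            shifted = alphabet[(position + shift) % 26]
            # keep the capitalization of the original letter
            decrypt.append(shifted.upper() if letter.isupper() else shifted)
    return ''.join(decrypt)

# a shift of -1 moves every letter one place backwards (b -> a, c -> b, ...)
print(apply_shift(-1, text))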
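The "trial and error" route mentioned earlier can also be automated. The following is only a sketch: the helper name caesar_shift and the shortened ciphertext are choices made for this illustration, not part of the original challenge code. It prints all 26 candidate decryptions so that the readable one can be picked out by eye.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def caesar_shift(text, shift):
    """Shift each letter by `shift` places, wrapping around; keep other characters."""
    out = []
    for ch in text:
        pos = ALPHABET.find(ch.lower())
        if pos == -1:
            out.append(ch)  # spaces, punctuation, digits
        else:
            new = ALPHABET[(pos + shift) % 26]
            out.append(new.upper() if ch.isupper() else new)
    return "".join(out)

# First sentence of message 1 only, to keep the output short
ciphertext = "Uijt jt b Dbftbs djqifs."

# Try every possible backward shift and store the candidate plaintexts
candidates = {shift: caesar_shift(ciphertext, -shift) for shift in range(26)}
for shift in sorted(candidates):
    print("%2d %s" % (shift, candidates[shift]))
```

Only one of the 26 lines reads as English, which confirms the backward shift of one found from the one-letter word.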



Message 2


An alternative explanation of how to decrypt message 2 has been posted by Mike Codes Things - thanks Mike!

The first message gives us a clue for decrypting the second message. "Now use a frequency analysis to decode the well-known first paragraph, and look for a secret message at the end."

What is a frequency analysis? "In cryptanalysis, frequency analysis is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers."

The text in message 2 has been encrypted with a simple substitution cipher. In other words, each letter in the plaintext is substituted by another letter, according to a regular rule (e.g. a -> q, b -> g, c -> x, etc).

Certain letters occur more frequently than other letters in the English language. For example, "e" is much more common than "z". So we can count the number of occurrences of each letter in the ciphertext, to get a good clue as to which plaintext letter each one represents.

Here is the message:

Ts tg d swjsu jvtkhwgdmme dfovbpmhnlhn, suds d gtvlmh cdv tv qbgghggtbv ba d lbbn abwsjvh cjgs ih tv pdvs ba d ptah. Ubphkhw mtssmh ovbpv suh ahhmtvlg bw kthpg ba gjfu d cdv cde ih bv utg atwgs hvshwtvl d vhtluibjwubbn, sutg swjsu tg gb phmm atrhn tv suh ctvng ba suh gjwwbjvntvl adctmthg, suds uh tg fbvgtnhwhn dg suh wtlusajm qwbqhwse ba gbch bvh bw bsuhw ba suhtw ndjlushwg. Jgh suh ntltsg ba qt sb gutas suh mhsshwg tv suh vhrs chggdlh abwpdwng.

A frequency analysis of the encrypted text finds the following:
  • The most common letter is "h", representing ~12% of the letters
  • The second most common letter is "s", at 9.75%
  • The third most common letter is "t", at 8.91%
  • etc.
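A letter count like the one above can be reproduced with a few lines of Python. This is a minimal sketch and not the attached frequency_analysis.py; the function name letter_frequencies is made up for this example.

```python
from collections import Counter

ciphertext = (
    "Ts tg d swjsu jvtkhwgdmme dfovbpmhnlhn, suds d gtvlmh cdv tv qbgghggtbv "
    "ba d lbbn abwsjvh cjgs ih tv pdvs ba d ptah. "
    "Ubphkhw mtssmh ovbpv suh ahhmtvlg bw kthpg ba gjfu d cdv cde ih bv utg "
    "atwgs hvshwtvl d vhtluibjwubbn, sutg swjsu tg gb phmm atrhn tv suh ctvng "
    "ba suh gjwwbjvntvl adctmthg, suds uh tg fbvgtnhwhn dg suh wtlusajm "
    "qwbqhwse ba gbch bvh bw bsuhw ba suhtw ndjlushwg. "
    "Jgh suh ntltsg ba qt sb gutas suh mhsshwg tv suh vhrs chggdlh abwpdwng."
)

def letter_frequencies(text):
    """Return (letter, percentage) pairs, most common first; non-letters are ignored."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, 100.0 * n / total) for letter, n in counts.most_common()]

freqs = letter_frequencies(ciphertext)
for letter, percent in freqs[:5]:
    print("%s %.2f%%" % (letter, percent))
```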

See here for the "average" letter frequencies in English text. On average, "e" is most common, at 12.7%. "t" and "a" are next most common, at 9.06% and 8.17%.  So this suggests that "e" has been replaced by "h" in the text, and "s" and "t" may represent "t" and "a", or possibly "o" or "s" or other common letters.

Short words

We can make more progress by looking at the occurrence of short words. For example, the only 1-letter word appearing in the encrypted text is "d", which must surely represent "a". There are 7 occurrences of the 2-letter word "ba". The most common two-letter word in English is "of", followed by "to" and "is", so "ba" is probably "of". There are also 7 occurrences of "suh", and 2 of "cdv". The most common three-letter word in English is "the", followed by "and". Most likely "suh" is "the". However, "cdv" can't be "and" because we already know that "d" represents "a". So "cdv" is a three-letter word with "a" as the central letter, possibly "can", "has", "had", "day", or "man".
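The short-word tallies can be checked with a small script. This is a sketch only; the function name short_word_counts and the choice to split words on anything that is not a letter are assumptions made for this illustration.

```python
import re
from collections import Counter

ciphertext = (
    "Ts tg d swjsu jvtkhwgdmme dfovbpmhnlhn, suds d gtvlmh cdv tv qbgghggtbv "
    "ba d lbbn abwsjvh cjgs ih tv pdvs ba d ptah. "
    "Ubphkhw mtssmh ovbpv suh ahhmtvlg bw kthpg ba gjfu d cdv cde ih bv utg "
    "atwgs hvshwtvl d vhtluibjwubbn, sutg swjsu tg gb phmm atrhn tv suh ctvng "
    "ba suh gjwwbjvntvl adctmthg, suds uh tg fbvgtnhwhn dg suh wtlusajm "
    "qwbqhwse ba gbch bvh bw bsuhw ba suhtw ndjlushwg. "
    "Jgh suh ntltsg ba qt sb gutas suh mhsshwg tv suh vhrs chggdlh abwpdwng."
)

def short_word_counts(text, max_len=3):
    """Count the words of each length from 1 to max_len (case-insensitive)."""
    words = re.findall(r"[a-z]+", text.lower())
    return {n: Counter(w for w in words if len(w) == n)
            for n in range(1, max_len + 1)}

counts = short_word_counts(ciphertext)
for length in sorted(counts):
    # show the three most frequent words of this length
    print(length, counts[length].most_common(3))
```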

Putting all this together, it suggests that:

  • "d" is actually "a"
  • "h" is actually "e"
  • "s" is "t" and "u" is "h"
  • "b" is "o" and "a" is "f"
If we try making these substitutions in the encrypted text we get:

---------
*t ** a t**th ****e**a*** a***o**e**e*, that a *****e *a* ** *o**e***o* of a *oo* fo*t**e ***t *e ** *a*t of a **fe. ho*e*e* **tt*e **o** the fee***** o* **e** of ***h a *a* *a* *e o* h** f***t e*te**** a *e**h*o**hoo*, th** t**th ** *o *e** f**e* ** the ***** of the ****o****** fa****e*, that he ** *o****e*e* a* the ***htf** **o*e*t* of *o*e o*e o* othe* of the** *a**hte**. **e the ****t* of ** to *h*ft the *ette** ** the *e*t *e**a*e fo**a***.
---------
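A partially decrypted text like the one above can be generated from an incomplete key. The sketch below is illustrative: the names partial_key and partial_decrypt are made up, and only the six substitutions listed so far are used. Every letter that is not yet known is shown as "*".

```python
# The six substitutions deduced so far (cipher letter -> plain letter)
partial_key = {"d": "a", "h": "e", "s": "t", "u": "h", "b": "o", "a": "f"}

def partial_decrypt(text, key):
    """Apply the known substitutions; show every still-unknown letter as '*'."""
    out = []
    for ch in text.lower():
        if ch in key:
            out.append(key[ch])
        elif "a" <= ch <= "z":
            out.append("*")  # a letter we have not identified yet
        else:
            out.append(ch)   # spaces and punctuation are left alone
    return "".join(out)

# Opening words of message 2
print(partial_decrypt("Ts tg d swjsu", partial_key))
```

Adding each newly deduced letter to partial_key and re-running gradually fills in the asterisks.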

Those with a good knowledge of English literature, or those who enjoy crosswords, may already be able to guess the opening paragraph at this point.

Let's look again at the opening "Ts tg d", which we have partially decrypted: *t ** a.  Note that the cipher letter "t" begins both of the first two words. It must surely represent a vowel, and it cannot be "a", which is already accounted for by "d". The only other common two-letter word ending in "t" is "it", so "t" must be "i", which gives us "It i* a". This suggests that "g" represents "s", so the text is "It is a".

We can proceed in this way, by a process of deduction and elimination, until the substitution cipher is completely cracked. In summary,

To decrypt the message, replace the letters
    abcdefghijklmnopqrstuvw
with
    fomaycsebuvgldkwpxtihnr

(Note that xyz do not appear in the message).
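Once the key is known, applying it is mechanical. Here is a minimal sketch using the two alphabets given above; the function name substitute is made up for this example, and only the last sentence of message 2 is decrypted to keep it short.

```python
CIPHER = "abcdefghijklmnopqrstuvw"
PLAIN = "fomaycsebuvgldkwpxtihnr"
KEY = dict(zip(CIPHER, PLAIN))  # cipher letter -> plain letter

def substitute(text, key=KEY):
    """Replace each letter using the key, preserving case and punctuation."""
    out = []
    for ch in text:
        plain = key.get(ch.lower())
        if plain is None:
            out.append(ch)  # not a letter covered by the key
        else:
            out.append(plain.upper() if ch.isupper() else plain)
    return "".join(out)

last_sentence = "Jgh suh ntltsg ba qt sb gutas suh mhsshwg tv suh vhrs chggdlh abwpdwng."
print(substitute(last_sentence))
```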

Python code for assisting with the frequency analysis can be found attached at the bottom of this page.
For more on frequency analysis, try this page.



Message 3

The last sentence of the second message tells us how to decrypt the final message:

    Use the digits of pi to shift the letters in the next message forwards.

In other words, we will try shifting the letters forward in the alphabet by "3", "1", "4", "1", "5", "9" units, etc., remembering to "wrap around" when reaching the end of the alphabet.  So if the encrypted text is "qgazijuymfnsctejyqzemmhproake", we will get:

    theansw ...

which looks like a promising start.  Here is some Python 2.7 code to carry out the shift:


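pidigits = "31415926535897932384626433832795028841971"
alphabet = "abcdefghijklmnopqrstuvwxyz"
text = "qgazijuymfnsctejyqzemmhproake"
# shift the i-th letter forwards by the i-th digit of pi, wrapping around after "z"
msg = [alphabet[(alphabet.find(letter) + int(digit)) % 26] for letter, digit in zip(text, pidigits)]
print(''.join(msg))  # print() with a single argument works in Python 2.7 and Python 3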

which should give you:

theanswerisalanmathisonturing
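As a consistency check, the shift can be run in both directions. This sketch assumes the message was encrypted by shifting each letter backwards by the corresponding digit of pi, which is simply the inverse of the decryption rule; the function name pi_shift and its direction parameter are made up for this illustration.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
PI_DIGITS = "31415926535897932384626433832795028841971"

def pi_shift(text, direction):
    """Shift the i-th letter by the i-th digit of pi.

    direction=+1 shifts forwards (decrypts); direction=-1 shifts backwards.
    Expects lowercase letters only.
    """
    if len(text) > len(PI_DIGITS):
        raise ValueError("not enough digits of pi for this message")
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + direction * int(digit)) % 26]
        for ch, digit in zip(text, PI_DIGITS)
    )

plaintext = pi_shift("qgazijuymfnsctejyqzemmhproake", +1)
print(plaintext)
print(pi_shift(plaintext, -1))  # shifting back should reproduce the ciphertext
```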

Alan Turing worked on code-breaking at Bletchley Park, and was arguably the most influential figure in the development of computer science.

If you enjoyed breaking these codes, please try Mini-Challenge #7.

Attachment: frequency_analysis.py (4k), Sam Dolan, 23 May 2013, 03:35