The computational linguist who cracks historical riddles
Throughout history, ciphers have been used to preserve many secrets. Beáta Megyesi leads an interdisciplinary project involving linguists, computer scientists and historians, among others, to use AI to decode secret documents.
Watch film with Beáta Megyesi on decoding historical encrypted texts.
Beáta Megyesi is fascinated by languages and by seeing patterns in languages. After studying computational linguistics at Stockholm University, she did her PhD at KTH Royal Institute of Technology in Stockholm and then moved to Uppsala University before returning to Stockholm University in 2023. Today she is a professor of computational linguistics and works with automatic analysis of texts.
In 2011, she accidentally came across a research area that was new at the time and she has devoted herself to this field – historical cryptology – ever since. Here, researchers from different disciplines are working together to find ways to decode historical secret documents.
“I was inspired by a cipher we cracked and wanted to look at the problem more systematically on a larger scale. It is important to have the opportunity and take time to explore problems that you stumble upon and find interesting to work with,” says Beáta Megyesi.
Photo: Ingmarie Andersson
Thousands of historical manuscripts written in secret code can be found in archives and libraries around the world, and their ciphers have allowed them to be kept secret from outsiders. These manuscripts include everything from diplomatic correspondence and intelligence reports to private letters, diaries or texts related to secret societies.
What made you interested in studying ciphers?
“The versatility of the research. The complexity and variety of these fascinating historical sources require both advanced humanistic research and new technologies. It is a demanding intellectual challenge that involves different topics and enables collaboration with different experts. As a computational linguist, I try to build bridges between the disciplines involved,” says Beáta Megyesi.
Visits to the Vatican Archives
In addition to the need for expertise from several research fields, such as linguistics, computer science and history, there is also a need for data from various archives. Over the years, Beáta Megyesi has spent many hours in archives, going through ciphers in ancient writings. In Sweden, she has been to the National Library of Sweden, Carolina Rediviva at Uppsala University and the National Archives, but what has impressed her most is a couple of visits to the Vatican Archives in Rome, one of the oldest and largest archives in the world.
When Beáta Megyesi contacted the archive at the end of 2012, she received a quick response from the archive’s director saying that she was welcome. The visits to the Vatican Archives were something special. After passing the Swiss Guard in their colourful uniforms, she was allowed to enter the archives where most of the Vatican’s correspondence since the early Middle Ages is preserved. Under strict supervision, she went through thousands of letters between bishops, cardinals and popes, some texts dating back to the 14th century. She ordered copies of the letters she wanted to study further at home in Sweden. Based on these letters, she has, together with other researchers, cracked several secret ciphers written by or sent to the Vatican.
Great attention for occult cipher
In April 2011, Beáta Megyesi and two other researchers managed to crack the so-called Copiale cipher. It is an encrypted manuscript in the form of a book containing 75,000 handwritten letters and symbols. The manuscript is dated to the period 1760–1780, but the text itself is probably about 25 years older. The manuscript contains abstract symbols as well as letters from the Greek and Latin alphabets. The researchers discovered that the document originated from a secret German order called the Oculists. Their study received enormous international attention, including coverage in The New York Times and Der Spiegel. Beáta Megyesi describes it as a “whirlwind of media attention”. For several years afterwards, she was contacted by researchers and private individuals about the Copiale cipher and other ciphers.
Learn more about the Copiale cipher.
Network for historical cryptology
In order to develop research in historical cryptology, Beáta Megyesi created a network of researchers from different disciplines. The first international conference on the subject was arranged in Uppsala in 2018. The network currently consists of over 100 researchers. Beáta Megyesi submitted an application to the Swedish Research Council for the DECODE project, which in 2015 – 2017 developed methods for automatically decrypting historical documents with encrypted text. The project was led from Uppsala University in collaboration with computer scientists in California and Barcelona.
In order to continue the research, some researchers decided to start what became the project “DECRYPT: Decryption of historical manuscripts” when the Swedish Research Council had a call for interdisciplinary research. The project was granted SEK 29.5 million for the period 2018 – 2024. The project includes about 20 researchers in computational linguistics, cryptology, image processing, computer science, history and linguistics from several countries.
Interdisciplinarity and artificial intelligence
In the summer of 2023, Beáta Megyesi joined Stockholm University as a professor of computational linguistics, taking with her the responsibility for DECRYPT. She emphasises interdisciplinarity and artificial intelligence (AI) as two main components of DECRYPT.
“Within the project, we have combined several different applications of AI. Research questions from the field of humanities drive the work, and AI models are developed and used in several areas.”
AI is used in several ways in the project. Image analysis converts the symbols in an image of a manuscript into text, and in deciphering, AI assists the user in interpreting the cipher.
The research currently focuses on transcription. The researchers are collecting examples of different types of symbols from manuscripts, and these examples are used to train AI models.
“Transcription is one of the biggest challenges when we convert an image into text format. It is time-consuming and often a source of error. The texts consist of many types of symbols and unusual writing systems, they may have hardly legible handwriting and damaged pages. Encrypted texts also differ greatly in the way they are encoded, and guessing the type of code and the underlying language of the cipher poses additional difficulties,” says Beáta Megyesi.
Lots of material waiting for deciphering
There is a large amount of historical material with rare or unknown writing systems waiting to be analysed and deciphered.
“The most common encryption methods historically were so-called substitution ciphers, where alphabetic characters, syllables, words, phrases or sentences were replaced with their own codes written as numbers, alphabetic characters or various symbols such as Zodiac or alchemical signs,” says Beáta Megyesi.
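The principle described in the quote can be illustrated with a small sketch. It is a toy example and not a reconstruction of any historical key or of the DECRYPT tools: the key, the code numbers and the function names are all invented for illustration. Each letter is replaced by a two-digit number, and a few whole words get their own three-digit codes.

```python
import string

# Hypothetical toy key, invented for illustration only.
# Each letter gets a two-digit number (a=10, b=11, ...);
# a few whole words get their own three-digit codes.
LETTER_CODES = {ch: str(10 + i) for i, ch in enumerate(string.ascii_lowercase)}
WORD_CODES = {"pope": "101", "king": "102", "rome": "103"}


def encrypt(plaintext: str) -> str:
    """Replace listed words by their word code, all other words letter by letter."""
    codes = []
    for word in plaintext.lower().split():
        if word in WORD_CODES:
            codes.append(WORD_CODES[word])
        else:
            # Characters without a code (digits, punctuation) are dropped.
            codes.extend(LETTER_CODES[ch] for ch in word if ch in LETTER_CODES)
    return " ".join(codes)


def decrypt(ciphertext: str) -> str:
    """Invert the key. Spaces between plaintext words are not recovered."""
    decode = {code: unit for unit, code in {**LETTER_CODES, **WORD_CODES}.items()}
    return "".join(decode[code] for code in ciphertext.split())


if __name__ == "__main__":
    secret = encrypt("the king")
    print(secret)           # 29 17 14 102
    print(decrypt(secret))  # theking
```

In this toy version the codes are separated by spaces to keep decoding simple. Without the key, a reader sees only a sequence of numbers and has to work out which codes stand for letters and which for whole words.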
Historical cryptology is a new field of research, so there is a lot to explore.
“We need to collect more material to be able to create better algorithms that can be used to help with the transcription of different symbol systems, and we need better analysis methods adapted to historical variants of the world’s thousands of languages,” says Beáta Megyesi.
What do you hope DECRYPT will achieve?
“The main goal is to develop new methods and tools that can identify, transcribe and interpret historical ciphers, which can then be applied to new sources for new insights into our history. Another goal is for historical cryptology to become an established subject which will allow experts from different disciplines to collaborate on complex problems and learn from each other. Solving problems together is extremely rewarding but it also requires time, patience and respect for each other’s skills.”
What fascinates you most about ciphers?
“They are enigmatic, priceless and exciting but as difficult to interpret and challenging as they can be!”
Making results available to all
Beáta Megyesi's main focus is basic research, but she is keen to ensure that the results benefit society and the general public. All material from DECRYPT is released freely. The ciphers and keys with descriptions are available in the DECODE database. The transcription and deciphering tools are available online and also for download. The scientific results are published open access and freely available to everyone.
“Making research results available to the public has several important benefits, which affect both the scientific community and society at large. When researchers have access to each other’s work, they can build on existing results instead of reinventing the wheel. This can accelerate the pace of scientific progress and lead to complex problems being solved faster. We can also collaborate more easily and combine knowledge from different fields, which enables interdisciplinary research,” says Beáta Megyesi.
The results of the research will also benefit the public.
“All the historical ciphers that we post are there for those interested to sink their teeth into and try to crack. Many people have fun with it, like a kind of sudoku. Our project can also lead to better automatic handwriting recognition and better deciphering algorithms.”
Material from the research network is included in an exhibition on cryptography at the Deutsches Museum in Munich. Participants from the project also contribute ciphers to the Crypto Challenge Contest MysteryTwister, started by one group member, where visitors are invited to solve ciphers.
At the end of June 2024, a major international conference on historical cryptology (HistoCrypt 2024) took place in Oxford and Bletchley Park with Beáta Megyesi as one of the founders and a member of the programme committee.
Further reading
Project site on Stockholm University web
DECRYPT website
Beáta Megyesi’s profile page
HistoCrypt 2024
On the “Castle Cipher” that Beáta Megyesi helped to decipher
Last updated: July 3, 2024
Source: Communications Office