“Result of an attempt to physically unroll a scroll”

Ancient Herculaneum scroll piece revealed by AI – here’s what it says
by Jeremy Hsu  /  5 February 2024

“Artificial intelligence has helped decipher an ancient papyrus scroll, which was transformed into a lump of blackened carbon by volcanic ash from Mount Vesuvius in AD 79. The first passages of readable text reveal never-before-seen musings from a Greek philosopher. The discovery nabbed the $700,000 grand prize in the Vesuvius Challenge, and used a combination of 3D mapping and AI techniques to detect ink and decipher letter shapes within segments of scrolls known as the Herculaneum papyri, which had been digitally scanned. The combined efforts of the winning team members – Youssef Nader, Luke Farritor and Julian Schilliger – could pave the way for more discoveries from additional papyrus scrolls that were once housed in a library in the ancient Roman town of Herculaneum. “I think it’s going to be a huge boon to our knowledge of ancient philosophy, just gigantic – a staggering amount of new text,” says Michael McOsker at the University College London, who was not involved in the discovery.

The winning submission met the Vesuvius Challenge criteria of deciphering more than 85 per cent of characters in four passages consisting of 140 characters each – and as a bonus, it included another 11 columns of text for a total of more than 2000 characters. Those rediscovered Greek letters reveal the thoughts of Philodemus, who is thought to have been the philosopher-in-residence at the library that housed the Herculaneum papyri. The deciphered text focuses on how the scarcity or abundance of food and other goods impacts the pleasure they deliver. That fits Philodemus’s Epicurean school of philosophy, which prioritised pleasure as the main goal in life. His 2000-year-old writing even appears to possibly take a dig at the Stoic school of philosophy that has “nothing to say about pleasure”.

And the Vesuvius Challenge isn’t over. Its 2024 goals include figuring out how to scale up the 3D scanning and digital analysis techniques without becoming too expensive. The current techniques cost $100 per square centimetre, meaning that it could cost between $1 million and $5 million to virtually unroll an entire scroll – and there are 800 scrolls waiting to be deciphered. “Realistically, the vast majority of the known, already unrolled library is Epicurean philosophy and that’s what we should expect, but there are also important Stoic texts, maybe some history and some Latin literature. Complete texts of authors like Ennius or Livius Andronicus, early Roman authors [whose works] did not survive, would be great,” says McOsker. “Epicurus’s Symposium, in which he wrote about the biology of wine consumption, would be a lot of fun.”

“The models use small input/output windows. In some cases, the output is even only binary (“ink” vs “no ink”), as shown in this animation.This makes it extremely unlikely for the model to hallucinate shapes that look like letters.”

AI helps decipher first text of “unreadable” ancient Herculaneum scroll
by   /  10/16/2023

“Hundreds of badly charred ancient Roman scrolls found in a Roman villa have long been believed to be unreadable, but a 21-year-old computer science student at the University of Nebraska-Lincoln has successfully read the first text hidden within one of the rolled-up scrolls using a machine learning model. The achievement snagged Luke Farritor a $40,000 First Letters prize from the Vesuvius Challenge, a collaboration between private entrepreneurs and academics offering a series of rewards for milestones in deciphering the scrolls. A second contestant, Youssef Nader, received a smaller $10,000 First Ink prize for essentially being the second person to decipher letters in a scroll. The main prize of $700,000 will be awarded to the first person to read four or more passages from one of the scrolls by December 31, and the founders are optimistic that this goal is achievable in light of these most recent breakthroughs.

As previously reported, the ancient Roman resort town Pompeii wasn’t the only city destroyed in the catastrophic 79 AD eruption of Mount Vesuvius. Several other cities in the area, including the wealthy enclave of Herculaneum, were fried by clouds of hot gas called pyroclastic pulses and flows. But still, some remnants of Roman wealth survived. One palatial residence in Herculaneum—believed to have once belonged to a man named Piso—contained hundreds of priceless written scrolls made from papyrus, singed into carbon by volcanic gas. The scrolls stayed buried under volcanic mud until they were excavated in the 1700s from a single room that archaeologists believe held the personal working library of an Epicurean philosopher named Philodemus. There may be even more scrolls still buried on the as-yet-unexcavated lower floors of the villa. The few opened fragments helped scholars identify a variety of Greek philosophical texts, including On Nature by Epicurus and several by Philodemus himself, as well as a handful of Latin works. But the more than 600 rolled-up scrolls were so fragile that it was long believed they would never be readable since even touching them could cause them to crumble.

“Charred scrolls can’t be opened easily, but X-ray scanning can reveal their contents”

“This was a cultivated Roman aristocrat’s country villa, and Piso would have had lots of books there, especially Latin ones, of which so far very few have been found in the villa,” Robert Fowler, a classicist and papyrus expert at the University of Bristol in England, told The New York Times. “Recovering such a library would transform our knowledge of the ancient world in ways we can hardly imagine. The impact could be as great as the rediscovery of manuscripts during the Renaissance.” Scientists have brought all manner of cutting-edge tools to bear on deciphering badly damaged ancient texts like the Herculaneum scrolls. For instance, in 2019, German scientists used a combination of physics techniques (synchrotron radiation, infrared spectroscopy, and X-ray fluorescence) to virtually “unfold” an ancient Egyptian papyrus. Their analysis revealed that a seemingly blank patch on the papyrus actually contained characters written in what had become “invisible ink” after centuries of exposure to light.

“a: Mobile XRF spectrometer ELIO, b: laser spot on fragment and c: measured spectra”

Brent Searles’ lab at the University of Kentucky has been working on deciphering the Herculaneum scrolls for many years. He employs a different method of “virtually unrolling” damaged scrolls, which he used in 2016 to “open” a scroll found on the western shore of the Dead Sea, revealing the first few verses from the book of Leviticus. The so-called En Gedi scroll was recovered from the ark of an ancient synagogue destroyed by fire around 600 CE. To the naked eye, it resembled a small lump of charcoal, so fragile that there was no safe way to analyze the contents. The team’s approach combined digital scanning with micro-computed tomography—a noninvasive technique often used for cancer imaging—with segmentation to digitally create pages, augmented with texturing and flattening techniques. Then they developed software (Volume Cartography) to virtually unroll the scroll.

The older Herculaneum scrolls, however, were written with carbon-based ink (charcoal and water), so one would not get the same fluorescing in the CT scans. But Searles thought the scans could still capture minute textural differences indicating those areas of papyrus that contained ink compared to the blank areas, training an artificial neural network to do just that. And a few years ago, he had two of the intact scrolls analyzed at a synchrotron radiation lab in Oxford. Then tech entrepreneurs Nat Friedman and Daniel Gross heard about Searles’ work, and they all decided to launch the Vesuvius Challenge in March this year, reasoning that crowdsourcing would help decipher the scrolls’ contents that much faster. Searles released all the scans and code to the public as well as images of the flattened pieces.

“Brent Searles supervises a student scanning a scroll in his laboratory”

Some 1,500 teams have been collaborating on the challenge through Discord, and as each milestone is reached, the winner’s code is also made available so everyone can continue to build on those advances. In August, a contestant named Casey Handmer announced his discovery of a “crackle pattern” that resembled ink in the segmented CT scans of the scrolls, even making out what appeared to be a letter. That made Handmer the first to find substantial, convincing evidence of ink within the unopened scrolls. Farritor, a SpaceX summer intern, decided to train his own machine learning model on those crackle patterns, improving his model with each new pattern found. The model eventually revealed the word “πορφυρας” meaning “purple dye” or “cloths of purple,” a word that rarely shows up in ancient texts.

“When I saw the first image, I was shocked,” Federica Nicolardi, a papyrologist at the University of Naples in Italy who was among those who reviewed the findings, told Nature. “It was such a dream. I can actually see something from the inside of a scroll.” The discoveries of Handmer and Farritor inspired Nader, an Egyptian bio-robotics student in Berlin, who focused on the ink-detection efforts rather than combing through crackle patterns. He used a modified version of the machine learning model to study detached fragments, focusing on the same area of the scroll as Farritor. His final image might just be an entirely new text for modern scholars. Papyrologists have already offered speculations about possible words above (ανυοντα, “achieving”) and below (ομοιων, “similar”). Nader’s model has since deciphered an additional four and a half columns of text separated by margins, although not all the visible letters are legible.”

DOI: Nature, 2023. 10.1038/d41586-023-03212-1

New AI translates 5,000-year-old cuneiform tablets instantly
by Kevin Dickinson  /  July 4, 2023

“Translation isn’t simply a matter of swapping one word for a corresponding word in another language. A high-quality translation requires the translator to understand how both languages string thoughts together and then use that knowledge to create a translation that maintains the linguistic nuances of the original, which native speakers effortlessly understand. As difficult as that process is, it’s nothing compared to the challenge of translating an ancient language into a modern tongue. These translators must not only resurrect extinct languages from written sources but also have intimate knowledge of how the cultures that produced those sources evolved over centuries. If that weren’t enough, their sources are often fragmented, leaving crucial context lost to the ages. Because of this, the number of people capable of translating languages from antiquity is small, and their best efforts are often outpaced by the volume of texts unearthed by archeologists. Take ancient Akkadian. This early Semitic language is one of the best attested from the ancient world. Hundreds of thousands, by some accounts more than a million, Akkadian texts have been discovered and today lie in museums and universities. Many have even been digitized online. Each one has the potential to teach us about the life, politics, and beliefs of the first civilizations, yet this knowledge remains locked behind the time and manpower necessary to translate them. To help change that, a multidisciplinary team of archaeologists and computer scientists has developed an artificial intelligence that can translate Akkadian almost instantly and unlock the historic record preserved in these 5,000-year-old tablets.

“Hundreds of thousand of cuneiform tablets are housed in museum and university collections, yet many of these remain untranslated due to how time-intensive the process is and how few people have the expertise to do so. (Credit: Phillip Tellis)

Akkadian was the mother tongue of the Akkadian Empire, which arose around 2300 B.C. through the conquests of its founder, Sargon the Great. As a spoken language, Akkadian would eventually split into Assyrian and Babylonian dialects before being completely supplanted by Aramaic early in the first millennium BC. Today, it is a truly extinct language, without even daughter languages to carry on its legacy. As a written language, however, Akkadian proved more enduring. The empire borrowed the cuneiform script of its predecessor, the Sumerian civilization. This writing system used a reed stylus to impress wedge-shaped glyphs into wet clay tablets before baking them (hence the name cuneiform, which literally means “wedge-shaped” in Latin). Even after Aramaic supplanted Akkadian as the common language of the region, scholars continued to write in Akkadian cuneiform into the first century AD — even in antiquity, it seems, scholars and academics were incredibly stubborn. This traditional mindset had an unintended benefit for modern archeologists, too. While cuneiform could be written on papyrus, it was more often scribed onto clay or stone. These materials stand up much better to the fires and floods that ravaged their pithy peers. And while time is cruel to all things — archeologists rarely discover cuneiform tablets in mint condition — this is one reason why Akkadian writing may be so well-attested in the historic record. “Ironically, destructive conflagrations have preserved some of ancient Mesopotamia’s greatest libraries — because they were made of clay. In contrast, all of ancient Egypt’s papyrus libraries have burnt or crumbled to dust, though many individual codices survive,” linguist Steven Roger Fischer writes in A History of Writing.

Even with such linguist riches, properly translating these ancient libraries is no small feat. Beyond the challenges already mentioned, the Akkadian language is polyvalent. That is, its cuneiform signs can have several different readings depending on how each one functions in a sentence. There are many reasons for this development, but according to Fischer, one reason the Akkadians never simplified was that they “appeared to be bound to tradition and a self-imposed efficiency.” That traditional mindset led them to continue using Sumerian script for a language very different from Sumerian. (When it comes to historical scholarship, you win some, you lose some.) As such, translating Akkadian is a two-step process. First, scholars must transliterate the cuneiform signs. That is, they take the cuneiform and rewrite it using the similar-sounding phonetics of the target language. An example most readers will be familiar with is the Arabic word الله, which translates into English as “God” but transliterates as “Allah.” This transliteration is the closest the Latin alphabet can get to producing the word as it sounds in Arabic. Scholars then take their transliteration of the text and translate it into a modern language.

As you can imagine, that can be a long and laborious process — one that takes years of training and dedication to learn to do well. To help speed things along, the research team developed a neural machine translation model for Akkadian cuneiform, the same technology under the hood of Google Translate. The team trained the AI model on a sample of cuneiform texts from the Open Richly Annotated Cuneiform Corpus and taught it to translate in two distinct ways. First, the AI model learned to translate Akkadian from transliterations of the original texts. It also learned how to translate cuneiform symbols directly. More specifically, it translated Unicode glyphs of cuneiform texts that were generated by another time-saving tool that automatically produces Unicode from an image of an original tablet. The AI model then had to figure out how to handle the nuances of the sample’s various genres — for example, the difference between literary works and administrative letters — as well as how to handle the changes found in cuneiform script over the millennia it was used.

The AI model was then tested using the bilingual evaluation understudy 4 (BLEU4), an algorithm used to appraise machine-translated text. In its transliteration to English test, the team’s AI model scored 37.47. In its cuneiform to English test, it scored 36.52. Both scores were above their target baseline and in the range of a high-quality translation. And there was a surprising result: The model was able to reproduce the nuances of each test sentence’s genre. While this wasn’t one of the researcher’s goals, they note in the study that it may open possibilities for uses beyond translation. “In almost every instance, whether the [translation] is proper or not, the genre is recognizable,” the team writes. “A promising future scenario would have the [model] show the user a list of sources on which they based their translations, which would also be particularly useful for scholarly purposes.” The team published their results in the peer-reviewed PNAS Nexus. They also released their research and source code on GitHub at Akkademia.

“Although clay and stone tablets may stand up better than papyrus to the ravishes of time, they are often still found fragmented and may be missing crucial context.” 

As promising as the initial results are, there is still work to be done. In both cases, some of the test sentences were mistranslated. And like other AI models, this one is prone to hallucinations — moments where the response has no connection to the source. In one instance, the human translator produced the sentence “Why should we (also) conduct the lawsuit before a man from Libbi-Ali?” The AI’s translation: “They are in the Inner City in the Inner City.” (A bit off.) All told, the AI model works best when it is translating short- to medium-length sentences. It also does better with more formulaic genres, like royal decrees and administrative records, than literary genres such as myths, hymns, and prophecies. With more training on a larger dataset, the researchers note in the study, they aim to improve its accuracy.”

“In time, they hope their AI model can act as a virtual assistant to human scholars. The AI can provide the raw translation quickly, while the scholar can refine it with their knowledge of historic languages, cultures, and people. “Hundreds of thousands of clay tablets inscribed in the cuneiform script document the political, social, economic, and scientific history of ancient Mesopotamia. Yet, most of these documents remain untranslated and inaccessible due to their sheer number and limited quantity of experts able to read them,” the team writes in the study. “This is another major step toward the preservation and dissemination of the cultural heritage of ancient Mesopotamia.”