Improved Text Recognition for Finnish Historical Newspapers with Transkribus

A number of old German scripts were used in the German-speaking world between the 16th and 20th centuries. On the one hand, there were several handwritten scripts, including Kurrent, Sütterlin, and Offenbacher script. On the other hand, there were printed scripts such as Fraktur and Antiqua.

With the help of Transkribus, it is now possible to access and search large numbers of historical documents written in old German scripts. By using one of the public models or a model of your own, thousands of historical documents can be automatically converted into text.

The National Library of Finland has reprocessed almost two million historical newspaper pages with the Transkribus automatic text recognition workflow in cooperation with READ-COOP. The greatly improved recognition results convinced the Library to adopt the workflow, which was developed to its current state in the NewsEye project under the lead of the University of Innsbruck. Text recognition in general, and high-accuracy recognition in particular, is of immense importance for the quality and usability of digitized historical sources.

The material in this cooperative reprocessing project with READ-COOP comprised a little under two million pages of Finnish newspapers dating from 1771 to 1914. The material is in Finnish and Swedish, the languages used in Finland during this period. All Finnish newspapers published in Finland, from the first newspaper in 1771 through the titles of 1914, have now been reprocessed, together with a selection of newspapers from 1915 to 1918.

Starting from summer 2021, the newly reprocessed newspapers will gradually replace the older versions, which have lower-quality optical character recognition results, in the publication and presentation system of the National Library of Finland. The Library will launch an information campaign regarding the quality improvements. We also aim to process more newspapers from 1914 onwards, but this decision will follow later.

The improvement in the text recognition results has been considerable, and we are currently calculating the exact figures. These will be published on https://digi.nationallibrary.fi.

The work in this cooperation has been financed by the EU’s European Regional Development Fund / Leverage for the 2014-2020 funding period.
