Technical Implementation of Publishertools Oy’s Services

The latest event in Business Tampere’s “Smarter Organization” series discussed the disruption of traditional industries in the age of artificial intelligence. My part of the presentation took a look under the hood at the technical implementation.

Identified Needs
Kustannusosakeyhtiö Siltala identified a need for a tool that could automate publishing processes and improve the visibility of books. Visibility is improved by enriching metadata. Tasks related to producing audiobooks could also be automated, reducing the manual work that goes into the instructions given to voice actors.

Figure 1. Identified Needs at the Start of the Project

Both needs can be addressed with specialized AI algorithms that generate suggestions based on the book's content. In the nearly two years since the project started, additional needs have been identified. Before publication, a book's content is supplemented with author and subject indexes. These indexes, typical of nonfiction books, were previously compiled entirely by hand. The book must already be typeset at that stage so that the page numbers are known. In addition, the EU Accessibility Directive sets requirements for e-books that were not previously taken into account. This makes producing new e-books somewhat more labor-intensive and will likely require changes to previously published e-books.

Figure 2. Current Understanding of Publishing Industry Needs

Solution for Enriching Metadata

The metadata for a book, aimed at enhancing its visibility, includes both keywords and the book's theme code classification. This metadata helps various online stores and search engines target the product to its intended reader.
The technical implementation begins with selecting the type of input. Possible options include a manuscript from the author, a print-ready PDF, or an e-book in EPUB format.


An author's manuscript does not come in a single, standardized format: authors use many different writing tools, and those tools allow content to be presented in many ways. Getting access to original manuscripts also proved surprisingly difficult, because publishers do not necessarily retain the manuscript once a print-ready PDF or typesetting file has been completed.


The print-ready PDF appears to be the version that gets archived. Extracting text from a PDF is challenging in its own way, because the text has already been laid out into pages. During layout, words are hyphenated across lines and pages, and decorative effects such as drop caps may have been applied. Reversing these changes programmatically does not always reproduce the original text.
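As an illustration of why reversing page layout is lossy, here is a minimal Python sketch of one common cleanup step: rejoining words that were hyphenated at line ends and unwrapping hard line breaks. The function names are hypothetical and this is not the production implementation described in this article, only a simplified example of the technique.

```python
import re

# A word fragment ending in a hyphen at a line end, followed by its
# continuation at the start of the next line.
_LINE_END_HYPHEN = re.compile(r"(\w+)-\n(\w+)")


def rejoin_hyphenated(text: str) -> str:
    """Join words that layout split across lines with a hyphen."""
    return _LINE_END_HYPHEN.sub(r"\1\2", text)


def unwrap_lines(text: str) -> str:
    """Turn single line breaks into spaces; keep blank-line paragraph breaks."""
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


def clean_page_text(text: str) -> str:
    """Apply both steps to text extracted from one page."""
    return unwrap_lines(rejoin_hyphenated(text))
```

The heuristic cannot tell a layout hyphen from a real one: a compound such as "e-kirja" that happens to break after the hyphen is joined into "ekirja". Cases like this are one reason the recovered text does not always match the original.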


Structurally, the EPUB format would be the software designer's first choice, but it is not suitable because not all books are produced as EPUB. It also becomes available only after the typesetting file is completed, that is, later in the book's life cycle.


Of these options, we chose the print-ready PDF. In hindsight, we would make the same decision if we were to start over.

Figure 3. Metadata Enrichment Process

The intermediate goal of enriching metadata is to make the results available in Storia’s (formerly Kirjavälitys) systems. From there, the metadata is updated to bookstores, online retailers, and search engines.

Figure 4. Metadata Enrichment Implementation

The user starts the process by submitting the book to the system as a PDF. The next step is converting the PDF into raw text. Open-source PDF libraries are sufficient for this task.
Keyword and theme code suggestions are made by Annif models trained specifically for this purpose. Annif, a tool developed at the National Library of Finland, combines several AI algorithms. Two separate models have been trained for enriching metadata.


Once the models have produced keyword and theme code suggestions, the user’s task is to complete and confirm them. After that, the integration transfers the updated information to Storia’s database.
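The suggestion step can be sketched in Python against Annif's REST API, which exposes a `suggest` endpoint per project and returns a list of results with a label and a score. This is a hedged sketch, not the integration used in production: the base URL, the project ID, and the score cutoff are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: an Annif server running locally; adjust to the real deployment.
ANNIF_BASE_URL = "http://localhost:5000/v1"


def build_suggest_request(project_id, text, limit=10, threshold=0.0):
    """Build (but do not send) a POST request to an Annif project's suggest endpoint."""
    url = f"{ANNIF_BASE_URL}/projects/{urllib.parse.quote(project_id)}/suggest"
    form = {"text": text, "limit": limit, "threshold": threshold}
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def parse_suggestions(response_body, min_score=0.1):
    """Return (label, score) pairs at or above min_score, best first."""
    payload = json.loads(response_body)
    hits = [
        (result["label"], result["score"])
        for result in payload.get("results", [])
        if result["score"] >= min_score
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Sending the request with `urllib.request.urlopen` and passing the response body to `parse_suggestions` would yield the list the user then completes and confirms. With two models, the same call would be made once per project, for example once for keywords and once for theme codes.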

Reading Instructions for the Voice Actor
Producing an audiobook requires not just the book but also reading instructions. Apparently it is quite common for the voice actor to receive these as a stack of paper. The actor is then responsible for locating the relevant passages in the book and following the instructions.


The goal was to provide the voice actor with a view of the book in which the words and phrases requiring attention are marked in place, within the text being narrated. The system helps the producer of the reading instructions identify the passages that need attention and provide pronunciation guidance for them.

Figure 5. Producing Reading Instructions Based on the Book

The production of reading instructions begins by converting the book from PDF into raw text. Foreign words, that is, words in a language other than that of the book, are then identified in the raw text, and the language of each foreign word is determined. Once the foreign words and their languages are known, the actual reading instructions can be produced. This is done by generating a phonetic representation of the foreign word in the International Phonetic Alphabet (IPA). The representation can be delivered either as an audio sample or as a written reading instruction.


Since the IPA representation can be quite difficult to understand for those unfamiliar with the alphabet, a more accessible format is required. In this application, the format was defined based on user feedback.
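To show the idea of such a conversion, here is a small Python sketch that rewrites an IPA string with more familiar letters. The mapping table is entirely hypothetical and far from complete. The format actually used in the application was defined from user feedback and is not reproduced here.

```python
# Hypothetical, deliberately small mapping from IPA symbols to familiar letters.
IPA_TO_READABLE = {
    "ʃ": "sh", "ʒ": "zh", "θ": "th", "ð": "dh", "ŋ": "ng",
    "ə": "e", "ɛ": "e", "ɪ": "i", "ʊ": "u", "ɔ": "o", "æ": "ä",
}
STRESS_MARKS = {"ˈ", "ˌ"}
LENGTH_MARK = "ː"


def ipa_to_readable(ipa: str) -> str:
    """Rewrite an IPA string symbol by symbol using the mapping above."""
    out = []
    for symbol in ipa:
        if symbol in STRESS_MARKS:
            continue  # stress is dropped in this simplified format
        if symbol == LENGTH_MARK:
            if out:
                out.append(out[-1][-1])  # long sound: repeat the previous letter
            continue
        out.append(IPA_TO_READABLE.get(symbol, symbol))  # unknown symbols pass through
    return "".join(out)
```

For example, `ipa_to_readable("ˈʃeɪkspɪə")` gives `"sheikspie"`. Doubling the letter for a long sound follows the Finnish spelling convention, which is one plausible choice for Finnish-speaking voice actors.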

Figure 6. Producing Reading Instructions at a Detailed Level

The reading instruction application has two user roles: the producer of the reading instructions and the voice actor. The creation of reading instructions begins by submitting the book to the system as a PDF. At this stage, the foreign language most likely to occur in the book is specified. This is used as a fallback when the language of a foreign word cannot be identified.


The book’s processing begins by extracting the text from the PDF format. Open-source libraries were not sufficient for this purpose, so a custom implementation was required.


After the raw text is processed, foreign words are identified. We currently have a model in production for this purpose, and we know its limitations. We are developing a second model that is expected to push these limits much further. Finding the base form of the word is important in foreign word identification, because Finnish typically inflects foreign words according to its own rules.
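The base form problem can be illustrated with a simple Python heuristic: strip a Finnish case ending, drop the apostrophe or linking vowel "i" that Finnish adds to some foreign words, and check the result against a list of known foreign words. This is only a sketch under stated assumptions. The ending list is partial, the lexicon is hypothetical, and the production system uses a trained model rather than this rule.

```python
# Partial list of Finnish case endings, ordered longest first.
CASE_ENDINGS = (
    "ssa", "ssä", "sta", "stä", "lla", "llä", "lta", "ltä", "lle", "ksi",
    "na", "nä", "ta", "tä", "a", "ä", "n",
)


def candidate_bases(word):
    """List possible base forms of a possibly inflected word."""
    candidates = [word]
    for ending in CASE_ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= 2:
            # "Bordeaux'n" -> "Bordeaux": drop the apostrophe before the ending.
            stem = word[: -len(ending)].rstrip("'’:")
            candidates.append(stem)
            if stem.endswith("i"):
                # "Yorkissa" -> "Yorki" -> "York": drop the linking vowel.
                candidates.append(stem[:-1])
    return candidates


def find_base_form(word, lexicon):
    """Return the first candidate found in the lexicon (lowercase), else None."""
    for candidate in candidate_bases(word):
        if candidate.lower() in lexicon:
            return candidate
    return None
```

With a hypothetical lexicon `{"shakespeare", "york", "bordeaux"}`, the inflected forms "Shakespearen", "Yorkissa", and "Bordeaux'n" resolve to "Shakespeare", "York", and "Bordeaux", while an ordinary Finnish word such as "talossa" returns `None`.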


Once the foreign words or phrases and their languages have been identified, a phonetic transcription can be produced from them. The transcription uses IPA, which is widely used in general but unfamiliar to many voice actors. We therefore developed a conversion to a more familiar alphabet for the reading instructions.


Based on the IPA description, an audio sample can also be generated using Microsoft’s Azure AI Speech service.
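One way to pass an IPA transcription to a speech service is SSML, whose `phoneme` element carries the pronunciation in its `ph` attribute. The Python sketch below only builds the SSML document and does not call Azure. The voice name and language are assumptions, and whether a given voice supports every IPA symbol needs to be checked against the service documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(word, ipa, voice="fi-FI-SelmaNeural", lang="fi-FI"):
    """Build an SSML document that pronounces `word` according to `ipa`.

    The default voice and language are illustrative assumptions.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'
        "</voice></speak>"
    )
```

The resulting string, for example from `build_ssml("Shakespeare", "ˈʃeɪkspɪə")`, is what would be submitted to the speech synthesis service to obtain the audio sample. Escaping the word and quoting the attributes keeps the document well-formed even when a phrase contains characters such as `&`.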
All of this processing runs after the reading instruction producer has submitted the book to the system. Afterward, the producer can review the instructions and, if necessary, modify or complete them. Through the system, the producer sends personalized links to the voice actors and any reviewers, giving them access to the instructions.
The voice actor has their own simplified user interface, which serves as a digital view of the book’s content. Reading instructions and audio samples are provided for individual foreign words and phrases. The reading instructions are not meant to be modified at this stage, so unnecessary functions could be stripped from the user interface. This simplification has made the interface clearer, and voice actors have given very positive feedback.

Further Development
This article has described two use cases for book publishing tools. These tools are in production and are being developed further, mainly based on user feedback. The other tools mentioned earlier are also under development. The most urgent need is producing e-books in EPUB format that meet the requirements of the EU Accessibility Directive.