“Improving access to a critically under-resourced language: AI-based approaches for producing and obtaining Livonian content”

Project No 5.2.1.1.i.0/2/24/I/CFLA/007 “Internal and External Consolidation of the University of Latvia” of the second round of the Consolidation and Governance Change Implementation Grants within Investment 5.2.1.1.i “Research, Development and Consolidation Grants” under Reform 5.2.1.r “Higher Education and Science Excellence and Governance Reform” of Reform and Investment Strand 5.2 of the Latvian Recovery and Resilience Mechanism Plan “Ensuring Change in the Governance Model of Higher Education Institutions”.

Project: “Improving access to a critically under-resourced language: AI-based approaches for producing and obtaining Livonian content”
No. LU-BA-PA-2024/1-0056

(2024–2026)

The aim of the project is to explore whether and how speech synthesis and recognition can be built for critically under-resourced languages with limited data and an extremely small number of contemporary speakers. This will be achieved by applying experimental data processing methods, based on the hypothesis developed by the project team, together with (1) the AI applications, know-how, and guidance provided by the Artificial Intelligence Laboratory (AIL) at the University of Latvia (UL) Institute of Mathematics and Computer Science, and (2) the experience in Livonian phonetics, prosody, and development of speech corpora provided by the University of Tartu Institute of Estonian and General Linguistics (IEGL). Both institutions are partners in this project.

This project focuses on Latvia’s indigenous Livonian language. It builds on the applicant’s previous work in developing digital resources and tools for the Livonian language, as well as on the experience and excellence of the project partners. The project seeks not only to obtain new knowledge about the use of AI tools in the specific conditions in which critically under-resourced endangered languages exist, which is the project’s main novelty, but also to achieve practical results: improving access for the general public, researchers, and the Livonian community itself to written Livonian language collections and, in the longer term, to the content of audio collections.

Over the course of the current project, the applicant and partners are working towards the overarching goal of creating speech recognition for Livonian, while focusing within the project on the first step: the creation of speech synthesis for Livonian and multiplication of data. This first step also serves as a precondition for future research and the development of speech recognition.

Project results: scientific publications – 2, conference materials – 3, scientific databases and data collections – 1, submitted project applications – 1, other results relevant to the research subject – Livonian speech synthesizer

RESULTS OBTAINED:

SCIENTIFIC PUBLICATIONS

Tuisk, Tuuli, Nicolai Pharao. Unveiling Tonal Contrasts in the Baltic Region: Exploring Stød in Livonian Spontaneous Speech. Linguistica Uralica LX (4), 2024, pp. 241–270. DOI: https://doi.org/10.3176/lu.2024.4.01. Available here.

Ernštreits, Valts. Towards the speech recognition for Livonian. 9th International Workshop on Computational Linguistics for Uralic Languages, IWCLUL 2024, November 28-29, 2024: Proceedings of the Workshop. Helsinki: Association for Computational Linguistics, 2024, pp. 76–80. Available here.

SCIENTIFIC CONFERENCES AND SEMINARS, THESIS

Ernštreits, Valts. Towards the speech recognition for Livonian. 9th International Workshop on Computational Linguistics for Uralic Languages, IWCLUL 2024, November 28-29, 2024.

Ernštreits, Valts. Endangered Languages and Cultures in the Digital Era. Plenary speech at the student conference “Bridges in the Baltics”. Vilnius, 4.10.2024. Conference platform here.

PUBLICITY

Nine posts about project events were published on the institute’s Facebook profile. They can be found using the hashtags #NextGenerationEU #AtveselosanasFonds

Ernštreits, Valts. Overview of the UL Livonian Institute’s projects. Ventspils Livonian Culture Days, 13.10.2024.

Ernštreits, Valts. Latvijas Radio 1 programme “Zināmais nezināmajā”: a talk on Livonian, the challenges endangered languages face in the digital age, and approaches to narrowing digital gaps. The broadcast can be listened to here.

The UL Livonian Institute participated in the European Researchers’ Night, providing participants with information on its projects, digital instruments, and tools. 27.09.2024. Information about the schedule here.

Ernštreits, Valts. Participation in the Radio SWH programme “Prāts izglābs pasauli”: a talk on the challenges and solutions of the digital world. 21.09.2024. The broadcast can be followed here.

POLICY IMPACT

Members of the project’s scientific team began work (13.09.2024) in ad hoc groups attached to the Global Task Force for the UN International Decade of Indigenous Languages (2022–2032). Gunta Kļava joined the group dedicated to language transmission, and Valts Ernštreits joined the group dedicated to digital equality and domains of indigenous languages, where he was elected one of its two co-chairs. Following a proposal by Valts Ernštreits, the ad hoc group on digital equality is preparing draft amendments, from the perspective of indigenous languages, to the Recommendation on the Ethics of Neurotechnology (more here). The ad hoc group, led by Valts Ernštreits, is also currently preparing a global survey on the presence of indigenous languages in the digital space.