Language

CLARIN is a digital infrastructure that provides sustainable access to a broad range of language data and tools to support research in the humanities, the social sciences, and beyond. Its participating centres (universities, research centres, libraries, and public archives) are located all over Europe and further afield.

Data (and metadata) interoperability is the degree to which data and software can be used in combination with other data and software without any ad-hoc adaptations. Interoperability operates at two levels: the form, or syntactic, level and the meaning, or semantic, level. At the syntactic level, interoperability ensures that data are compatible in the formats they come in. At the semantic level, it ensures that data are compatible in the meanings of their form elements. CLARIN strives towards maximal interoperability. However, handling such diverse types of language datasets in combination has long been a challenge.

To analyze and correctly interpret this sheer volume of language data, DataGEMS will provide novel applications of algorithms that stretch the limits of automatic knowledge graph construction, data linking, and analytics-based exploration. Increased discoverability of language data will promote the development of state-of-the-art language technologies (such as Large Language Models) in Europe.

DataGEMS is a Research and Innovation Action funded by the European Union under the Horizon Europe Research and Innovation Programme via Grant Agreement No 101188416.

Copyright © 2025 DATAGEMS. All Rights Reserved.