Project description
The need for standardization has become increasingly important across fields as a common way of understanding and exchanging information. Scientific disciplines established formal protocols and languages early on, and these were quickly adopted and adapted to each discipline's particular problems. The humanities and cultural disciplines, however, have followed an independent path in which creativity and tradition play an essential role. Literature, and especially poetry, clearly reflects this idiosyncrasy. From the philological point of view, there is no uniform academic approach to analyzing, classifying or studying the different poetic manifestations, and theories diverge even further when poetry schools from different languages and periods are compared.
The POSTDATA project was created to bridge the digital gap between traditional cultural assets and the growing world of data. It focuses on poetry analysis, classification and publication, applying Digital Humanities methods of academic analysis (such as XML-TEI encoding) in pursuit of standardization. Interoperability problems between different poetry collections are addressed by using semantic web technologies to link literary datasets and publish them in a structured way in the linked data cloud.
The advantages of making poetry available online as machine-readable linked data are threefold. First, the academic community will have an accessible digital platform on which to work with poetic corpora and to enrich them with their own texts. Second, encoding and standardizing poetic information in this way will help preserve poems published only in old books or transmitted only orally, as the texts will be digitized and stored as XML files. Third, datasets and corpora will be openly accessible for the community to use for other purposes, such as education, cultural diffusion or entertainment.
To achieve this aim, a poetry lab has been created inside this platform that lets users apply up-to-date language technologies and computational linguistics to process poetry data. These tools include a metrical analyzer that detects the syllabic structure of stanzas and their accentual patterns, named-entity recognition tools that identify the places, people and dates mentioned, a combination of lemmatizers and parsers that identifies syntactic structures for poetic purposes, and sentiment analysis that detects emotions and feelings in the poems. Combining all these processes in the same environment will ease and improve researchers' analyses and will enable further applications, such as the automated detection of genre or of rhetorical and stylistic figures.
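As a rough illustration of what machine-readable poetry can look like, the following Python sketch reads a small TEI-encoded stanza and extracts its lines and declared rhyme scheme. It is not taken from the POSTDATA platform: the sample stanza, the function name read_stanza and the returned dictionary layout are assumptions made for this example. Only the TEI namespace, the lg and l elements, and the rhyme attribute come from the TEI guidelines.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the TEI guidelines
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical stanza invented for this sketch (not from any POSTDATA corpus)
SAMPLE = """<lg xmlns="http://www.tei-c.org/ns/1.0" type="stanza" rhyme="abab">
  <l n="1">First line of the stanza</l>
  <l n="2">Second line of the stanza</l>
  <l n="3">Third line of the stanza</l>
  <l n="4">Fourth line of the stanza</l>
</lg>"""


def read_stanza(xml_text):
    """Return the type, declared rhyme scheme and verse lines of one <lg>."""
    root = ET.fromstring(xml_text)
    # Collect the text of every <l> (verse line) element inside the stanza
    lines = [
        "".join(line.itertext()).strip()
        for line in root.iter(f"{{{TEI_NS}}}l")
    ]
    return {
        "type": root.get("type"),
        "rhyme": root.get("rhyme"),
        "lines": lines,
    }


stanza = read_stanza(SAMPLE)
print(stanza["rhyme"], len(stanza["lines"]))
```

A structure like this could then be handed to downstream tools such as a metrical analyzer or a named-entity recognizer. How the actual platform connects those steps is not described here.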
Collaborating Companies or Organisations
Indra's Role
Indra coordinates the project, and the Principal Investigator works there as Head of Artificial Intelligence Product Development.
Universities and Technological Centers
Technologies used
Semantic Web technologies, Linked Open Data, XML-TEI, Natural Language Processing, Databases
More information
This project, under reference number H2020-679528, has received funding from the European Research Council (ERC) under the European Commission’s Horizon 2020 research and innovation programme.
Starting date at Indra: 11-09-2017.