Reproducible methods in the Arts and Humanities through workflows: the case of the SSH Open Marketplace
Laure Barbot1, Elena Moro Battaner2, Stefan Buddenbohm3, Maja Dolinar4, Edward Gray1, Cristina Grisot5, Klaus Illmayer6, Alexander König7, Michael Kurzmeier6, Barbara McGillivray8, Clara Parente Boavida9, Christian Schuster10
1DARIAH; 2Universidad Rey Juan Carlos; 3Göttingen State and University Library; 4ADP; 5University of Zurich & DaSCH; 6OEAW; 7CLARIN; 8King's College London; 9Iscte-Instituto Universitario de Lisboa; 10Babeș-Bolyai University Cluj-Napoca
The Social Sciences and Humanities Open Marketplace (marketplace.sshopencloud.eu/), one of the flagship services of the “Social Sciences and Humanities Open Cluster” (SSHOC)[1], is a discovery platform for new and contextualised resources from the Social Sciences and Humanities (SSH). Its main aim is to help researchers discover, access and compare digital tools and methods for their research. With its five content types (Tools & Services, Datasets, Training Materials, Publications, and Workflows), the SSH Open Marketplace covers a wide range of research practices.
Workflows are defined as “Sequences of steps that one can perform on research data during their lifecycle. Workflows can be achieved by using diverse tools, resources and methods, and the useful resources are connected to each step” (SSH Open Marketplace 2023). The primary value of workflows in the SSH Open Marketplace lies in their reusability: they can be applied to other research contexts or projects. This is made possible by their basic structure, which can be adapted to a range of real research use cases while capturing tools, methods and processes that are reusable beyond individual research projects. In this way, SSH Open Marketplace workflows support reproducibility by highlighting which emerging methods are in use in a given research community or how agreed-upon standards are applied.
At the time of writing this abstract, the SSH Open Marketplace contains 48 workflows,[2] most of which were created during in-person events. Based on this experience, we have found face-to-face workshops to be one of the most effective ways to popularise workflows and to guide researchers in uploading their methodology in a new and not necessarily familiar format. Some workflows are “service-oriented”, highlighting what tool chains make possible, for example “LODification of bibliographical data: Zotero to Wikibase migration with ZotWb”[3]; others focus on how a given service works, such as “ArkeoGIS how to share a dataset on the platform”.[4] In contrast to this contextualisation of services, some workflows are oriented towards standards in practice, as is the case for “Create a dictionary in TEI”[5] or “Collaborative Digital Edition of a Musical Corpus”.[6]
It is crucial to promote workflows not only as an innovative way to document a research project, but also as a means to open up research methodologies and shed light on processes rather than only on research outputs. Indeed, workflows can play a critical role in harmonising research methods and help foster the reproducibility of these methods across projects. Our paper also aims to highlight the challenges of building a collective workflow collection such as the one developed in the SSH Open Marketplace: generalising from practices is a complex epistemological task, and maintaining a coherent and up-to-date collection of SSH workflows and promoting them to ensure their reusability requires considerable resources. This work is coordinated by the SSH Open Marketplace Editorial Board, and we would like to hear from the DARIAH Annual Event audience about what could be improved so that existing and future Marketplace workflows can best benefit the Arts and Humanities research communities.
Using GitHub for digital editions. From Transkribus to static websites
Laura Untner, Peter Andorfer
Austrian Academy of Sciences, Austria
Arthur Schnitzler (1862–1931) is one of the few Austrian authors who are still read and received internationally today. Shortly after his death on October 21, 1931, Clara Katharina Pollaczek gathered her memories of her former partner. The resulting typescript, typed by Schnitzler’s secretary Frieda Pollak and now held in the Vienna City Library (ZPH 242), comprises 990 pages and mainly contains letters, diary entries and commentary notes. Entitled Arthur Schnitzler und ich, the memoir was first published as a digital edition in 2023 (Müller/Untner/Mangel/Andorfer 2023; see Fig. 1–2).
Because the digital edition was intended as a work in progress from the very beginning, it was published soon after OCR had been run in Transkribus (using the »Text Titan I« model, cf. Transkribus 2023) and a few other basic steps (initial collation, editorial commentary, table of contents) had been completed. We then invited everyone to help improve the transcription, and volunteers did take part. The project thus turned into a kind of citizen science project, and the workflow had to meet this requirement. In particular, the ongoing corrections made by volunteers in Transkribus had to be integrated into the digital edition with as little effort as possible. A GitHub-based workflow that takes us from Transkribus to a static website in just two clicks made it possible to keep up with these dynamics.
To export the transcripts from Transkribus, we developed a GitHub action that accesses the Transkribus API using a Python package (Andorfer/Schlögl/Haak). The exported files are then transformed into valid XML/TEI documents using an adapted version of the XSL stylesheet developed for the default TEI export in Transkribus (Kampkaspar/Boenig/Stadler/Grallert). This stylesheet converts the METS and PAGE files, including the tags inserted in Transkribus (e.g. paragraph markings), into TEI files. Next, a separate XML document, already containing metadata, is created for each page by automatically matching it against the table of contents. Finally, to rebuild the website, another GitHub action developed for the DSE Static Cookiecutter (Austrian Centre for Digital Humanities and Cultural Heritage) is used. The whole workflow completes within a few minutes, after which the data are updated.
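The per-page step could be sketched in Python as follows. This is not the project's code: it assumes, hypothetically, that the converted TEI marks page beginnings with `<pb n="…"/>` elements placed directly inside `<body>`, and that the table of contents has already been reduced to a dictionary mapping page numbers to section titles. The function `split_pages` and its parameters are illustrative names only.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return a tag name in Clark notation within the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def split_pages(tei_xml: str, toc: dict[int, str]) -> dict[int, ET.Element]:
    """Split a TEI body at <pb/> milestones into one TEI document per page.

    Hypothetical assumptions: every <pb n="..."/> is a direct child of <body>,
    and `toc` maps page numbers to the title of the section they belong to.
    """
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{tei('body')}")
    if body is None:
        raise ValueError("no <body> element found")

    # Collect the elements that follow each page break, up to the next one.
    pages: dict[int, list[ET.Element]] = {}
    current = None
    for child in body:
        if child.tag == tei("pb"):
            current = int(child.get("n"))
            pages[current] = []
        elif current is not None:
            pages[current].append(child)

    docs: dict[int, ET.Element] = {}
    for n, content in pages.items():
        doc = ET.Element(tei("TEI"))
        # Minimal header for illustration only; it is not schema-valid TEI.
        header = ET.SubElement(doc, tei("teiHeader"))
        file_desc = ET.SubElement(header, tei("fileDesc"))
        title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
        title = ET.SubElement(title_stmt, tei("title"))
        # Metadata comes from matching the page number against the table of contents.
        title.text = f"{toc.get(n, 'Untitled')}, p. {n}"
        text = ET.SubElement(doc, tei("text"))
        new_body = ET.SubElement(text, tei("body"))
        new_body.extend(content)
        docs[n] = doc
    return docs
```

Using milestone `<pb/>` elements as split points keeps the sketch simple; a page break that falls inside a paragraph would not be detected here and would need a more careful, hierarchy-aware split.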
Although the workflow was originally developed for a rather simple project, it is now also being tested for the digital scholarly edition Arthur Schnitzler: Briefwechsel mit Autorinnen und Autoren (Müller/Susen/Untner 2018–[2024]) and for various other projects at the Austrian Centre for Digital Humanities and Cultural Heritage (e.g. those on the origins of the Austrian Federal Constitution (FWF P I 5679) and on Hanslick’s critiques (FWF P 35379)). The procedure remains essentially the same, except that some cases require additional XSL transformations, e.g. to export a whole series of custom tags set in Transkribus, such as greetings and farewells in letters, and to create individual documents per letter rather than per page.
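Where one document per letter is needed instead of one per page, the grouping logic can be illustrated with the sketch below. The project itself performs this step with additional XSL transformations; this Python version only shows the idea and assumes, hypothetically, that the table of contents lists each letter's identifier together with its first page, so that a letter runs until the next one begins. The function name and the input format are illustrative.

```python
from bisect import bisect_right


def group_pages_by_letter(
    pages: list[int], letter_starts: list[tuple[str, int]]
) -> dict[str, list[int]]:
    """Assign each page to the letter whose first page most recently precedes it.

    `letter_starts` is a hypothetical table-of-contents extract made of
    (letter_id, first_page) pairs. Pages before the first letter are ignored.
    """
    starts = sorted(letter_starts, key=lambda item: item[1])
    first_pages = [page for _, page in starts]
    grouped: dict[str, list[int]] = {letter_id: [] for letter_id, _ in starts}
    for page in sorted(pages):
        # Index of the last letter starting on or before this page.
        idx = bisect_right(first_pages, page) - 1
        if idx >= 0:
            grouped[starts[idx][0]].append(page)
    return grouped
```

Page numbers are a coarse boundary: letters that begin or end mid-page would need finer-grained markers (for instance the custom tags mentioned above) rather than page ranges.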
[References & Figures in PDF]
The Polifonia Research Ecosystem: an Executable Data Management Plan
Enrico Daga2, Andrea Scharnhorst1, Raphael Fournier-S’niehotta3, Marco Gurrieri4, Jacopo de Berardinis5, James McDermott6, Marilena Daquino7, Jason Carvalho2, Marco Ratta2, Christophe Guillotel-Nothmann4
1Data Archiving and Networked Services, Royal Netherlands Academy of Arts and Sciences, The Netherlands; 2The Open University; 3CNAM, Université Pierre et Marie Curie; 4CNRS; 5King's College London; 6National University of Ireland Galway; 7University of Bologna
This paper introduces an innovative approach to documenting research components, including research data. The Research Ecosystem approach (Daga et al. 2023) provides guidelines for determining which components are relevant in light of particular research questions, how to annotate them as semantic, machine-readable artifacts, and how to validate, control and preserve them. Developed in the context of Polifonia, a semantic-web-based research project to improve access to musical cultural heritage, the framework can be reused in any project that relies on collaborative, open software development. We zoom into specific challenges that arise when dealing with digital objects from the cultural heritage domain, in our case music. In particular, we show how to express license information in machine-readable form and how to enrich metadata with the support of Large Language Models. Ultimately, the Research Ecosystem supports the management of Data in context, namely together with Tools and Reports. However, it goes beyond mere documentation. In the case of Polifonia, archetypal users with their specific information needs (described as Personas and Stories) constitute one important component type. Components are connected via annotations. In this way, the specific methodological approaches that originate in the digital humanities, the data used, the tools built and the documentation produced all form a network that shows the interdependencies of research questions, methods and data. In this component-based structure, traditional project management elements such as Work packages and Tasks also appear as part of the annotation scheme. The beauty and efficiency of the approach lie in the consistent use of existing platforms such as GitHub, which already contain elements for building a machine-based Ecosystem. One result is a machine-executable Research Data Management (RDM) plan, which takes standardization in RDM to the next level.
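The idea of components linked by annotations, with checks that a machine can run, can be illustrated with a minimal Python sketch. The component kinds echo those named above, but the fields (`kind`, `license`, `links`), the two validation rules and the function names are hypothetical simplifications, not the actual Polifonia annotation scheme.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    """One research component with simplified, hypothetical annotations."""
    id: str
    kind: str                         # e.g. "Data", "Tool", "Report", "Persona", "Story"
    license: Optional[str] = None     # machine-readable license, e.g. an SPDX identifier
    links: list[str] = field(default_factory=list)  # ids of related components


# Hypothetical rule: components of these kinds must declare a license.
LICENSED_KINDS = {"Data", "Tool"}


def validate(components: list[Component]) -> list[str]:
    """Check the ecosystem and return a list of human-readable problems."""
    known_ids = {c.id for c in components}
    problems: list[str] = []
    for c in components:
        if c.kind in LICENSED_KINDS and not c.license:
            problems.append(f"{c.id}: missing license")
        # Every annotation link must point to a component that exists.
        for target in c.links:
            if target not in known_ids:
                problems.append(f"{c.id}: link to unknown component {target}")
    return problems


def dependents(components: list[Component], target_id: str) -> list[str]:
    """Return the ids of components whose annotations point to target_id."""
    return [c.id for c in components if target_id in c.links]
```

Because such rules run over plain metadata, they could be triggered automatically, for instance on every commit to a repository, which is one way to read the idea of a plan that is executed rather than only written.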
The challenge in reusing the Ecosystem approach is to answer classic research management questions: what are the right components, which annotations are needed to express their interdependencies, and how do both components and their links need to change over time? To support reuse of the concept, Polifonia defined workflows for building and evolving an Ecosystem, alongside workflows that form part of the Ecosystem itself. In this way, documentation and management are intrinsically interwoven with classic research management questions: how to find the right questions, how to find the right methods, and how to organize collaboration in an interdisciplinary setting (Guillotel-Nothmann et al., 2022). Our Research Ecosystem approach builds on other approaches to the formal description of research assets (such as FAIR Digital Objects or RO-Crate), but its essence is to address a middle level: above the individual elements of research but below large-scale research objectives. We will show how the Polifonia Ecosystem evolved and how it improves efficiency when tracing the quality and FAIRness of research outputs and processes.