The study of language – its nature, its speakers, and the culture it creates – has always relied on access to a body of text, known as a corpus (pl. corpora). Ideally, a language corpus is large, balanced with respect to its authors (their gender, location, age, and so forth), representative of various genres, and, in our age, digitally encoded. With the rise of sophisticated Digital Humanities (DH) methods in recent years, the scientific community has raised the bar for corpora: they are now expected to conform to accepted encoding standards, allow easy extraction of information about the corpus texts (e.g., in the form of metadata), offer visualization of corpus contents, and be freely accessible to the research community as well as to the public. This workshop contributes to closing the gap between these expectations and the resources available for Hebrew and Yiddish. By combining complementary expertise in Hebrew linguistics, corpus annotation, and Linked Open Data at the Hebrew University of Jerusalem (HUJI) with expertise in text encoding for drama, corpus visualization, and information extraction at Freie Universität Berlin (FUB), we propose to design and publish novel state-of-the-art resources for the study of language and literature in Hebrew and Yiddish. The workshop brings together expert researchers from FUB and HUJI with active DH projects at the intersection of literature and linguistics:
- At FUB, the Drama Corpora Project (DraCor) is developed and maintained by Prof. Frank Fischer (FUB) and Prof. Peer Trilcke (Uni Potsdam), both affiliated researchers in Research Area 5 of the Cluster of Excellence “EXC2020 Temporal Communities”. There, DraCor serves as a crucial infrastructure for digital research in literary and cultural studies across a wide range of interdisciplinary fields. Furthermore, DraCor is part of the EU Horizon 2020-funded project “CLS-Infra” (https://clsinfra.io/), in which it forms work package 7.
- At the Hebrew University of Jerusalem (HUJI), the Jerusalem Corpus of Emergent Modern Hebrew (JEMH) focuses on everyday Hebrew at the turn of the 20th century, with the aim of gaining linguistic insights into the process of language revival for which Modern Hebrew is famous. The project produces digitized archival materials in the form of annotated TEI files, together with an online user interface for searching the corpus. In addition to ephemeral materials, letters, and correspondence, satirical skits and the representation of everyday speech in plays have been an important source of texts for this project. The JEMH Corpus can contribute Hebrew drama from Project Ben-Yehuda for inclusion in DraCor, as well as digitized plays of the Matateh satirical theater preserved at the Israel Goor Theater Archives and Museum. Incorporating these materials into DraCor will, in turn, lead to more standardized and more sophisticated annotation in JEMH and will enable research on the linguistic features of the speech represented in the plays. This project is led by Dr. Aynat Rubinstein and her research team.
- Also at the Hebrew University of Jerusalem (HUJI), Dr. Yael Netzer of the Center for Digital Humanities participates in and consults on several research projects. One of these is the project on the Matateh satirical theater, which was active in Eretz Israel from 1928 to 1953; that project is led by Dr. Shelly Zer-Zion of Haifa University. In addition, Dr. Netzer works to make best-practice corpora in Hebrew more accessible. Incorporating DraCor's experience in annotating plays can be of great use to the development of various projects and to the literacy skills of their participants.