Challenges
With this model for research datasets, we face the challenges of making them publishable and discoverable. Distributed annotation platforms have been implemented on the open web with varying degrees of success. We hope that the structure of the IIIF standard will allow us to build on existing tools, e.g., for displaying annotations overlaid on a base document, and focus on these missing components.
Challenge A. To make data publishable, we would need to develop a tool that takes as input one of the two kinds of data mentioned above: either (1) lists of passages selected in IIIF documents (canvases), with attached annotations; or (2) transcriptions of the textual content of IIIF canvases, produced manually or by OCR/HTR. The publishing tool would generate IIIF collections and/or manifests, as appropriate, and post them to some sort of repository, such as an institutional repository, Zenodo, or GitHub. (We'd need some solution that gives us an addressable URL to a JSON document.)
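As a rough illustration of input kind (1), the following sketch turns a list of selected passages into a single JSON document that could be posted to a repository. It assumes the output follows the IIIF Presentation API 3.0 annotation page structure; the function name build_annotation_page, the tuple layout of the input, and all example.org URLs are hypothetical and stand in for whatever the actual tool would use.

```python
import json

# Assumption: output targets the IIIF Presentation API 3.0 JSON-LD context.
IIIF_CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def build_annotation_page(page_id, passages):
    """Build an AnnotationPage dict from (canvas_id, (x, y, w, h), text) tuples.

    page_id is the URL at which the resulting JSON document would be published.
    """
    items = []
    for index, (canvas_id, (x, y, w, h), text) in enumerate(passages, start=1):
        items.append({
            "id": f"{page_id}/annotation/{index}",
            "type": "Annotation",
            "motivation": "commenting",
            "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
            # The selected region is expressed as a media-fragment on the canvas URI.
            "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
        })
    return {
        "@context": IIIF_CONTEXT,
        "id": page_id,
        "type": "AnnotationPage",
        "items": items,
    }


# Hypothetical input: one passage selected on one canvas.
example_page = build_annotation_page(
    "https://example.org/annotations/page1.json",
    [("https://example.org/iiif/book1/canvas/p1", (100, 200, 300, 50), "Marginal note")],
)
print(json.dumps(example_page, indent=2))
```

A transcription, input kind (2), could plausibly be represented the same way, with one annotation per line or region of text, though the appropriate motivation value and granularity would need to be decided.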
Challenge B. To make data discoverable, we need to support viewing the annotations or transcription overlaid on the base IIIF document. For viewing, we could build on existing IIIF clients' abilities to show collections and to display images with overlaid annotations. We would also need to support searching and browsing for documents that have such overlays. One conceptually simple way to do this would be to implement a browser plugin that, when the user is browsing one of the base IIIF documents, alerts the user to the presence of outside annotations or transcriptions. This approach requires users to install a component on their client, which might limit adoption. On the other hand, it could rely on existing IIIF repositories' searching and browsing capabilities. To complement that, we could build separate searching and browsing services on top of a repository of published annotations and transcriptions.
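Both the plugin and a separate search service would need some way to answer the question "which published annotations target this canvas?" One possible approach, sketched here under assumptions, is an index from canvas URI to the URLs of the annotation documents that mention it. The sketch assumes the published documents are annotation pages whose annotations carry a target that is either a URI string or an object with a source; the names canvas_of and build_index and the sample data are hypothetical.

```python
from collections import defaultdict


def canvas_of(target):
    """Return the canvas URI of an annotation target, dropping any #fragment."""
    if isinstance(target, dict):
        # Assumed object form: {"type": "SpecificResource", "source": ...}
        source = target.get("source")
        target = source.get("id") if isinstance(source, dict) else source
    if isinstance(target, str):
        return target.split("#", 1)[0]
    return None  # Unrecognized target shapes are skipped.


def build_index(annotation_pages):
    """Map each canvas URI to the ids of the annotation pages that target it."""
    index = defaultdict(list)
    for page in annotation_pages:
        for annotation in page.get("items", []):
            canvas = canvas_of(annotation.get("target"))
            if canvas and page["id"] not in index[canvas]:
                index[canvas].append(page["id"])
    return dict(index)


# Hypothetical published annotation pages.
sample_pages = [
    {
        "id": "https://example.org/annotations/page1.json",
        "items": [
            {"target": "https://example.org/iiif/book1/canvas/p1#xywh=100,200,300,50"},
            {"target": {"type": "SpecificResource",
                        "source": "https://example.org/iiif/book1/canvas/p2"}},
        ],
    },
]

index = build_index(sample_pages)
# A plugin would look up the canvas the user is currently viewing.
hits = index.get("https://example.org/iiif/book1/canvas/p1", [])
print(hits)
```

An empty lookup result would simply mean that no outside annotations are known for that canvas, so the plugin would stay silent.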
Participants based in libraries have a lot of experience with challenges related to reuse of their digital collections. Digital humanities participants may have other kinds of collections of research data they wish to publish. We should prioritize these challenges depending on the use cases of the workshop participants.