MMOL Tabular Data: User Guide

This user guide is intended to help you get started using the tabular versions of the Medieval Manuscripts in Oxford Libraries catalogue data, and to give you examples of what you can do with the data. Further details can be found on the following pages.

The Project

The processor and its resulting outputs were created as part of a project in 2025, funded by Digital Scholarship @ Oxford, led by Dr Matthew Holford, and assisted by Dr Sebastian Dows-Miller.

The aims of the project, using existing manuscript catalogue data stored in TEI files and available at the Medieval Manuscripts in Oxford Libraries (MMOL) GitHub repository, were as follows:

  1. To make the MMOL catalogue data more accessible to students and non-digital scholars for whom engaging directly with the TEI data would pose an insurmountable challenge.
  2. To facilitate analyses that would otherwise be difficult using the existing catalogue data, for example the cross-comparison of manuscripts across the whole catalogue.
  3. To produce a finding aid that can sit alongside the existing MMOL online catalogue to enable scholars to make new discoveries.
  4. To assist in the identification of errors and inconsistencies in the existing catalogue data.

The Data

The data are arranged to help you explore the outputs easily, and to enable the examination of research questions that would previously have been difficult to answer, such as:

  1. What texts appear alongside the works of Catullus in the manuscripts catalogued in MMOL?
  2. Are the leaf heights of 15th-century paper manuscripts in the catalogue more standardised than those of 15th-century manuscripts on parchment?
  3. What are the 10 most common script types found in manuscripts catalogued in MMOL?
  4. What are the IIIF manifest URIs for digitised manuscripts containing musical notation?
  5. What are the WikiData references for people referenced as authors in the authority files?
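As a rough illustration of how a question such as number 3 might be approached with one of the CSV outputs, the sketch below counts the most frequent values in a single column using only the Python standard library. The column names (`shelfmark`, `script_type`) and the sample rows are assumptions made up for this example; they are not taken from the actual outputs, so check the real column headings before adapting it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded CSV file. The column
# names and rows are illustrative assumptions, not actual MMOL output.
SAMPLE_CSV = """shelfmark,script_type
MS. Example 1,Gothic textualis
MS. Example 2,Anglicana
MS. Example 3,Gothic textualis
MS. Example 4,
MS. Example 5,Secretary
MS. Example 6,Gothic textualis
MS. Example 7,Anglicana
"""


def most_common_values(rows, column, n=10):
    """Return the n most frequent non-empty values found in `column`."""
    counts = Counter(
        row[column].strip()
        for row in rows
        if row.get(column) and row[column].strip()
    )
    return counts.most_common(n)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
top_scripts = most_common_values(rows, "script_type", n=10)
for value, count in top_scripts:
    print(f"{value}: {count}")
```

To run this against a real download, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with an opened file. Rows with an empty cell are skipped rather than counted, which matters for the caveats on data quality set out later in this guide.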

Getting Started

This guide consists of the following sections, which are best consulted in order:

  1. The Data
  2. Worked examples using the spreadsheets
  3. Setting up the Column Merge app
  4. Worked examples using the Column Merge app
  5. Using the Processor
  6. Configuring the Processor

Before you get started, please read the following notes on the limitations of the data's quality and completeness.

Notes on Data Quality

The Bodleian’s Western medieval manuscript catalogues, for which this processor is primarily designed, remain a work in progress both in the encoding into TEI-XML of data which have already been recorded, and in the recording of new data. The absence of information regarding e.g. decoration or musical notation should therefore not be taken as an indication that no such features exist for a given manuscript, only that these data have not been recorded in the digital catalogues.

Similarly, many of the datapoints presented through the outputs of this processor are the work of previous generations of librarians and scholars who had cataloguing priorities and practices that are different to those of modern cataloguers. In particular, in previous cataloguing cycles, measurements have often been rounded to the nearest 5mm or quarter/eighth of an inch, and features such as decoration have been described using assessments of aesthetic and material quality (e.g. “Good border”), rather than through dispassionate descriptions of their form and contents.

Users should therefore be aware of these limitations when engaging with the outputs of this processor, whether using the default outputs or custom ones.
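One informal way to gauge how far historical rounding affects a set of measurements is to check what share of them fall exactly on a multiple of 5 mm. The sketch below does this for a list of numbers; the function name and the sample heights are invented for illustration and do not come from the catalogue.

```python
def share_of_multiples(values_mm, step=5):
    """Return the fraction of measurements that are exact multiples of `step`."""
    if not values_mm:
        return 0.0
    hits = sum(1 for value in values_mm if value % step == 0)
    return hits / len(values_mm)


# Illustrative leaf heights in millimetres (made-up values, not MMOL data).
heights_mm = [210, 215, 198, 300, 245, 187, 220, 205]
share = share_of_multiples(heights_mm)
print(f"{share:.0%} of heights are multiples of 5 mm")
```

If whole-millimetre measurements were spread evenly, roughly one in five would be expected to land on a multiple of 5. A much higher share suggests that many values were rounded, and that small differences between them should be interpreted with care.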

Both the authority and collection files contain a file/tab on data quality, containing details of how many instances of a given datapoint appear in the outputs, and what proportion this constitutes of the total output. When this absolute or percentage figure is low, users should be wary of assuming that the feature in question is rare; it is also possible that it is just infrequently encoded.

For example, the relative scarcity of detail concerning script type in the TEI catalogue files (and therefore the tabular output) means that if there are only a few examples of one type in the output, this by no means indicates that these are the only such instances in the collections.
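The kind of count and proportion described above can be sketched in a few lines of Python. This is only an approximation of the idea: it assumes that the proportion is the number of rows with a non-empty value divided by the total number of rows, which may not match how the data quality file/tab is actually calculated. The column names and sample rows are hypothetical.

```python
import csv
import io

# Hypothetical sample; column names and values are illustrative assumptions.
SAMPLE_CSV = """shelfmark,script_type,decoration
MS. Example 1,Anglicana,Good border
MS. Example 2,,
MS. Example 3,,Historiated initial
MS. Example 4,,
"""


def coverage(rows, column):
    """Return (count, proportion) of rows with a non-empty value in `column`."""
    total = len(rows)
    filled = sum(1 for row in rows if (row.get(column) or "").strip())
    proportion = filled / total if total else 0.0
    return filled, proportion


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("script_type", "decoration"):
    count, proportion = coverage(rows, column)
    print(f"{column}: {count} of {len(rows)} rows ({proportion:.0%})")
```

A low figure here shows only how often a field has been filled in. It says nothing about how common the underlying feature is in the manuscripts themselves.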

Download

To download the files, click on the links below:

Collections Data (Manuscripts)    Authority Data (Organisations, People, Places, and Works)
XLSX (Excel)                      XLSX (Excel)
CSV                               CSV
JSON                              JSON

Click here to access the repository on GitHub, where you can download it in full.