Data Structure
The data outputs of this project are intended to be used by researchers of all levels, from those with no knowledge of the underlying TEI-XML data, through to experienced users who nevertheless want to access the data in a format that enables cross-comparison.
You do not need to be able to run the processor in order to access the outputs. They can be found in the `tabular_data/output` folder, where there are two kinds:
- Collection: these are the data which have been extracted from the collection files, and so predominantly contain information about manuscripts and manuscript parts.
- Authority: these are the data which have been extracted from the authority files which accompany the collection, and so predominantly contain information about people, places, and organisations associated with the collections.
Download
The data are presented in three formats:
- A single XLSX (Excel) file, containing tabs for each of the configured outputs. This is the recommended format, as the data in each tab are grouped into sections, with accompanying notes explaining the content of each column.
- A set of CSV files, each one corresponding to a tab in the XLSX output.
- A set of JSON files, each one corresponding to a tab in the XLSX output.
To download the files, click on the links below:
Collections Data (Manuscripts) | Authority Data (Organisations, People, Places, and Works) |
---|---|
XLSX (Excel) | XLSX (Excel) |
CSV | CSV |
JSON | JSON |
Click here to access the repository on GitHub, where you can download it in full.
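Once downloaded, the CSV and JSON files can be read with standard tooling. The following is a minimal Python sketch, not part of the project itself. It assumes that the first row of each CSV file is the column header row and that each JSON file holds a list of row objects; neither is confirmed here, so check a file before relying on it. The function name `load_table` is hypothetical.

```python
import csv
import json
from pathlib import Path


def load_table(path):
    """Load one output table (CSV or JSON) as a list of row dictionaries."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Assumption: the first row holds the column names.
        # utf-8-sig tolerates a byte-order mark if one is present.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            return list(csv.DictReader(handle))
    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumption: the file holds a list of row objects;
        # a single object is wrapped so callers always get a list.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported file type: {path.suffix}")
```

For example, `rows = load_table("00_overview.csv")` would return one dictionary per manuscript unit, where the file name is a guess based on the tab names listed below and should be replaced with the path of the file you downloaded.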
Collection
Details of the default contents of each tab of the Excel files (or individual CSV/JSON files) can be found below.
00_overview
Each row of the output file corresponds to a manuscript unit, be that a part (`msPart`) or a whole MS (`msDesc`). The output file contains a variety of basic information about the unit in question, including details of its origin, provenance, acquisition, contents, codicology, palaeography, and binding, and a number of measurements.
01_records
Each row of the output file corresponds to an individual `additional` element within the collection data, and contains enhanced metadata about the object in question, as well as details of the record history, the availability of the object and any digital surrogates, and printed and digital resources associated with the object.
02_binding
Each row of the output file corresponds to an individual `binding` element or a `dimensions` element with `@type="binding"`. This contains overview information about the object’s binding, as well as dating and measurement information.
03_measurements
Each row of the output file corresponds to an individual set of measurements for a manuscript unit, as encoded within either (or both) of the `layout` and `extent` elements. This contains information about the manuscript part’s extent, as well as measurements and other details associated with rolls, fragments, leaves, columns, intercolumn widths, the written area, and ruling.
04_hands
Each row of the output file corresponds to an individual `handNote` element for a manuscript unit. This contains overview information about the hand in question, as well as details of any people associated with the hand, any locus information, details of dating, and palaeographic details.
05_contents
Each row of the output file corresponds to an individual work or part of a work contained within a manuscript unit in the collection. This contains locus and length information, alongside details of the work’s title, subject, author, language, and paratext, as well as any bibliographic references associated with it in the collection file.
06_music
Each row of the output file corresponds to an individual description of musical notation encoded within a `musicNotation` element. This contains overview information about the notation in question, as well as locus and dimension information, and details of any people or bibliographic references associated with the notation.
07_decoration
Each row of the output file corresponds to an individual description of decoration encoded within a `decoNote` element. This contains overview information about the decoration in question, including its type, as well as locus and date information and details of any people, organisations, places, or bibliographic references associated with the decoration.
08_origins
Each row of the output file corresponds to an individual manuscript unit, and the information about its origins contained within the `origin` element (manuscript parts inherit data from their containing manuscript, should no other information be present). This contains overview information about the origin of the unit in question, as well as locus and date information, and details of any people, organisations, or bibliographic references associated with the manuscript unit’s origins.
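The inheritance rule described above can be illustrated with a short Python sketch. This is not the processor's actual code: the function name `resolve_origin`, the `"origin"` dictionary key, and the sample records are all invented for illustration, and the sketch assumes that "no other information" simply means the part has no origin data of its own.

```python
def resolve_origin(part, parent):
    """Return the origin data for a manuscript part.

    Both arguments are plain dictionaries with a hypothetical "origin" key;
    this mirrors the documented fallback, not the processor's real logic.
    """
    own = part.get("origin")
    if own:
        # The part has its own origin information, so use it as-is.
        return own
    # Otherwise fall back to the containing manuscript's origin data.
    return parent.get("origin")


# Invented sample records, for illustration only.
manuscript = {"id": "MS. Example 1", "origin": {"place": "England", "date": "s. xiv"}}
part_a = {"id": "MS. Example 1, part A", "origin": {"place": "France", "date": "s. xiii"}}
part_b = {"id": "MS. Example 1, part B"}  # no origin of its own

print(resolve_origin(part_a, manuscript))  # part A keeps its own origin
print(resolve_origin(part_b, manuscript))  # part B inherits from the manuscript
```

In practice this means a row for a manuscript part may repeat the origin details of its parent manuscript, which is worth bearing in mind when counting or grouping rows by origin.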
09_additions
Each row of the output file corresponds to an individual description of additions to a given manuscript unit, as given within the `additions` element. This contains overview information about the addition in question, as well as locus information, and details of any people or bibliographic references associated with the addition.
10_provenance_acquisition
Each row of the output file corresponds to an individual set of information given about the history of a manuscript unit, as contained within a `provenance` or `acquisition` element. This contains overview information about the historical detail in question, as well as date information, and details of any people, organisations, or places associated with the historical detail, with additional details given about former owners.
Authority
Details of the default contents of each tab of the Excel files (or individual CSV/JSON files) can be found below.
orgs
Each row of the output file corresponds to an organisation named within the authority file `places.xml`.
persons
Each row of the output file corresponds to a person named within the authority file `persons.xml`.
places
Each row of the output file corresponds to a place named within the authority file `places.xml`.
works
Each row of the output file corresponds to a work named within the authority file `works.xml`.