DPLA RDF application profile use cases
Overview
DPLA maintains an access portal to digitized cultural heritage objects held by libraries, archives, museums, and historical societies throughout the United States, and provides bulk and programmatic access to this data. The DPLA Metadata Application Profile version 3 (MAPv3) builds on the Europeana Data Model (EDM). As such, our use case is somewhat similar to the EDM and DDB-EDM use cases.
We harvest data using several different methods (file transfer, OAI-PMH, site-specific APIs, etc.) and process data in several different formats (MODS, MARCXML, qualified and unqualified DC, and site-specific serializations). DPLA augments and normalizes data received from partners (content hubs and service hubs) through an enrichment pipeline that is part of our ingestion process. While MAPv3 builds on EDM, we currently use JSON-LD as our sole storage and serialization format. Our primary concern at present is the need to check cardinality constraints and occurrences and, in limited cases, value types.
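To illustrate the kind of occurrence/cardinality check we have in mind, here is a minimal sketch over a parsed MAPv3 JSON-LD record; the property names and (min, max) bounds shown are illustrative assumptions, not the normative MAPv3 constraints.

```python
# Minimal sketch of an occurrence/cardinality check over a parsed MAPv3
# JSON-LD record. Property names and bounds are illustrative assumptions.

# (min, max) occurrence bounds per property; None means "unbounded".
CARDINALITY = {
    "dataProvider": (1, 1),
    "isShownAt": (1, 1),
    "sourceResource": (1, 1),
}

def count_values(record, prop):
    """Count how many values a property carries (absent -> 0, list -> len)."""
    value = record.get(prop)
    if value is None:
        return 0
    return len(value) if isinstance(value, list) else 1

def check_cardinality(record):
    """Return a list of human-readable cardinality violations for the record."""
    violations = []
    for prop, (minimum, maximum) in CARDINALITY.items():
        n = count_values(record, prop)
        if n < minimum:
            violations.append(f"{prop}: expected at least {minimum}, found {n}")
        if maximum is not None and n > maximum:
            violations.append(f"{prop}: expected at most {maximum}, found {n}")
    return violations
```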
DPLA Use Case 1: Validation following DPLA enrichment
- At this time, all partners provide us with data that requires mapping and enrichment.
- After fetching the records, we pass them through the mapping and enrichment pipeline.
- At the end of the pipeline, we should have a set of statements that complies with MAPv3.
- If that set of statements does not comply with MAPv3, we can prevent the record from being saved to the production store or otherwise indicate that additional remediation is necessary (e.g. by generating a view on our document store that checks for the value of a specific key); a minimal sketch of such a gate appears after this list.
- In addition, we can provide feedback to the partner regarding data quality, or identify edge-case errors in our mapping and enrichment pipeline.
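The sketch below shows how such a gate might sit at the end of the pipeline. The function and store interfaces (validate, production_store, remediation_queue) are hypothetical placeholders for illustration, not DPLA's actual ingestion code.

```python
# Hypothetical gate at the end of the enrichment pipeline: a record is only
# written to the production store when it passes MAPv3 validation; otherwise
# it is flagged for remediation and can be reported back to the hub.
def gate_record(record, validate, production_store, remediation_queue):
    errors = validate(record)          # e.g. check_cardinality() above
    if not errors:
        production_store.save(record)
    else:
        record.setdefault("admin", {})["validation_errors"] = errors
        remediation_queue.put(record)  # surfaced to DPLA staff or the partner
    return errors
```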
DPLA Use Case 2: Validation using MAPv3 data supplied by partners
- Some partners have expressed interest in developing their own enrichment workflows and have potentially offered to provide us with MAPv3 compliant data.
- A partner should be able to validate a set of statements for compliance with MAPv3 before submitting the data to DPLA for ingestion, using commonly deployed tools (i.e. not necessarily the same infrastructure that DPLA uses); see the sketch after this list.
- DPLA should be able to validate a set of statements for MAPv3 compliance.
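For example, a partner could run a check like the following with the widely available Python jsonschema package (this ties in with the JSON Schema option noted under Preliminary Investigations below). The schema fragment here is an illustrative assumption, not the actual MAPv3 schema.

```python
from jsonschema import Draft7Validator

# Illustrative fragment only; a real MAPv3 schema would be published by DPLA.
MAPV3_SCHEMA_FRAGMENT = {
    "type": "object",
    "required": ["dataProvider", "isShownAt", "sourceResource"],
    "properties": {
        "sourceResource": {
            "type": "object",
            "required": ["title", "rights"],
        },
    },
}

def validate_record(record):
    """Return a list of error messages; an empty list means the record passes."""
    validator = Draft7Validator(MAPV3_SCHEMA_FRAGMENT)
    return [error.message for error in validator.iter_errors(record)]
```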
DPLA Use Case 3: Establishing mandatory and recommended levels of validation
- Some properties may not be mandatory, but may be recommended to indicate a "value-added" level of compliance with MAPv3.
- For example, we may require that a subject statement have an rdfs:label or a skos:prefLabel, but we may also want to check for the presence of a URI identifier for that term (see the sketch below).
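One way to express this two-level check is sketched below. The property names follow the example above; the separation into errors (mandatory) and warnings (recommended) is our assumption about how the two levels of compliance would be reported.

```python
# Sketch of separating mandatory checks (errors) from recommended checks
# (warnings) for a subject term; the reporting structure is an assumption.
def check_subject(subject):
    errors, warnings = [], []
    if not (subject.get("rdfs:label") or subject.get("skos:prefLabel")):
        errors.append("subject is missing rdfs:label / skos:prefLabel")
    if not subject.get("@id"):
        warnings.append("subject has no URI identifier (recommended)")
    return errors, warnings
```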
DPLA Use Case 4: Validation of MAPv3 data by API consumers
- Developers should be able to validate sets of statements against MAPv3 when diagnosing issues with their applications, using commonly deployed tools (i.e. not necessarily the same infrastructure that DPLA uses).
Preliminary Investigations
One option we have been investigating recently is JSON Schema, which is less than ideal because it takes a document-centric approach and is tied to a specific serialization.
- Sample code in the context of the DPLA ingestion code.
- An endpoint allows a client to POST a MAPv3/JSON-LD instance document; depending on whether the document is valid or invalid, a True/False value is set on the admin.valid_after_enrich property (sketched below).
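A minimal sketch of such an endpoint follows. The framework choice (Flask), the route, and the schema fragment are assumptions for illustration; the actual DPLA implementation may differ.

```python
from flask import Flask, jsonify, request
from jsonschema import Draft7Validator

app = Flask(__name__)

# Illustrative schema fragment; see the JSON Schema sketch above.
SCHEMA = {"type": "object", "required": ["dataProvider", "isShownAt", "sourceResource"]}
VALIDATOR = Draft7Validator(SCHEMA)

@app.route("/validate", methods=["POST"])
def validate():
    # The client POSTs a MAPv3/JSON-LD instance document.
    record = request.get_json(force=True)
    errors = [e.message for e in VALIDATOR.iter_errors(record)]
    # Record the outcome on the admin block, as described above.
    record.setdefault("admin", {})["valid_after_enrich"] = not errors
    return jsonify({"valid": not errors, "errors": errors, "record": record})
```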