

Deliverable 1

Report on the current state: use cases and validation requirements

Creator: Evelyn Dröge

Creator: Thomas Bosch

Creator: Karen Coyle

Creator: Antoine Isaac

Creator: Valentine Charles

Creator: Robina Clayphan

Creator: Mark Matienzo

Creator: Stefanie Rühle

Creator: Adrian Pohl

Creator: Miika Alonen

Creator: Lars Svensson

Date Issued: 2015-07-27

Identifier: http://wiki.dublincore.org/index.php/RDF_Application_Profiles/UCR_Deliverable

Replaces: http://wiki.dublincore.org/index.php?title=RDF_Application_Profiles/UCR_Deliverable&oldid=9743

Is Replaced By: Not applicable

Latest Version: http://wiki.dublincore.org/index.php/RDF_Application_Profiles/UCR_Deliverable

Status of Document: Draft

Description of Document: This document is the first deliverable of the "DCMI RDF Application Profiles Task Force" (DCMI RDF-AP). It reports on the current state of case studies, use cases and validation requirements as of July 2015 and replaces the first version of the document from October 2014.

Note: Validation requirements are described in the Report on validation requirements.

Introduction

This document describes the RDF Application Profile case studies of the "DCMI RDF Application Profiles Task Force" (DCMI RDF-AP) as of July 2015. It replaces the use case document from October 2014. The DCMI RDF-AP aims to define best practices for documenting application profiles, for handling RDF application profiles, and for RDF constraint specification and validation. This deliverable reports on the case studies collected in the task force, their use cases and their validation requirements (see Case Studies and Use Cases). The case studies of the task force cover the following models or representations (in alphabetical order): CEN EuroLMAI and DC reuse in academia, DINI AG KIM - RDF-Representation of Bibliographic Data, DM2E model, DPLA Metadata Application Profile version 3 (MAPv3), Europeana Data Model (EDM), EDM specialisation in the Deutsche Digitale Bibliothek (DDB), OER World Map prototype, RFC 6906 Profiles and Tiny Archive.

The full descriptions of case studies and use cases can be found in the task force wiki. Validation requirements and the corresponding use cases are collected in the DCMI RDF-AP database (see DCMI RDF Application Profiles database on case studies, use cases, requirements, and solutions). Validation requirements are described in Deliverable 4.

Case studies and use cases

The following case study descriptions have been created in the RDF-AP group:

  1. Europeana Data Model (EDM)
  2. Digital Public Library of America (DPLA) MAPv3
  3. Deutsche Digitale Bibliothek (German Digital Library, DDB)
  4. DM2E model
  5. OER World Map prototype
  6. Using RFC 6906 Profiles
  7. DINI AG KIM - RDF-Representation of Bibliographic Data
  8. CSC - Reusing CEN EuroLMAI, DC in Academia
  9. Tiny Archive


A case study in the RDF-AP group refers to one model, application profile or any other specific way of reusing one or more vocabularies. A few case studies are divided into different validation use cases. These use cases are linked to general validation requirements from a requirement classification.

Europeana Data Model (EDM)

Europeana, the European digital library, provides access to more than 30 million library, archive, museum and audio-visual objects from 36 countries. The Europeana Data Model (EDM) was designed to bring diverse metadata about Cultural Heritage Objects (CHOs) into Europeana.

Full case study: Wiki and database

Similar case studies: DPLA, DDB, DM2E

Table 1: EDM use cases.
Use case | Description
Data quality | No HTML tags and no leading or trailing whitespace in the metadata.
Mandatory EDM classes | Every cultural heritage object is described by using the classes edm:ProvidedCHO and ore:Aggregation.
Valid URIs | Europeana requires that all classes and properties which take references use valid URIs.
URI uniqueness | edm:ProvidedCHO and ore:Aggregation instances need to have unique URIs across a dataset.
Dependency class-properties | Some EDM classes imply the use of certain properties: if the class is used, these properties should be used as well.
Property character chains match | Some EDM properties require specific values (these values can come from a vocabulary). For instance, check that the value of edm:type is one of TEXT, VIDEO, SOUND, IMAGE or 3D.
Mandatory properties | The EDM schema lists a number of properties that are mandatory.
Mandatory properties with conditions | EDM provides a list of mandatory properties per class. However, conditions on certain classes require the use of at least one of the mandatory properties.
Property refinement | In EDM, some properties have sub-properties which add more semantics to the information. If the general property is used more than once in the same record, the sub-properties should be used instead.
Dependency-reference-class | In EDM, the use of a reference in some properties triggers the creation of a new class.
Property redundancy | In EDM, some properties have sub-properties that should be used to add more semantics to the data. If the super-property and its sub-properties are both present, there is a risk of redundancy that could affect the quality of the data.
Property value occurrence | In EDM, some properties should contain only one value/statement for data quality reasons.
Property comparison | In some cases, literal values need to be compared to be validated. This is especially relevant for properties capturing time-based information.
Class relationship | Some EDM properties express relationships between entities. Most of these relationships are hierarchies, memberships or sequences and therefore need to be validated.
Value validation across dataset | Europeana manages its data per dataset. All records in a dataset come from the same provider, so the value indicating the name of this provider should be the same across the whole dataset.
URI target validation | Some EDM properties should be used in combination with a URI pointing to specific media. Constraints (e.g. file type, size) might be defined for these media.
Property syntax match | Some EDM properties require values in a specific syntax, e.g. check whether a date value matches ISO 8601 (year first, with hyphens separating year, month and day: YYYY-MM-DD).
Ambiguous prefLabel values | EDM reuses SKOS for describing contextual resources. Specific rules apply to SKOS properties, e.g. ideally every skos:prefLabel should have a language tag.
Presence of language tag | All EDM and SKOS properties that take text values should ideally have a language tag.
Overlap in disjoint label properties | skos:prefLabel, skos:altLabel and skos:hiddenLabel must not have the same label for the same resource.
Property-use-consistency | Two concepts must not be mapped to each other using skos:exactMatch together with skos:broadMatch or skos:relatedMatch.
Recommended properties | EDM recommends a set of properties for improving the quality of the data.
Validation of EDM extensions | Europeana needs to be able to validate data that uses EDM extensions against both the general EDM schema and the schemas for the EDM extensions.
Property domain | In EDM, some properties must be used only within a specific domain. This applies especially to properties used with the class ore:Aggregation.
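To make a few of these use cases concrete, here is a minimal Python sketch, not Europeana's own validator. It assumes a toy representation of a record as (subject, predicate, object) tuples with a hypothetical Literal class, and checks whitespace and HTML tags (Data quality), the allowed edm:type values (Property character chains match), YYYY-MM-DD syntax (Property syntax match) and missing language tags (Presence of language tag). Treating dc:date as the date property and dc:title/dc:description as the properties that expect language tags are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Literal:
    """Hypothetical minimal RDF literal: lexical value plus optional language tag."""
    value: str
    lang: Optional[str] = None


# Allowed edm:type values, as listed in "Property character chains match".
EDM_TYPES = {"TEXT", "VIDEO", "SOUND", "IMAGE", "3D"}
# Syntax-only check for YYYY-MM-DD; month and day ranges are not verified.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
HTML_TAG = re.compile(r"<[^>]+>")
# Assumption: the properties that should carry a language tag.
TEXT_PROPERTIES = {"dc:title", "dc:description"}


def check_literals(triples) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, message) for every literal that fails a check."""
    problems = []
    for s, p, o in triples:
        if not isinstance(o, Literal):
            continue  # URI values are out of scope for this sketch
        if o.value != o.value.strip():
            problems.append((s, p, "leading or trailing whitespace"))
        if HTML_TAG.search(o.value):
            problems.append((s, p, "HTML tag in value"))
        if p == "edm:type" and o.value not in EDM_TYPES:
            problems.append((s, p, f"edm:type value {o.value!r} not allowed"))
        if p == "dc:date" and not ISO_DATE.fullmatch(o.value):
            problems.append((s, p, "date is not YYYY-MM-DD"))
        if p in TEXT_PROPERTIES and o.lang is None:
            problems.append((s, p, "missing language tag"))
    return problems


record = [
    ("ex:cho1", "edm:type", Literal("PAINTING")),
    ("ex:cho1", "dc:title", Literal(" Faust ")),
    ("ex:cho1", "dc:date", Literal("1808")),
]
for problem in check_literals(record):
    print(problem)
```

In practice such constraints would more likely be written in a dedicated constraint language and run against an RDF store; the sketch only shows the logic behind each check.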

Digital Public Library of America (DPLA) MAPv3

The Digital Public Library of America (DPLA) provides access to digitized cultural heritage objects held by libraries, archives, museums, and historical societies throughout the United States. The DPLA Metadata Application Profile version 3 (MAPv3) builds on the Europeana Data Model.

Full case study: Wiki and database.

Similar case studies: EDM, DDB, DM2E

Table 2: DPLA MAPv3 use cases.
Use case | Description
Validation following DPLA enrichment | If a set of statements that has passed the mapping and enrichment pipeline does not comply with MAPv3, it can be prevented from being saved to the production store.
Validation using MAPv3 data supplied by partners | A partner should be able to validate a set of statements for compliance with MAPv3 before submitting the data to DPLA for ingestion, using commonly deployed tools (i.e. not necessarily the same infrastructure as DPLA). DPLA should also be able to validate a set of statements for MAPv3 compliance.
Establishing mandatory and recommended levels of validation | Some properties may not be mandatory but may be recommended, to indicate a "value-added" level of compliance with MAPv3.
Validation of MAPv3 data by API consumers | Developers should be able to validate sets of statements against MAPv3 when diagnosing issues with their applications, using commonly deployed tools (i.e. not necessarily the same infrastructure as DPLA).
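The "mandatory and recommended levels" use case can be illustrated with a minimal Python sketch that sorts a record into three levels. The property names and their levels are hypothetical placeholders, not MAPv3's actual obligations, and a record is simplified to a dictionary from property to values.

```python
from typing import Dict, List, Tuple

# Hypothetical obligation levels for illustration; not MAPv3's actual property list.
PROFILE: Dict[str, str] = {
    "dcterms:title": "mandatory",
    "dcterms:rights": "mandatory",
    "dcterms:creator": "recommended",
    "dcterms:date": "recommended",
}


def assess(record: Dict[str, List[str]]) -> Tuple[str, List[str]]:
    """Return a compliance level and the sorted list of missing properties."""
    missing = sorted(p for p in PROFILE if not record.get(p))
    if any(PROFILE[p] == "mandatory" for p in missing):
        return "invalid", missing      # e.g. keep it out of the production store
    if missing:
        return "compliant", missing    # mandatory level met, recommended level not
    return "value-added", missing      # both levels met


print(assess({"dcterms:title": ["Map of Boston"], "dcterms:rights": ["CC0"]}))
```

Returning the missing properties alongside the level is a design choice aimed at the API-consumer use case, where the point of validating is to diagnose what is wrong.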

Deutsche Digitale Bibliothek (German Digital Library, DDB)

The German Digital Library is a portal providing access to digital cultural and scientific objects from Germany. It is also a national contributor to Europeana. The DDB therefore uses EDM as the model for interlinking objects in the DDB and for contributing metadata to Europeana. However, using EDM in the DDB requires extending the set of properties used in EDM. Therefore, like DM2E, the DDB developed a specialization of EDM for the DDB domain.

Full case study: Wiki and database.

Similar case studies: EDM, DM2E, DPLA

Table 3: DDB use cases.
Use case | Description
Uniqueness of URI | Resources used in the DDB must be represented by unique URIs. Similar use case: UC-Europeana-3.
Mandatory EDM classes | Every cultural heritage object is described by using the classes edm:ProvidedCHO and ore:Aggregation.
Wrong property domains | Check whether a (datatype or object) property is used in the right domain.
Mandatory DDB terms | In addition to the mandatory EDM terms, more properties are mandatory in the DDB, e.g. dcterms:rights and edm:provider to describe an ore:Aggregation.
Domain dependent mandatory properties | Some properties are mandatory depending on the class of their domain, e.g. dc:type is mandatory when describing an edm:WebResource.
Domain dependent non-repeatable properties | Depending on their domain, some properties are not repeatable, e.g. dc:type must not be repeated if it is used to describe an edm:WebResource.
Property character chains match | Some EDM properties require specific values (these values can come from a vocabulary). For instance, check that the value of edm:type is one of TEXT, VIDEO, SOUND, IMAGE or 3D.
Dependency-reference-class | In EDM, the use of a reference in some properties triggers the creation of a new class.
Object property range | Check whether the object property range is correct, e.g. whether dm2e:painter has the range foaf:Person.
Domain dependent value of the property | Depending on their domain, some properties have to be used with URIs from controlled vocabularies, e.g. if dc:type is used to describe an edm:WebResource, its value has to be a URI from the DDB media type vocabulary.
Value dependency of property obligation | The obligation of a property depends on the value of another property, e.g. if dc:type is used with the value "Chapter", dcterms:isPartOf is mandatory.
Value dependency of value use | The value allowed for a property depends on the value of another property, e.g. if ddb:hierarchyType is used with the value "htype_006", the value of ddb:aggregationEntity has to be "false".
Mandatory properties | The EDM schema lists a number of properties that are mandatory.
Rules from reused vocabularies | Application profiles like the DM2E model or the DDB's EDM profile must fulfil the requirements of the underlying vocabularies, e.g. properties that are mandatory in EDM must also be mandatory in DM2E and the DDB.
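Several DDB use cases are conditional: whether a property is required, repeatable or restricted depends on the class of the resource or on another property's value. The following minimal Python sketch encodes three of the examples from the table. It assumes a resource is simplified to a dictionary from property to a list of values, with classes under "rdf:type"; values are plain strings rather than URIs, and the check of dc:type against the DDB media type vocabulary is omitted.

```python
from typing import Dict, List

# Assumed layout: property -> list of values; "rdf:type" holds the classes.
Resource = Dict[str, List[str]]


def check_ddb_rules(res: Resource) -> List[str]:
    """Return error messages for three conditional rules taken from the table."""
    errors = []
    dc_type = res.get("dc:type", [])
    # Domain dependent mandatory and non-repeatable property.
    if "edm:WebResource" in res.get("rdf:type", []) and len(dc_type) != 1:
        errors.append("edm:WebResource needs exactly one dc:type")
    # Value dependency of property obligation.
    if "Chapter" in dc_type and not res.get("dcterms:isPartOf"):
        errors.append('dc:type "Chapter" requires dcterms:isPartOf')
    # Value dependency of value use.
    if ("htype_006" in res.get("ddb:hierarchyType", [])
            and res.get("ddb:aggregationEntity") != ["false"]):
        errors.append('ddb:hierarchyType "htype_006" requires ddb:aggregationEntity "false"')
    return errors


print(check_ddb_rules({"rdf:type": ["edm:ProvidedCHO"], "dc:type": ["Chapter"]}))
```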

DM2E model

The DM2E model is a specialisation of the EDM for manuscripts. It has been used for mappings to Europeana and for publishing delivered metadata as LOD via the DM2E LOD API in the scope of the Digitised manuscripts to Europeana (DM2E) project.

Full case study: Wiki and database.

Similar case studies: EDM, DDB, DPLA

Table 4: DM2E model use cases.
Use case | Description
Mandatory EDM classes | Every cultural heritage object is described by using the classes edm:ProvidedCHO and ore:Aggregation.
Mandatory DM2E properties | In addition to the mandatory EDM elements, the DM2E model requires some more properties, e.g. dc:type for edm:ProvidedCHO, with subclasses of edm:PhysicalThing or skos:Concept as its range.
Recommended DM2E properties | Examples of recommended properties in the DM2E model are edm:object and dm2e:hasAnnotatableVersionAt in ore:Aggregation.
Non-repeatable DM2E properties | The DM2E model has many non-repeatable properties, e.g. edm:currentLocation and dm2e:holdingInstitution for the provided object (edm:ProvidedCHO).
Wrong MIME types | If an edm:object is provided, an allowed MIME type must be added with dc:format to the instance of edm:WebResource. Allowed MIME types are, for example, image/png and image/jpeg.
Wrong datatypes | Check whether a datatype property is used with the right datatype, e.g. the recommended datatype of bibo:numPages in DM2E is xsd:unsignedInt, and the value of dm2e:displayLevel must be an xsd:boolean.
Wrong property domains | Check whether a (datatype or object) property is used in the right domain.
Object property range | Check whether the object property range is correct, e.g. whether dm2e:painter has the range foaf:Person.
Illegal URI characters | Using RFC 3986 compliant URIs is obligatory; invalid URIs produce FATAL errors at validation time.
Recommended language tags | The DM2E model recommends using language tags that follow the language representation standards RFC 3066 and RFC 5646 (use an ISO 639-1 two-letter code; if none exists, use an ISO 639-2 three-letter code). Language tags should be used whenever language information is available, but they are not mandatory. Similar use case: UC-Europeana-18.
Unsupported properties in DM2E | Properties or classes that are not part of the model are not accepted during ingestion because they cannot be displayed in the Europeana portal or the DM2E search engines.
Dependency class-properties | Some EDM classes imply the use of certain properties: if the class is used, these properties should be used as well.
Property character chains match | Some EDM properties require specific values (these values can come from a vocabulary). For instance, check that the value of edm:type is one of TEXT, VIDEO, SOUND, IMAGE or 3D.
Uniqueness of URI | Resources used in the DDB must be represented by unique URIs. Similar use case: UC-Europeana-3.
Domain dependent mandatory properties | Some properties are mandatory depending on the class of their domain, e.g. dc:type is mandatory when describing an edm:WebResource.
Domain dependent value of the property | Depending on their domain, some properties have to be used with URIs from controlled vocabularies, e.g. if dc:type is used to describe an edm:WebResource, its value has to be a URI from the DDB media type vocabulary.
URI target validation | Some EDM properties should be used in combination with a URI pointing to specific media. Constraints (e.g. file type, size) might be defined for these media.
Mandatory properties | The EDM schema lists a number of properties that are mandatory.
Rules from reused vocabularies | Application profiles like the DM2E model or the DDB's EDM profile must fulfil the requirements of the underlying vocabularies, e.g. properties that are mandatory in EDM must also be mandatory in DM2E and the DDB.
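Three of the DM2E checks (non-repeatable properties, allowed MIME types, language tag form) are sketched below in Python under simplifying assumptions. Triples are plain string tuples, the class context (edm:ProvidedCHO, edm:WebResource) is ignored for brevity, the MIME set contains only the two examples from the table, and the language tag pattern is a simplification of RFC 5646, not its full grammar.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Only the examples from the table; the real list of allowed MIME types is longer.
ALLOWED_MIME = {"image/png", "image/jpeg"}
NON_REPEATABLE = {"edm:currentLocation", "dm2e:holdingInstitution"}
# Simplified: a 2-letter (ISO 639-1) or 3-letter (ISO 639-2) primary subtag,
# optionally followed by further subtags. Not the full RFC 5646 grammar.
LANG_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*")


def repeated(triples: Iterable[Triple]) -> List[Tuple[str, str]]:
    """(subject, property) pairs where a non-repeatable property occurs more than once."""
    counts = Counter((s, p) for s, p, _ in triples if p in NON_REPEATABLE)
    return sorted(key for key, n in counts.items() if n > 1)


def wrong_mime_types(triples: Iterable[Triple]) -> List[Tuple[str, str]]:
    """(subject, value) pairs whose dc:format is not an allowed MIME type."""
    return sorted((s, o) for s, p, o in triples
                  if p == "dc:format" and o not in ALLOWED_MIME)


def valid_lang_tag(tag: str) -> bool:
    """True if the tag has the simplified form described above."""
    return LANG_TAG.fullmatch(tag) is not None


data = [
    ("ex:cho", "edm:currentLocation", "ex:berlin"),
    ("ex:cho", "edm:currentLocation", "ex:munich"),
    ("ex:wr", "dc:format", "image/tiff"),
]
print(repeated(data), wrong_mime_types(data), valid_lang_tag("de"))
```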

OER World Map prototype

The hbz has built a prototype for an Open Educational Resources (OER) world map. The data of the OER map prototype covers descriptions of organizations, persons, services and projects.

For the OER world map, a read/write API was developed as well as an application (Drupal) for viewing, adding, deleting and editing the data (see the image of the architecture). Both the API and the application need specifications, which should be expressed in an RDF application profile. While the API mainly needs information for validating incoming data, the application also needs information about presenting the data (result lists, single results, forms for editing the different resources).

Full case study: Wiki and database.

Table 5: OER World Map prototype use case.
Use case | Description
General AP requirements | High-level requirements for RDF application profiles.
Requirements for validation | Identify an element/class and its relation to the AP; define object restrictions for properties; cardinality and (X)OR rules.
Requirements for building a web application | Requirements for presenting the data as HTML/RDFa and for generating web forms for creating, editing and linking the data.
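Cardinality and (X)OR rules, as named in the validation use case, can be declared once and then checked. The following minimal Python sketch uses a hypothetical profile fragment: the property names, their bounds and the exclusive-or pair are illustrative assumptions, not part of the OER World Map profile.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical profile fragment: property -> (min, max); None means unbounded.
CARDINALITY: Dict[str, Tuple[int, Optional[int]]] = {
    "schema:name": (1, 1),
    "schema:email": (0, None),
}
# Hypothetical exclusive-or group: exactly one of these properties must be present.
XOR_GROUPS: List[Tuple[str, ...]] = [("schema:address", "schema:geo")]


def validate(resource: Dict[str, List[str]]) -> List[str]:
    """Check cardinalities and XOR groups for one resource (property -> values)."""
    errors = []
    for prop, (lo, hi) in CARDINALITY.items():
        n = len(resource.get(prop, []))
        if n < lo or (hi is not None and n > hi):
            upper = hi if hi is not None else "*"
            errors.append(f"{prop}: {n} value(s), expected {lo}..{upper}")
    for group in XOR_GROUPS:
        present = [p for p in group if resource.get(p)]
        if len(present) != 1:
            errors.append(f"exactly one of {', '.join(group)} required, found {len(present)}")
    return errors


print(validate({"schema:name": ["hbz"], "schema:address": ["Cologne"]}))
```

Because the same declarations state which fields are required, they could in principle also drive the form generation mentioned in the third use case; the sketch does not attempt that.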

Using RFC 6906 Profiles

The case study is not about representing profiles, but rather about using them. Precisely, this case study does not depend on the representation of the profile, but on the profile having a URI for identification. In order to keep up with best practices, that URI SHOULD dereference to a representation, possibly using content negotiation to deliver human- and machine-readable interpretations.
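RFC 6906 registers "profile" as a link relation type, so one way for a client to learn which profile a representation claims to follow is an HTTP Link header. The following minimal Python sketch extracts profile URIs from such a header. The example URIs are hypothetical, and the simplified parser does not handle quoted parameter values that contain commas or semicolons.

```python
import re
from typing import List

# One link-value: <URI> followed by ";"-separated parameters up to the next comma.
LINK_VALUE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def profile_uris(link_header: str) -> List[str]:
    """Return the target URIs of all links whose rel includes "profile"."""
    uris = []
    for uri, params in LINK_VALUE.findall(link_header):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "rel":
                # rel may hold several space-separated relation types.
                if "profile" in value.strip().strip('"').split():
                    uris.append(uri)
    return uris


header = ('<http://example.org/ap/edm>; rel="profile", '
          '<http://example.org/data.ttl>; rel="alternate"; type="text/turtle"')
print(profile_uris(header))
```

Since the case study depends only on the profile having a URI, the client needs nothing more than this identifier; dereferencing it is a separate, optional step.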

Full case study: Wiki and database.

Table 6: RFC 6906 Profiles use case.
Use case | Description
Dependency-reference-class | In EDM, the use of a reference in some properties triggers the creation of a new class.

DINI AG KIM - RDF-Representation of Bibliographic Data

A growing number of members of library networks in German-speaking countries publish their data as linked open data, but they use different properties and classes and have different ideas of what linked data really means. To harmonize this work, colleagues from library service centers, national libraries, etc. in Germany, Austria and Switzerland are working on a best practice guide.

If somebody wants to publish his data in compliance with the recommendation of the DINI AG KIM, he starts by mapping his data to the properties and classes listed in the best practice guide.

  • First, he needs human-readable documentation of the properties and classes he should use, which also describes the constraints concerning obligation, occurrence and possible values of the properties.
  • For the technical mapping he could use a "template" describing what is needed in a machine-readable way.
  • During the mapping process he wants to automatically check the output for its compliance with the recommended properties, classes and constraints and needs a "schema" to do so.
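The three needs listed above (documentation, a machine-readable template and a schema for checking) can in principle be served from a single source. The following minimal Python sketch shows that idea: one hypothetical template generates both human-readable documentation and the compliance check. The entries and the "lang:" values are illustrative assumptions; the DINI AG KIM guide defines its own properties and constraints.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class PropertyRule:
    """One template entry: a property and its constraints."""
    prop: str
    label: str
    mandatory: bool
    repeatable: bool
    allowed: Optional[FrozenSet[str]] = None  # None means any value is allowed


# Hypothetical entries for illustration only.
TEMPLATE = [
    PropertyRule("dcterms:title", "Title of the resource", True, False),
    PropertyRule("dcterms:language", "Language of the resource", False, True,
                 frozenset({"lang:de", "lang:en"})),
]


def document(template: List[PropertyRule]) -> List[str]:
    """Human-readable documentation generated from the template."""
    lines = []
    for r in template:
        obligation = "mandatory" if r.mandatory else "optional"
        occurrence = "repeatable" if r.repeatable else "non-repeatable"
        lines.append(f"{r.prop}: {r.label} ({obligation}, {occurrence})")
    return lines


def check(record: Dict[str, List[str]], template: List[PropertyRule]) -> List[str]:
    """Check a record (property -> values) for obligation, occurrence and values."""
    errors = []
    for r in template:
        values = record.get(r.prop, [])
        if r.mandatory and not values:
            errors.append(f"{r.prop} is missing")
        if not r.repeatable and len(values) > 1:
            errors.append(f"{r.prop} is repeated")
        if r.allowed is not None:
            errors.extend(f"{r.prop}: {v} not allowed" for v in values if v not in r.allowed)
    return errors


print("\n".join(document(TEMPLATE)))
print(check({"dcterms:title": ["A", "B"], "dcterms:language": ["lang:fr"]}, TEMPLATE))
```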

Full case study: Wiki and database.

Table 7: DINI AG KIM use case.
Use case | Description
Use case for requirements concerning the AP on a meta-level | This use case is a container for requirements that do not concern the validation of metadata supported by an application profile, but other use cases such as the documentation of an application profile, the presentation of the metadata, or whether terms (properties and classes) in the application profile are used in compliance with the vocabularies they are taken from.

CSC - Reusing CEN EuroLMAI, DC in Academia

CSC is developing and maintaining a national data model for academia in the fields of higher education and science. A machine-readable application profile would support the whole development lifecycle:

  1. Development of application profiles using “semantic templates” which document the vocabulary for both humans and machines. Mix and match classes and properties from appropriate vocabularies. Create extensions to existing application profiles, for example CEN MLO.
  2. Validate the proper use of “semantic templates” (Validation of Application Profile)
  3. Provide provenance information about the lifecycle of the development process
  4. Validate the instances against the developed application profile (Validation of Data)

Full case study: Wiki and database.

Table 8: CSC use cases.
Use case | Description
User creates new application profile | A user creates a new application profile, including its name, description and administrative information.
Define used vocabularies | To define the classes and properties reused in the application profile, the user must first define the vocabularies used.
Define class usage | A new class usage is defined and added to the application profile.
Define property usage | A user can add property usages directly to the application profile and relate one property usage to many class usages.
Visualize class usage with class diagrams | Class usage is visualized automatically as class diagrams generated from the machine-readable RDF AP format.
Export and import application profiles | Application profiles can be exported to and imported from different RDF formats.
Reusing existing application profiles | Machine-readable application profiles can also be reused and extended in the profile editor by importing the contents of one application profile into another.
Localize application profile | Application profiles can also be localized to different languages by translating literals such as titles, descriptions and comments.
Language constraints | Datatype property usages that have plain literals as their defined range may be constrained to certain languages.
SKOS Scheme usage | SKOS concepts and schemes can be used as fixed classifications for RDF data. It should be possible to state that an object property can only point to objects defined in a certain SKOS scheme.
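The SKOS scheme usage requirement amounts to checking that every object of a given property is a concept declared in a given scheme. The following minimal Python sketch assumes triples as string tuples, a hypothetical property ex:field and hypothetical schemes, and considers only explicit skos:inScheme statements (no inference, e.g. from skos:topConceptOf).

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def concepts_in_scheme(graph: Set[Triple], scheme: str) -> Set[str]:
    """Concepts explicitly declared with skos:inScheme <scheme> (no inference)."""
    return {s for s, p, o in graph if p == "skos:inScheme" and o == scheme}


def scheme_violations(graph: Set[Triple], prop: str, scheme: str) -> List[Triple]:
    """Statements using `prop` whose object is not a concept of `scheme`."""
    allowed = concepts_in_scheme(graph, scheme)
    return sorted(t for t in graph if t[1] == prop and t[2] not in allowed)


graph = {
    ("ex:physics", "skos:inScheme", "ex:fieldsOfScience"),
    ("ex:poetry", "skos:inScheme", "ex:literaryGenres"),
    ("ex:project1", "ex:field", "ex:physics"),
    ("ex:project2", "ex:field", "ex:poetry"),
}
print(scheme_violations(graph, "ex:field", "ex:fieldsOfScience"))
```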

Tiny Archive

Tiny Archive has fewer than 10,000 items in total. About two thirds of these have been described in a spreadsheet, and each item is represented on the web as a web page. The descriptions contain commentary about the items in different languages. The archive wants to begin linking to other archives in its topic area, as well as to a database of terms that is now available in RDF.

Full case study: Wiki and database.

Table 9: Tiny archive use case.
Use case | Description
Human readable documentation of properties | Provide a place to include human-readable documentation of properties and what they mean, e.g. "title - the title of the resource being described".

Database on case studies, use cases, requirements, and solutions regarding RDF constraint formulation and RDF data validation

We initiated a collaboratively curated database on case studies, use cases, requirements, and solutions regarding RDF constraint formulation and RDF data validation. This database contains the findings of the DCMI RDF Application Profiles Task Group based on various case studies provided by data institutions, as well as the findings resulting from the collaboration with the W3C RDF Data Shapes Working Group. The database, which is publicly available online at http://purl.org/net/rdf-application-profiles, is open for further contributions and connects requirements to use cases, case studies, and solutions.

The idea of the extensible database is

  1. to collect and describe case studies from experts (from theory and practice dealing with RDF validation problems) and the general public,
  2. to extract common use cases from these case studies that illustrate particular problems,
  3. to specify requirements to be fulfilled in order to adequately solve these problems and meet the use cases,
  4. to investigate existing best-practices regarding these requirements, and
  5. to evaluate to which extent existing approaches and tools fulfil specific requirements.

Initial sources for this database have been (1) the 2013 W3C RDF Validation Workshop, (2) mailing list discussions from the W3C RDF Data Shapes Working Group, (3) the 2013 Semantic Web in Libraries conference, (4) discussions in the RDF Application Profiles Task Group, and (5) diverse research papers.

In the development of standards, as in software, case studies and/or use cases are usually taken as starting point. In case studies, the full background of a specific scenario is described, where the standard or the software is to be applied. Use cases are smaller units where a certain action or a typical user enquiry is described. They can be extracted from and thus linked to case studies, but often they are defined directly.

Requirements are extracted from use cases; they form the basis for development and are used to test the result. We specifically use the requirements to evaluate existing approaches for constraint formulation and validation. Via the requirements, the approaches get linked to use cases and case studies and it becomes visible which approaches can be used in a given scenario and what drawbacks might be faced.
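The linking of approaches to use cases via requirements amounts to simple set arithmetic. In the following minimal Python sketch, which is an illustration and not how the database itself computes anything, each approach is mapped to the set of requirements it fulfils, and for a given use case the missing requirements per approach are a set difference. The approach names and requirement identifiers are hypothetical.

```python
from typing import Dict, List, Set


def gaps(approaches: Dict[str, Set[str]], needed: Set[str]) -> Dict[str, Set[str]]:
    """For one use case, map each approach to the requirements it does not fulfil."""
    return {name: needed - fulfilled for name, fulfilled in approaches.items()}


def usable(approaches: Dict[str, Set[str]], needed: Set[str]) -> List[str]:
    """Approaches that fulfil every requirement of the use case."""
    return sorted(name for name, missing in gaps(approaches, needed).items() if not missing)


approaches = {"Approach A": {"R-1", "R-2", "R-3"}, "Approach B": {"R-1"}}
use_case = {"R-1", "R-2"}
print(gaps(approaches, use_case))    # shows that Approach B lacks R-2
print(usable(approaches, use_case))
```

The non-empty gap sets correspond to the "drawbacks" mentioned above: they name exactly which requirements an approach would leave unmet in that scenario.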

We classify the requirements to provide a high-level view on different approaches and to facilitate a better understanding of the problem domain. Our database is openly available and can be extended with new case studies, use cases, requirements and approaches.

DCMI case studies, use cases, and requirements

When we started capturing RDF validation requirements from the W3C RDF Validation workshop in 2013, the intention and vision was to have one system which could be used to manage all requirements as well as associated use cases, case studies, and tools which are relevant for different domains.

To support different views on requirements, use cases, and case studies for multiple domains such as DCMI, the database introduces dedicated views:

Case studies may be tagged as DCMI case studies. The database contains a dedicated view (http://lelystad.informatik.uni-mannheim.de/rdf-validation/?q=case-studies/dc-case-studies) for these DCMI case studies which lists only the case studies which are classified as being DCMI case studies. Furthermore, case studies may be related to other case studies.

Case studies may be related to use cases and use cases may be connected with case studies. In case a DCMI case study is connected with a use case, this use case is automatically classified as DCMI use case as well. As a consequence, this use case is also shown in a dedicated view for all DCMI use cases (http://lelystad.informatik.uni-mannheim.de/rdf-validation/?q=use-cases/dc-use-cases).

Requirements may have relationships to use cases and use cases to requirements. If DCMI use cases are related to a requirement, this requirement is automatically considered a DCMI requirement and listed in the DCMI requirements view (http://lelystad.informatik.uni-mannheim.de/rdf-validation/?q=requirements/dc-requirements). Requirements are uniquely identified in the database by an R and a number.

As a consequence each domain can browse the requirements, use cases, and case studies the domain is interested in, but also requirements, use cases, and case studies from other domains.
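The propagation rules for the DCMI views can be expressed as two set comprehensions. This minimal Python sketch is not the Drupal implementation used by the database; the identifiers are hypothetical, and relationships are modelled as sets of (source, target) pairs.

```python
from typing import Set, Tuple


def dcmi_views(dcmi_case_studies: Set[str],
               case_study_use_case: Set[Tuple[str, str]],
               use_case_requirement: Set[Tuple[str, str]]) -> Tuple[Set[str], Set[str]]:
    """Derive the DCMI use-case and requirement views from tagged case studies."""
    # A use case becomes DCMI as soon as one DCMI case study is connected to it.
    use_cases = {uc for cs, uc in case_study_use_case if cs in dcmi_case_studies}
    # A requirement becomes DCMI as soon as one DCMI use case is related to it.
    requirements = {r for uc, r in use_case_requirement if uc in use_cases}
    return use_cases, requirements


ucs, reqs = dcmi_views(
    {"CS-EDM"},
    {("CS-EDM", "UC-1"), ("CS-Other", "UC-2")},
    {("UC-1", "R-5"), ("UC-2", "R-9")},
)
print(sorted(ucs), sorted(reqs))
```

Because the DCMI classification of use cases and requirements is derived rather than stored, tagging a single new case study is enough to update all three views.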

Research papers

Bosch, T. & Eckert, K. (2014). Requirements on RDF Constraint Formulation and Validation. In Proceedings of the DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/257.

In this paper, we present the database of requirements obtained from various sources, including the use cases presented at the workshop as well as in the RDF-AP WG. The database, which is openly available and extensible, is used to evaluate and compare several existing approaches for constraint formulation and validation. We present a classification and analysis of the requirements, show that none of the approaches satisfies all requirements, and aim at laying the ground for future work, as well as fostering discussions on how to close existing gaps.

Bosch, T., Nolle, A., Acar, E., & Eckert, K. (2015). RDF Validation Requirements - Evaluation and Logical Underpinning. Computing Research Repository (CoRR), abs/1501.03933. http://arxiv.org/abs/1501.03933.

Based on our work in the DCMI and in cooperation with the W3C working group, we have so far published 81 requirements for validating RDF data and formulating RDF constraints that are needed by various stakeholders for data applications. Each of these requirements corresponds to a constraint type from which concrete constraints are instantiated to be checked on RDF data. Requirements are uniquely identified in the database by an R and a number. In this technical report, we explain each requirement/constraint type in detail and give examples for each, expressed in different constraint languages.

Sustainability of the database and technical details

The database is currently hosted at the University of Mannheim. It is implemented as a Drupal instance with a MySQL database in the background. The full source code of the underlying Drupal instance and the database with the current state of all case studies, use cases, and requirements is available at https://github.com/kaiec/reqbase.

The MySQL database is backed up automatically once a day, and the last 10 backups are kept on the server. There is also a separate monthly backup. With the current setup and daily backups, we should not lose the database, and in case of a server failure we should be able to set it up again.

For long-term preservation after work on the database has ended, we suggest using an export format or a static HTML dump. Such static data can be hosted by DCMI. The database provides multiple export functions, e.g. to XLS, CSV, XML, TXT, and DOC, which can be used for archival purposes.

The current location of the database at lelystad.informatik.uni-mannheim.de is not very sustainable; only the purl.org URI http://purl.org/net/rdf-application-profiles should be used to reference the database.

The URIs of the database entries are not ideal. We suggest not using them as identifiers, but always referring to the database via the purl.org URI and to the entries via the numbering scheme (e.g. UC-14 or UC-14-RECOMMENDED-LANGUAGE-TAGS).
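Since entries should be referenced by their numbering scheme rather than by their database URIs, a small parser for such identifiers can help, for instance to extract references from a text. The pattern in this Python sketch is inferred from the two examples given here (UC-14 and UC-14-RECOMMENDED-LANGUAGE-TAGS) plus the "R and a number" form for requirements; the optional hyphen after the prefix is an assumption.

```python
import re
from typing import Optional, Tuple

# Inferred pattern: prefix (UC or R), optional hyphen, number, optional uppercase slug.
ENTRY_ID = re.compile(r"(UC|R)-?(\d+)(?:-([A-Z0-9]+(?:-[A-Z0-9]+)*))?")


def parse_entry_id(text: str) -> Optional[Tuple[str, int, Optional[str]]]:
    """Return (kind, number, slug) for a database entry identifier, else None."""
    m = ENTRY_ID.fullmatch(text.strip())
    if m is None:
        return None
    kind, number, slug = m.groups()
    return kind, int(number), slug


print(parse_entry_id("UC-14-RECOMMENDED-LANGUAGE-TAGS"))
```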

The maintenance of the database as work environment is primarily done by Thomas Bosch with support by Kai Eckert.

The current database dumps are also stored and can be made available publicly. This only requires an additional step to erase all user information, particularly the passwords, which must not appear in a public dump. With such a dump, you can set up your own Drupal system and continue working with the database elsewhere, or even locally on your laptop.


Conclusion

This deliverable has given insight into case studies regarding RDF constraint formulation, RDF data validation, and the formulation of RDF APs, the associated use cases, and the DCMI RDF-AP database of the DCMI RDF-AP Task Group as of July 2015. The database, which is publicly available online at http://purl.org/net/rdf-application-profiles, is open for further contributions and connects requirements to use cases, case studies, and solutions.

Various stakeholders for data applications have contributed case studies, whose use cases are connected to requirements in the DCMI RDF-AP database. Case studies that build on the same data model (e.g., EDM) in particular share some use cases and requirements. Still, every case study provides some unique use cases. All requirements that occur in the task group's case studies (DC requirements) are collected in the Requirements Deliverable.