
Fairscape Release RO-Crate Profile v0.1

Profile URI: https://w3id.org/fairscape/profile/0.1
Status: Stable

This document specifies the Fairscape Release RO-Crate Profile: the structural and semantic constraints that an RO-Crate must satisfy to be considered a Fairscape release. The profile is identified by the URI https://w3id.org/fairscape/profile/0.1 and is published as a W3C PROF-conformant Profile Crate.

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119 and RFC 8174.


1. Overview

A Fairscape Release Crate is an RO-Crate 1.2 packaging a versioned, AI-ready research dataset together with provenance, schema, and machine-learning-readiness metadata. The profile constrains and extends:

  • RO-Crate 1.2: base packaging and metadata layout.
  • EVI Ontology: domain classes (Dataset, Software, MLModel, Computation, Annotation, Experiment, …) and properties.
  • PROV-O: provenance properties (prov:used, prov:wasGeneratedBy, prov:wasAttributedTo, etc.).
  • Schema.org: core types (Person, Organization, Dataset) and properties (author, license, keywords, hasPart, …).
  • Croissant / Croissant-RAI 1.0: machine-learning crosswalk emitted alongside each release.

The constituent artifacts of this profile are described in the Profile Crate (W3C PROF manifest) and in the Turtle profile manifest at fairscape_models/profiles/profile.ttl.


2. Conformance

A crate conforms to this profile if and only if all of the following are true:

  1. The crate’s ro-crate-metadata.json parses as valid JSON-LD per RO-Crate 1.2.
  2. The Root Data Entity carries a dct:conformsTo (conformsTo in JSON-LD shorthand) property whose value (or one element thereof) is {"@id": "https://w3id.org/fairscape/profile/0.1"}.
  3. The Root Data Entity’s @type list MUST include "Dataset" and "https://w3id.org/EVI#ROCrate".
  4. The Metadata Descriptor (the entity with @id: ro-crate-metadata.json) carries a conformsTo of {"@id": "https://w3id.org/ro/crate/1.2"}.
  5. Every required property listed in §4 below is present on its respective entity.

Conformance can be checked by the fairscape-cli rocrate validate command, which today performs Pydantic-based structural validation. The validator entry point is pinned at rocrate_commands.py:1290 (permalink to the v0.1 reference commit). Install instructions: https://fairscape.github.io/fairscape-cli/setup/.

2.1 Example conformance signal

{
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": { "@id": "https://w3id.org/ro/crate/1.2" },
      "about": { "@id": "ark:99999/my-release" }
    },
    {
      "@id": "ark:99999/my-release",
      "@type": ["Dataset", "https://w3id.org/EVI#ROCrate"],
      "conformsTo": { "@id": "https://w3id.org/fairscape/profile/0.1" },
      "name": "",
      "…": ""
    }
  ]
}
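Conditions 2 to 4 of §2 can be checked directly against a parsed manifest like the example above. The sketch below is not the fairscape-cli validator (which uses Pydantic models); check_conformance and its helpers are hypothetical names. It locates the Root Data Entity through the Metadata Descriptor's about property and accepts conformsTo and @type either as a single value or as a list. Condition 1 (JSON-LD validity) and condition 5 (required properties) are out of scope here.

```python
from typing import Any

PROFILE_URI = "https://w3id.org/fairscape/profile/0.1"
RO_CRATE_1_2 = "https://w3id.org/ro/crate/1.2"
EVI_ROCRATE = "https://w3id.org/EVI#ROCrate"


def _ids(value: Any) -> set[str]:
    """Collect @id values from a single {"@id": ...} reference or a list of them."""
    items = value if isinstance(value, list) else [value]
    return {item["@id"] for item in items if isinstance(item, dict) and "@id" in item}


def _types(value: Any) -> set[str]:
    """Normalize @type (a string or a list of strings) to a set."""
    return set(value) if isinstance(value, list) else {value}


def check_conformance(metadata: dict) -> list[str]:
    """Return violations of §2 conditions 2-4; an empty list means none were found."""
    graph = {entity["@id"]: entity for entity in metadata.get("@graph", [])}
    errors: list[str] = []

    descriptor = graph.get("ro-crate-metadata.json")
    if descriptor is None:
        return ["missing Metadata Descriptor (ro-crate-metadata.json)"]
    if RO_CRATE_1_2 not in _ids(descriptor.get("conformsTo")):
        errors.append("descriptor does not conform to RO-Crate 1.2")

    # The Root Data Entity is the entity the descriptor is "about".
    root = graph.get(next(iter(_ids(descriptor.get("about"))), None))
    if root is None:
        errors.append("Root Data Entity not found via descriptor 'about'")
        return errors

    if PROFILE_URI not in _ids(root.get("conformsTo")):
        errors.append("Root Data Entity does not declare the Fairscape profile")
    missing_types = {"Dataset", EVI_ROCRATE} - _types(root.get("@type", []))
    if missing_types:
        errors.append(f"Root Data Entity @type lacks {sorted(missing_types)}")
    return errors
```

Typical use is check_conformance(json.load(fh)) on an opened ro-crate-metadata.json; an empty result means the three conformance signals are present.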

3. Release file manifest

A conforming Fairscape release MAY be distributed as a directory or zip archive whose root contains the files listed below. The Cardinality column gives the RFC 2119 requirement level (MUST, SHOULD, or MAY) for each file.

File                          Cardinality  Purpose
ro-crate-metadata.json        MUST         RO-Crate JSON-LD manifest (base RO-Crate 1.2 requirement).
ro-crate-preview.html         MAY          Human-readable preview.
ro-crate-datasheet.html       MUST         Datasheet-for-Datasets rendering of the release.
ro-crate-prov-graph.json      SHOULD       Evidence graph (EVI).
ro-crate-prov-graph.html      SHOULD       HTML visualization of the evidence graph.
ro-crate-croissant.json       SHOULD       Croissant / Croissant-RAI 1.0 export of dataset entities.
ro-crate-merkle-tree.json     MAY          SHA-256 Merkle tree for content integrity.
ro-crate-linkml.yaml          MAY          LinkML schema derived from the crate's per-entity dataSchema declarations.

These cardinalities describe what fairscape-cli emits today and what consumers of a release can rely on. The v0.1 tooling does not check for the presence of these files; conformance to §2 is the binding requirement.
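Because v0.1 does not enforce file presence, a consumer that wants to rely on the table can check an unpacked release itself. The following is a minimal consumer-side sketch; RELEASE_FILES and missing_release_files are hypothetical names, and a zip distribution would have to be extracted first (for example with the standard zipfile module).

```python
from pathlib import Path
from typing import Union

# Cardinalities copied from the §3 release file manifest.
RELEASE_FILES = {
    "ro-crate-metadata.json": "MUST",
    "ro-crate-preview.html": "MAY",
    "ro-crate-datasheet.html": "MUST",
    "ro-crate-prov-graph.json": "SHOULD",
    "ro-crate-prov-graph.html": "SHOULD",
    "ro-crate-croissant.json": "SHOULD",
    "ro-crate-merkle-tree.json": "MAY",
    "ro-crate-linkml.yaml": "MAY",
}


def missing_release_files(release_dir: Union[str, Path]) -> dict[str, list[str]]:
    """Group MUST and SHOULD files that are absent from the release root."""
    root = Path(release_dir)
    missing: dict[str, list[str]] = {"MUST": [], "SHOULD": []}
    for name, cardinality in RELEASE_FILES.items():
        # MAY files are never reported; their absence is always acceptable.
        if cardinality in missing and not (root / name).is_file():
            missing[cardinality].append(name)
    return missing
```

A non-empty MUST list means the release lacks a file consumers are told they can rely on; a non-empty SHOULD list is worth a warning rather than a rejection.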


4. Required entity properties

The required/optional status of every property below is taken from the Pydantic models, where is_required() on each field's FieldInfo is authoritative. Each subsection lists the JSON-LD keys (aliases) as they appear in ro-crate-metadata.json. All entities additionally carry @id and @type, which are always required.
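In Pydantic v2, a model class exposes model_fields, a mapping from field names to FieldInfo objects that provide is_required() and alias. The sketch below shows how a required-key list like the ones in this section can be derived that way. SoftwareSketch is a hypothetical stand-in that mirrors §4.3; it is not the actual fairscape_models class, and its Python field names are assumptions. The sketch assumes Pydantic v2 is installed.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class SoftwareSketch(BaseModel):
    """Hypothetical model mirroring §4.3 Software; field names are illustrative."""

    guid: str = Field(alias="@id")
    metadataType: List[str] = Field(alias="@type")
    name: str
    author: str
    description: str
    fileFormat: str = Field(alias="format")
    version: Optional[str] = None
    contentUrl: Optional[str] = None


def required_keys(model: type[BaseModel]) -> list[str]:
    """List JSON-LD keys (the alias when one is set) for fields where is_required() is True."""
    return [
        info.alias or name
        for name, info in model.model_fields.items()
        if info.is_required()
    ]
```

For this stand-in, required_keys(SoftwareSketch) yields the §4.3 required keys plus @id and @type, in declaration order; fields with a default are reported as optional.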

4.1 ROCrateMetadataElem (Root Data Entity)

The Root Data Entity has many optional Croissant-RAI and Datasheet-for-Datasets descriptors; only the required ones are listed here. The full schema is in schemas/ROCrateV1_2.json.

Required:

  • @type — list including "Dataset" and "https://w3id.org/EVI#ROCrate"
  • conformsTo — value MUST include {"@id": "https://w3id.org/fairscape/profile/0.1"}
  • name, description, keywords, version, hasPart, author, license

Optional: publisher, funder, identifier, rai:*, d4d:*, evi:*Count, … See schemas/ROCrateV1_2.json for the complete list with descriptions.

4.2 Dataset

  • Required: name, author, description, keywords, datePublished, format
  • Optional: version, contentUrl, dataSchema, generatedBy, derivedFrom, usedByComputation, md5, sha256, prov:*, …
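md5 and sha256 are optional on Dataset (and on Software and MLModel). If a producer wants to populate them, a streaming digest over the file's bytes is enough. This sketch assumes the properties hold lowercase hexadecimal digests, which this section does not state; file_digests is a hypothetical helper built on Python's standard hashlib.

```python
import hashlib
from pathlib import Path


def file_digests(path: str, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute hex md5 and sha256 digests of a file in a single streaming pass."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large data files never load fully into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

The result can be merged into a Dataset entity dictionary with dataset.update(file_digests(local_path)), where dataset and local_path are whatever the producer already has in hand.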

4.3 Software

  • Required: name, author, description, format
  • Optional: version, contentUrl, usedByComputation, md5, sha256, dateModified, prov:*, …

4.4 MLModel

  • Required: name, author, description, format
  • Optional: version, modelTask, modelArchitecture, trainedOn, contentUrl, usedByComputation, md5, sha256, prov:*, …

4.5 Computation

  • Required: name, description, runBy, dateCreated
  • Optional: command, usedSoftware, usedMLModel, usedDataset, generated, prov:used, prov:wasAssociatedWith, …
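A Dataset's optional generatedBy (§4.2) points at a Computation, whose usedDataset, usedSoftware, and usedMLModel point at its inputs; following these links recovers a dataset's upstream evidence. The sketch below assumes each link value is either a single {"@id": ...} reference or a list of them; upstream and _refs are hypothetical names, and the inverse generated property is ignored.

```python
from typing import Any


def _refs(value: Any) -> list[str]:
    """Normalize a single {"@id": ...} reference or a list of them to a list of ids."""
    items = value if isinstance(value, list) else [value]
    return [item["@id"] for item in items if isinstance(item, dict) and "@id" in item]


def upstream(graph: list[dict], dataset_id: str) -> list[str]:
    """Return @ids of every input a dataset transitively depends on, without repeats."""
    by_id = {entity["@id"]: entity for entity in graph}
    seen: list[str] = []
    stack = [dataset_id]
    while stack:
        current = by_id.get(stack.pop(), {})
        # Hop from the entity to the Computation(s) that generated it.
        for comp_id in _refs(current.get("generatedBy")):
            computation = by_id.get(comp_id, {})
            for key in ("usedDataset", "usedSoftware", "usedMLModel"):
                for ref in _refs(computation.get(key)):
                    if ref not in seen:  # also guards against cycles
                        seen.append(ref)
                        stack.append(ref)
    return seen
```

Keeping the traversal on plain @id references means it works directly on the @graph list from ro-crate-metadata.json without any JSON-LD processing.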

4.6 Annotation

  • Required: name, description, createdBy, dateCreated
  • Optional: usedDataset, generated, prov:used, prov:wasAssociatedWith, …

4.7 Experiment

  • Required: name, description, experimentType, runBy, datePerformed
  • Optional: protocol, usedInstrument, usedSample, usedTreatment, usedStain, …

4.8 Schema

  • Required: name, description, properties
  • Optional: type, required, separator, header, examples, additionalProperties, …

4.9 Sample

  • Required: name, author, description, keywords
  • Optional: contentUrl, cellLineReference, isPartOf

4.10 Instrument

  • Required: name, manufacturer, model, description
  • Optional: usedByExperiment, associatedPublication, contentUrl, …

4.11 Patient

  • Required: name, sdPublisher, gender
  • Optional: diagnosis, drug, healthCondition, birthDate, deathDate

4.12 ModelCard

  • Required: name, author, description, version, keywords
  • Optional: modelType, framework, modelFormat, trainingDataset, parameters, inputSize, hasBias, intendedUseCase, baseModel, license, …
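The required-property lists in this section translate directly into a lookup table for condition 5 of §2. In this sketch, REQUIRED, ROOT_REQUIRED, and missing_required are hypothetical names. It assumes an entity's type can be recognized by the last segment of each @type value (so "Dataset", "evi:Dataset", and "https://w3id.org/EVI#Dataset" all match), and it checks only that keys are present, not that their values are valid.

```python
import re
from typing import Any

# Required JSON-LD keys per entity type, copied from §4.2-§4.12.
REQUIRED = {
    "Dataset": ["name", "author", "description", "keywords", "datePublished", "format"],
    "Software": ["name", "author", "description", "format"],
    "MLModel": ["name", "author", "description", "format"],
    "Computation": ["name", "description", "runBy", "dateCreated"],
    "Annotation": ["name", "description", "createdBy", "dateCreated"],
    "Experiment": ["name", "description", "experimentType", "runBy", "datePerformed"],
    "Schema": ["name", "description", "properties"],
    "Sample": ["name", "author", "description", "keywords"],
    "Instrument": ["name", "manufacturer", "model", "description"],
    "Patient": ["name", "sdPublisher", "gender"],
    "ModelCard": ["name", "author", "description", "version", "keywords"],
}
# §4.1: the Root Data Entity has its own list even though it is also a Dataset.
ROOT_REQUIRED = ["name", "description", "keywords", "version", "hasPart", "author", "license", "conformsTo"]


def missing_required(entity: dict[str, Any]) -> list[str]:
    """Return required keys absent from an entity, based on its @type."""
    types = entity.get("@type", [])
    local = {re.split(r"[#/:]", t)[-1] for t in (types if isinstance(types, list) else [types])}
    if "ROCrate" in local:
        expected = ROOT_REQUIRED
    else:
        expected = [key for name in sorted(local) for key in REQUIRED.get(name, [])]
    # dict.fromkeys removes duplicates while keeping the listed order.
    return [key for key in dict.fromkeys(expected) if key not in entity]
```

Running missing_required over every entity in @graph gives a per-entity report; an empty list for all of them means every §4 required property is present.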

5. References