
Datasheet Property Reference

The fairscape-cli build datasheet command generates an HTML datasheet from your RO-Crate metadata. This document lists every RO-Crate property that populates the datasheet, organized by section.

All properties are read from the root entity of the RO-Crate (the ROCrateMetadataElem).
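As a rough illustration of what "root entity" means here, the sketch below locates it in a parsed ro-crate-metadata.json by following the metadata descriptor's about link, which is the RO-Crate convention. The function name find_root_entity and the sample crate are hypothetical; fairscape-cli may resolve the root entity differently.

```python
def find_root_entity(crate: dict) -> dict:
    """Return the root data entity from a parsed ro-crate-metadata.json (sketch)."""
    entities = {e["@id"]: e for e in crate.get("@graph", []) if "@id" in e}
    # RO-Crate 1.1 names the descriptor "ro-crate-metadata.json";
    # RO-Crate 1.0 used "ro-crate-metadata.jsonld".
    for descriptor_id in ("ro-crate-metadata.json", "ro-crate-metadata.jsonld"):
        descriptor = entities.get(descriptor_id)
        if descriptor is not None:
            break
    else:
        raise ValueError("no RO-Crate metadata descriptor in @graph")
    # The descriptor's "about" property points at the root entity.
    about = descriptor.get("about", {})
    root_id = about.get("@id") if isinstance(about, dict) else about
    if root_id not in entities:
        raise ValueError(f"root entity {root_id!r} not found in @graph")
    return entities[root_id]


# Invented sample data, for illustration only.
example_crate = {
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "name": "Example Dataset"},
    ]
}
root = find_root_entity(example_crate)
print(root["name"])
```

Every property listed in this document would be a key on the dictionary returned here.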


Overview Section

These properties populate the main overview panel of the datasheet.

| Datasheet Field | RO-Crate Property |
| --- | --- |
| Title | name |
| Description | description |
| Identifier | @id |
| DOI | identifier |
| License | license |
| Ethical Review | ethicalReview |
| Release Date | datePublished |
| Created Date | dateCreated |
| Updated Date | dateModified |
| Authors | author |
| Publisher | publisher |
| Principal Investigator | principalInvestigator |
| Contact Email | contactEmail |
| Copyright | copyrightNotice |
| Terms of Use | conditionsOfAccess |
| Confidentiality Level | confidentialityLevel |
| Citation | citation |
| Version | version |
| Content Size | contentSize |
| Funding | funder |
| Keywords | keywords |
| Related Publications | associatedPublication |
| Completeness | completeness |
| Human Subject | humanSubjects |
| Human Subject Research | humanSubjectResearch |
| Human Subject Exemptions | humanSubjectExemption |
| De-identified Samples | deidentified |
| FDA Regulated | fdaRegulated |
| IRB | irb |
| IRB Protocol ID | irbProtocolId |
| Data Governance | dataGovernanceCommittee |

Property Details

Title — name

Human-readable name for the dataset.

Description — description

Human-readable description of the dataset.

Identifier — @id

Persistent unique identifier (ARK, URL, etc.).

Used in: AI-Ready Findability and Sustainability scoring.

DOI — identifier

DOI or other external persistent identifier.

Used in: AI-Ready Findability and Sustainability scoring.

License — license

Link to or name of the dataset license.

Example: https://creativecommons.org/licenses/by/4.0/

Used in: Distribution section, AI-Ready Fairness and Ethics scoring.

Ethical Review — ethicalReview

Were any ethical or compliance review processes conducted (e.g. by an IRB)?

Describe the process, frequency of review, and outcomes.

Used in: AI-Ready Ethics scoring.

Release Date — datePublished

Date the dataset was published or made publicly available (ISO 8601).

Used in: Distribution section.

Created Date — dateCreated

Date the dataset was originally created (ISO 8601).

Updated Date — dateModified

Date the dataset was last modified (ISO 8601).
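Because datePublished, dateCreated, and dateModified are all expected in ISO 8601 form, it can help to check values before building the datasheet. The helper below is a minimal sketch using only the Python standard library; is_iso_date is a hypothetical name and is not part of fairscape-cli. It checks calendar dates only, not full timestamps.

```python
from datetime import date


def is_iso_date(value) -> bool:
    """Return True if value is an ISO 8601 calendar date such as 2024-03-15."""
    try:
        date.fromisoformat(value)
    except (TypeError, ValueError):
        # TypeError: value is not a string; ValueError: not a valid ISO date.
        return False
    return True


print(is_iso_date("2024-03-15"))
print(is_iso_date("03/15/2024"))
```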

Authors — author

Who created the dataset — e.g. which team, research group, on behalf of which institution.

Format: Parsed as list (comma or semicolon separated).
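The list parsing described above can be pictured roughly as follows. This is a sketch under the assumption that both delimiters are treated equally and surrounding whitespace is trimmed; split_list_field is a hypothetical helper, not the converter's actual function.

```python
import re


def split_list_field(value):
    """Split a comma- or semicolon-separated string into a list of trimmed items."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        # Already a list: just normalize items to trimmed strings.
        return [str(item).strip() for item in value if str(item).strip()]
    return [part.strip() for part in re.split(r"[;,]", str(value)) if part.strip()]


print(split_list_field("Jane Doe; John Roe, Example Lab"))
```

One consequence of splitting on commas is that a name written as "Doe, Jane" would be treated as two entries, so "Jane Doe" ordering is the safer choice if the converter behaves as sketched.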

Publisher — publisher

Organization or person responsible for publishing or distributing the dataset.

Used in: Distribution section, AI-Ready Computability scoring.

Principal Investigator — principalInvestigator

A key individual (PI) responsible for or overseeing dataset creation.

Used in: AI-Ready Provenance scoring.

Contact Email — contactEmail

Email address for questions or correspondence about the dataset.

Copyright — copyrightNotice

Copyright statement including year and rights holder.

Example: Copyright (c) 2024 by The Regents of the University of California

Terms of Use — conditionsOfAccess

Terms and conditions governing access to and use of this dataset.

Includes any data use agreements required.

Confidentiality Level — confidentialityLevel

Access restriction classification.

Values: unrestricted, restricted, or confidential.

Used in: AI-Ready Ethics scoring.

Citation — citation

Preferred citation string for this dataset.

Version — version

Version string for this release (e.g. 1.0, 2.3.1).

Used in: Distribution section.

Content Size — contentSize

Total size of the dataset content (e.g. 2.4 GB).

Used in: AI-Ready Characterization scoring.

Funding — funder

Who funded the creation of the dataset? Include grant names and numbers.

Format: Parsed as list.

Keywords — keywords

Keywords or tags describing the dataset, used for discovery and search.

Related Publications — associatedPublication

Publication(s) associated with or describing this dataset.

Format: Parsed as list.

Completeness — completeness

Assessment of how complete the dataset is relative to its intended scope — e.g. percentage of expected records present, known gaps.

Fallback: additionalProperty with name "Completeness".

Human Subject — humanSubjects

Does this dataset involve human subjects? Indicate Yes/No and describe the nature of involvement.

Fallback: additionalProperty with name "Human Subject".

Used in: AI-Ready Ethics scoring.

Human Subject Research — humanSubjectResearch

Broader context for human subjects research involvement.

Covers regulatory frameworks followed (e.g. 45 CFR 46, HIPAA).

Fallback: additionalProperty with name "Human Subject Research".

Human Subject Exemptions — humanSubjectExemption

Applicable exemption category if human subjects research is exempt from full IRB review.

Example: 45 CFR 46 Exemption 4.

Fallback: additionalProperty with name "Human Subjects Exemptions".

De-identified Samples — deidentified

Whether the dataset has been de-identified to remove or obscure PII.

Boolean converted to Yes/No.

Fallback: additionalProperty with name "De-identified Samples".

FDA Regulated — fdaRegulated

Whether this dataset is subject to FDA regulations — e.g. clinical trial data, medical device data.

Boolean converted to Yes/No.

Fallback: additionalProperty with name "FDA Regulated".
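The Boolean-to-Yes/No conversion noted for deidentified and fdaRegulated might look something like the sketch below. The function name yes_no is hypothetical, and passing non-Boolean values through unchanged is an assumption rather than documented behavior.

```python
def yes_no(value):
    """Render a Boolean as "Yes" or "No"; leave other values untouched (assumed)."""
    if isinstance(value, bool):
        return "Yes" if value else "No"
    return value


print(yes_no(True))
print(yes_no(False))
```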

IRB — irb

Institutional Review Board (IRB) information — covers approval status, approving institution, and contact details.

Accepts a plain string or a structured IRB object.

Fallback: additionalProperty with name "IRB".

IRB Protocol ID — irbProtocolId

IRB protocol identifier number assigned by the reviewing institution.

Fallback: additionalProperty with name "IRB Protocol ID".

Data Governance — dataGovernanceCommittee

Name or contact for the data governance committee — responsible for oversight, access control, and policy enforcement.

Fallback: additionalProperty with name "Data Governance Committee".

Used in: AI-Ready Ethics and Sustainability scoring.

additionalProperty Fallbacks

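Several fields support a fallback mechanism: mainly the human-subjects and governance fields, plus Completeness and Prohibited Uses. If the top-level property is not set, the converter checks the additionalProperty array for an entry whose name matches the fallback name listed for that field: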

{
  "additionalProperty": [
    {"name": "Human Subject", "value": "Yes"},
    {"name": "Data Governance Committee", "value": "Bridge2AI Ethics Committee"}
  ]
}
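The lookup order can be sketched in a few lines of Python. This is an illustration of the mechanism described above, not the converter's actual code: get_with_fallback is a hypothetical name, and treating a missing key or None as "not set" is an assumption.

```python
def get_with_fallback(root: dict, prop: str, fallback_name: str):
    """Return root[prop] if set, else the matching additionalProperty value."""
    value = root.get(prop)
    if value is not None:
        return value
    # Fall back to the first additionalProperty entry with a matching name.
    for entry in root.get("additionalProperty") or []:
        if isinstance(entry, dict) and entry.get("name") == fallback_name:
            return entry.get("value")
    return None


# Invented sample root entity, for illustration only.
root = {
    "additionalProperty": [
        {"name": "Human Subject", "value": "Yes"},
        {"name": "Data Governance Committee", "value": "Bridge2AI Ethics Committee"},
    ]
}
print(get_with_fallback(root, "humanSubjects", "Human Subject"))
print(get_with_fallback(root, "dataGovernanceCommittee", "Data Governance Committee"))
```

Under this sketch, a top-level value always takes precedence over an additionalProperty entry with the same meaning.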

Use Cases Section

These properties populate the responsible AI (RAI) and data documentation panel. RAI properties conform to the Croissant RAI 1.0 specification. A dataset can declare conformance by setting dct:conformsTo: "http://mlcommons.org/croissant/RAI/1.0" at the root entity level.

| Datasheet Field | RO-Crate Property | Croissant RAI Use Case |
| --- | --- | --- |
| Intended Use | rai:dataUseCases | AI safety and fairness, Compliance |
| Limitations | rai:dataLimitations | AI safety and fairness |
| Prohibited Uses | prohibitedUses | Compliance |
| Potential Sources of Bias | rai:dataBiases | AI safety and fairness |
| Maintenance Plan | rai:dataReleaseMaintenancePlan | Compliance, Data life cycle |
| Data Collection | rai:dataCollection | Data life cycle |
| Data Collection Type | rai:dataCollectionType | Data life cycle |
| Data Collection Missing Data | rai:dataCollectionMissingData | Data life cycle |
| Data Collection Raw Data | rai:dataCollectionRawData | Data life cycle |
| Data Collection Timeframe | rai:dataCollectionTimeframe | Data life cycle |
| Data Imputation Protocol | rai:dataImputationProtocol | Compliance |
| Data Manipulation Protocol | rai:dataManipulationProtocol | Compliance |
| Data Preprocessing Protocol | rai:dataPreprocessingProtocol | Data life cycle |
| Data Annotation Protocol | rai:dataAnnotationProtocol | Data labeling, Compliance |
| Data Annotation Platform | rai:dataAnnotationPlatform | Data labeling, Participatory data |
| Data Annotation Analysis | rai:dataAnnotationAnalysis | Data labeling, Compliance |
| Personal/Sensitive Information | rai:personalSensitiveInformation | Compliance, AI safety and fairness |
| Data Social Impact | rai:dataSocialImpact | AI safety and fairness |
| Annotations Per Item | rai:annotationsPerItem | Data labeling |
| Annotator Demographics | rai:annotatorDemographics | Data labeling, Participatory data, Inclusion |
| Machine Annotation Tools | rai:machineAnnotationTools | Data labeling |
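For orientation, a root entity that declares Croissant RAI conformance and sets a few of these properties might be assembled as in the sketch below. All property values are invented for illustration, and the sketch assumes the crate's @context already defines the dct and rai prefixes.

```python
import json

# Hypothetical root entity fragment; values are illustrative only.
root_entity = {
    "@id": "./",
    "@type": "Dataset",
    "name": "Example Dataset",
    # Conformance declaration, as described in this section.
    "dct:conformsTo": "http://mlcommons.org/croissant/RAI/1.0",
    "rai:dataUseCases": "Training and validation of image classifiers.",
    "rai:dataLimitations": "Collected at a single site; may not generalize.",
    "rai:dataCollectionType": ["Direct Measurement", "Manual Human Curator"],
    # prohibitedUses carries no rai: prefix in the table above.
    "prohibitedUses": "Re-identification of individuals.",
}

print(json.dumps(root_entity, indent=2))
```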

Property Details

Intended Use — rai:dataUseCases

Explicit statement of intended uses.

Values: Training, Testing, Validation, Development or Production Use, Fine Tuning, others.

Include usage guidelines and caveats.

Used in: AI-Ready Pre-Model Explainability scoring.

Limitations — rai:dataLimitations

Known limitations of the dataset.

Covers data generalization limits (e.g. related to data distribution or quality) and non-recommended uses. Distinct from biases (systematic errors) and anomalies (data quality issues).

Used in: AI-Ready Pre-Model Explainability scoring.

Prohibited Uses — prohibitedUses

Explicit statement of uses not permitted by license, ethics, or policy. A prohibited use is stronger than a discouraged or non-recommended use, which belongs under Limitations.

Fallback: additionalProperty with name "Prohibited Uses".

Used in: AI-Ready Ethics scoring.

Potential Sources of Bias — rai:dataBiases

Known biases in the dataset — systematic errors or prejudices that may affect representativeness or fairness.

Distinct from anomalies (data quality issues) and limitations (scope constraints).

Used in: AI-Ready Characterization scoring.

Maintenance Plan — rai:dataReleaseMaintenancePlan

Will the dataset be updated? If so, how often and by whom?

Cover versioning timeframe, maintainers, how updates are communicated, and deprecation policies.

Used in: AI-Ready Sustainability scoring.

Data Collection — rai:dataCollection

What mechanisms or procedures were used to collect the data?

Examples: Hardware sensors, manual curation, software APIs. Also covers how these mechanisms were validated.

Used in: AI-Ready Ethics scoring.

Data Collection Type — rai:dataCollectionType

A list of one or more collection types. The list is joined into a single string for display.

Recommended values: Surveys, Secondary Data Analysis, Physical Data Collection, Direct Measurement, Document Analysis, Manual Human Curator, Software Collection, Experiments, Web Scraping, Web API, Focus Groups, Self-Reporting, Customer Feedback Data, User-Generated Content Data, Passive Data Collection, Others.
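Several RAI properties in this section are described as a list joined to a string. A minimal sketch of that step is shown below; the helper name join_values and the ", " separator are assumptions, since the actual separator used by the converter is not stated here.

```python
def join_values(value, sep=", "):
    """Join a list of values into one display string (separator is assumed)."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return sep.join(str(item) for item in value)
    # A plain string or scalar is rendered as-is.
    return str(value)


print(join_values(["Surveys", "Web API", "Experiments"]))
```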

Data Collection Missing Data — rai:dataCollectionMissingData

Document missing data patterns and handling strategies.

Cover pattern types (MCAR, MAR, MNAR), known or suspected causes (e.g. sensor failures, participant dropout, privacy constraints), and strategies used to handle missing values.

Used in: AI-Ready Characterization scoring.

Data Collection Raw Data — rai:dataCollectionRawData

Description of raw data sources before preprocessing or labeling.

Document where the original data comes from and how it can be accessed.

Data Collection Timeframe — rai:dataCollectionTimeframe

Over what timeframe was the data collected? Does this timeframe match the creation timeframe of the underlying data?

Provide start and end dates where possible. List joined to string.

Data Imputation Protocol — rai:dataImputationProtocol

Describe data imputation methodology.

Cover techniques used (e.g. mean/median imputation, forward fill, model-based imputation) and rationale for chosen approaches.

Data Manipulation Protocol — rai:dataManipulationProtocol

Was any cleaning done? Describe the procedures applied.

Examples: Removal of instances, processing of missing values, deduplication, filtering.

Data Preprocessing Protocol — rai:dataPreprocessingProtocol

Was any preprocessing done? Describe steps to make the data ML-ready.

Examples: Discretization or bucketing, tokenization, feature extraction, normalization. List joined to string.

Data Annotation Protocol — rai:dataAnnotationProtocol

Annotation methodology, tasks, and protocols.

Include annotation guidelines, quality control procedures, task definitions, workforce type, annotation characteristics, and label distributions.

Data Annotation Platform — rai:dataAnnotationPlatform

Platform or tool used for annotation.

Examples: Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool. List joined to string.

Data Annotation Analysis — rai:dataAnnotationAnalysis

Analysis of annotation quality and inter-annotator agreement.

Cover metrics used (e.g. Cohen's kappa, Fleiss' kappa), systematic disagreements between annotators of different socio-demographic groups, and how final labels relate to individual annotator responses. List joined to string.

Personal/Sensitive Information — rai:personalSensitiveInformation

Does the dataset contain sensitive data?

Attribute types: Gender, Socio-economic status, Geography, Language, Age, Culture, Experience or Seniority, others. List joined to string.

Used in: AI-Ready Ethics scoring.

Data Social Impact — rai:dataSocialImpact

Describe potential social impacts and mitigation strategies.

Is there anything about the dataset's composition or collection that might impact future uses or create risks/harm (e.g. unfair treatment, legal or financial risks)?

Annotations Per Item — rai:annotationsPerItem

Number of annotations collected per data item.

Multiple annotations per item enable calculation of inter-annotator agreement.

Annotator Demographics — rai:annotatorDemographics

Demographic information about annotators, if available.

Examples: Geographic location, language background, expertise level, age group, gender. List joined to string.

Machine Annotation Tools — rai:machineAnnotationTools

Automated or ML-based annotation tools used.

Examples: NLP pipelines, computer vision models. Format each entry as ToolName version (e.g. spaCy 3.5.0). List joined to string.
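To check that entries follow the ToolName version shape before adding them, something like the following could be used. This is a sketch only: parse_tool_entry is a hypothetical helper, and the pattern assumes the version is the last whitespace-separated token and starts with a digit.

```python
import re

# Name (any text), whitespace, then a version token beginning with digits.
TOOL_ENTRY = re.compile(r"(?P<name>.+?)\s+(?P<version>\d+(?:\.\d+)*\S*)")


def parse_tool_entry(entry: str):
    """Return (name, version) for a 'ToolName version' entry, else None."""
    match = TOOL_ENTRY.fullmatch(entry.strip())
    if match is None:
        return None
    return match.group("name"), match.group("version")


print(parse_tool_entry("spaCy 3.5.0"))
print(parse_tool_entry("custom model"))
```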


Distribution Section

These properties populate the distribution/access panel.

| Datasheet Field | RO-Crate Property | Notes |
| --- | --- | --- |
| License | license | Same property as Overview |
| Publisher | publisher | Same property as Overview |
| DOI | doi | |
| Release Date | datePublished | Same property as Overview |
| Version | version | Same property as Overview |

Summary Section (AI-Ready Score)

The summary section displays the computed AI-Ready score. See AI-Ready Scoring Reference for the full breakdown of which properties affect each scoring criterion.