Datasheet Property Reference
The fairscape-cli build datasheet command generates an HTML datasheet from your RO-Crate metadata. This document lists every RO-Crate property that populates the datasheet, organized by section.
All properties are read from the root entity of the RO-Crate (the ROCrateMetadataElem).
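As a rough illustration of what "root entity" means here, the sketch below locates it in a parsed `ro-crate-metadata.json` by following the metadata descriptor's `about` reference, which is the general RO-Crate convention. The helper name `find_root_entity` and the sample crate are hypothetical; fairscape-cli may resolve the root entity differently internally.

```python
def find_root_entity(crate: dict) -> dict:
    """Return the root dataset entity of a parsed RO-Crate metadata document."""
    graph = crate.get("@graph", [])
    by_id = {entity.get("@id"): entity for entity in graph}
    for entity in graph:
        # The metadata descriptor is the entity describing ro-crate-metadata.json;
        # its "about" reference points at the root entity.
        is_descriptor = str(entity.get("@id", "")).endswith("ro-crate-metadata.json")
        if is_descriptor and "about" in entity:
            return by_id[entity["about"]["@id"]]
    raise ValueError("metadata descriptor not found in @graph")


# Hypothetical minimal crate, for illustration only.
sample_crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example dataset",
            "license": "https://creativecommons.org/licenses/by/4.0/",
        },
    ],
}

root = find_root_entity(sample_crate)
print(root["name"])
```

Every property in the tables below would then be a key on the returned `root` dictionary.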
Overview Section
These properties populate the main overview panel of the datasheet.
| Datasheet Field | RO-Crate Property |
|---|---|
| Title | name |
| Description | description |
| Identifier | @id |
| DOI | identifier |
| License | license |
| Ethical Review | ethicalReview |
| Release Date | datePublished |
| Created Date | dateCreated |
| Updated Date | dateModified |
| Authors | author |
| Publisher | publisher |
| Principal Investigator | principalInvestigator |
| Contact Email | contactEmail |
| Copyright | copyrightNotice |
| Terms of Use | conditionsOfAccess |
| Confidentiality Level | confidentialityLevel |
| Citation | citation |
| Version | version |
| Content Size | contentSize |
| Funding | funder |
| Keywords | keywords |
| Related Publications | associatedPublication |
| Completeness | completeness |
| Human Subject | humanSubjects |
| Human Subject Research | humanSubjectResearch |
| Human Subject Exemptions | humanSubjectExemption |
| De-identified Samples | deidentified |
| FDA Regulated | fdaRegulated |
| IRB | irb |
| IRB Protocol ID | irbProtocolId |
| Data Governance | dataGovernanceCommittee |
Property Details
Title — name
Human-readable name for the dataset.
Description — description
Human-readable description of the dataset.
Identifier — @id
Persistent unique identifier (ARK, URL, etc.).
Used in: AI-Ready Findability and Sustainability scoring.
DOI — identifier
DOI or other external persistent identifier.
Used in: AI-Ready Findability and Sustainability scoring.
License — license
Link to or name of the dataset license.
Example: https://creativecommons.org/licenses/by/4.0/
Used in: Distribution section, AI-Ready Fairness and Ethics scoring.
Ethical Review — ethicalReview
Were any ethical or compliance review processes conducted (e.g. by an IRB)?
Describe the process, frequency of review, and outcomes.
Used in: AI-Ready Ethics scoring.
Release Date — datePublished
Date the dataset was published or made publicly available (ISO 8601).
Used in: Distribution section.
Created Date — dateCreated
Date the dataset was originally created (ISO 8601).
Updated Date — dateModified
Date the dataset was last modified (ISO 8601).
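The three date properties above are expected in ISO 8601 form. A minimal sketch of a pre-flight check for calendar dates, using only the Python standard library, is shown below. The helper name `is_iso_date` is hypothetical, and the source does not say whether the datasheet builder itself validates these values.

```python
from datetime import date


def is_iso_date(value) -> bool:
    """Return True if value is a valid ISO 8601 calendar date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
    except (TypeError, ValueError):
        # TypeError: value is not a string; ValueError: not a valid date.
        return False
    return True


print(is_iso_date("2024-03-15"))
print(is_iso_date("15/03/2024"))
```

This covers plain dates only; full timestamps would need `datetime.fromisoformat` instead.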
Authors — author
Who created the dataset — e.g. which team, research group, on behalf of which institution.
Format: Parsed as list (comma or semicolon separated).
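The list parsing described for `author` might look roughly like the sketch below. The function name `parse_list` is hypothetical and the real converter may treat whitespace or empty entries differently; the only behavior taken from this reference is that commas and semicolons both act as separators.

```python
import re


def parse_list(value):
    """Split a comma- or semicolon-separated string into a list of trimmed items."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        # Already a list: just trim entries and drop empty ones.
        return [str(item).strip() for item in value if str(item).strip()]
    return [part.strip() for part in re.split(r"[;,]", str(value)) if part.strip()]


print(parse_list("Ada Lovelace; Alan Turing, Grace Hopper"))
```

One consequence of splitting on commas: a name written as "Lovelace, Ada" would be split into two entries, so "Given Family" ordering is the safer choice in a delimited string.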
Publisher — publisher
Organization or person responsible for publishing or distributing the dataset.
Used in: Distribution section, AI-Ready Computability scoring.
Principal Investigator — principalInvestigator
A key individual (PI) responsible for or overseeing dataset creation.
Used in: AI-Ready Provenance scoring.
Contact Email — contactEmail
Email address for questions or correspondence about the dataset.
Copyright — copyrightNotice
Copyright statement including year and rights holder.
Example: Copyright (c) 2024 by The Regents of the University of California
Terms of Use — conditionsOfAccess
Terms and conditions governing access to and use of this dataset.
Includes any data use agreements required.
Confidentiality Level — confidentialityLevel
Access restriction classification.
Values: unrestricted, restricted, or confidential.
Used in: AI-Ready Ethics scoring.
Citation — citation
Preferred citation string for this dataset.
Version — version
Version string for this release (e.g. 1.0, 2.3.1).
Used in: Distribution section.
Content Size — contentSize
Total size of the dataset content (e.g. 2.4 GB).
Used in: AI-Ready Characterization scoring.
Funding — funder
Who funded the creation of the dataset? Include grant names and numbers.
Format: Parsed as list.
Keywords — keywords
Keywords or tags describing the dataset, used for discovery and search.
Related Publications — associatedPublication
Publication(s) associated with or describing this dataset.
Format: Parsed as list.
Completeness — completeness
Assessment of how complete the dataset is relative to its intended scope — e.g. percentage of expected records present, known gaps.
Fallback: additionalProperty with name "Completeness".
Human Subject — humanSubjects
Does this dataset involve human subjects? Indicate Yes/No and describe the nature of involvement.
Fallback: additionalProperty with name "Human Subject".
Used in: AI-Ready Ethics scoring.
Human Subject Research — humanSubjectResearch
Broader context for human subjects research involvement.
Covers regulatory frameworks followed (e.g. 45 CFR 46, HIPAA).
Fallback: additionalProperty with name "Human Subject Research".
Human Subject Exemptions — humanSubjectExemption
Applicable exemption category if human subjects research is exempt from full IRB review.
Example: 45 CFR 46 Exemption 4.
Fallback: additionalProperty with name "Human Subjects Exemptions".
De-identified Samples — deidentified
Whether the dataset has been de-identified to remove or obscure PII.
Boolean converted to Yes/No.
Fallback: additionalProperty with name "De-identified Samples".
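The "Boolean converted to Yes/No" behavior noted for `deidentified` (and for `fdaRegulated` below) can be pictured with the following sketch. The helper name `to_yes_no` is hypothetical, and passing non-boolean values through unchanged is an assumption, not documented behavior.

```python
def to_yes_no(value):
    """Render JSON booleans as display text; leave other values untouched."""
    if value is True:
        return "Yes"
    if value is False:
        return "No"
    return value  # assumption: strings and other values pass through as-is


print(to_yes_no(True))
print(to_yes_no(False))
```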
FDA Regulated — fdaRegulated
Whether this dataset is subject to FDA regulations — e.g. clinical trial data, medical device data.
Boolean converted to Yes/No.
Fallback: additionalProperty with name "FDA Regulated".
IRB — irb
Institutional Review Board (IRB) information — covers approval status, approving institution, and contact details.
Accepts a plain string or a structured IRB object.
Fallback: additionalProperty with name "IRB".
IRB Protocol ID — irbProtocolId
IRB protocol identifier number assigned by the reviewing institution.
Fallback: additionalProperty with name "IRB Protocol ID".
Data Governance — dataGovernanceCommittee
Name or contact for the data governance committee — responsible for oversight, access control, and policy enforcement.
Fallback: additionalProperty with name "Data Governance Committee".
Used in: AI-Ready Ethics and Sustainability scoring.
additionalProperty Fallbacks
Several human-subjects and governance fields support a fallback mechanism. If the top-level property is not set, the converter searches the additionalProperty array for an entry whose name matches the fallback name listed for that field, and uses that entry's value:
    {
      "additionalProperty": [
        {"name": "Human Subject", "value": "Yes"},
        {"name": "Data Governance Committee", "value": "Bridge2AI Ethics Committee"}
      ]
    }
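The lookup order described above can be sketched as follows. This is an illustration under assumptions rather than the converter's actual code: the function name `get_with_fallback` is hypothetical, and treating empty strings and empty lists as "not set" is a guess.

```python
def get_with_fallback(root: dict, prop: str, fallback_name: str):
    """Prefer the top-level property; otherwise search additionalProperty by name."""
    value = root.get(prop)
    if value not in (None, "", []):  # assumption: these count as "not set"
        return value
    for entry in root.get("additionalProperty", []):
        if isinstance(entry, dict) and entry.get("name") == fallback_name:
            return entry.get("value")
    return None


# Root entity fragment using the additionalProperty entries shown above.
root = {
    "additionalProperty": [
        {"name": "Human Subject", "value": "Yes"},
        {"name": "Data Governance Committee", "value": "Bridge2AI Ethics Committee"},
    ]
}

print(get_with_fallback(root, "humanSubjects", "Human Subject"))
print(get_with_fallback(root, "dataGovernanceCommittee", "Data Governance Committee"))
```

Note that the fallback name is the display-style name given in each property's "Fallback" line, not the camelCase property name.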
Use Cases Section
These properties populate the responsible AI (RAI) and data documentation panel. RAI properties conform to the Croissant RAI 1.0 specification. A dataset can declare conformance by setting dct:conformsTo: "http://mlcommons.org/croissant/RAI/1.0" at the root entity level.
| Datasheet Field | RO-Crate Property | Croissant RAI Use Case |
|---|---|---|
| Intended Use | rai:dataUseCases | AI safety and fairness, Compliance |
| Limitations | rai:dataLimitations | AI safety and fairness |
| Prohibited Uses | prohibitedUses | Compliance |
| Potential Sources of Bias | rai:dataBiases | AI safety and fairness |
| Maintenance Plan | rai:dataReleaseMaintenancePlan | Compliance, Data life cycle |
| Data Collection | rai:dataCollection | Data life cycle |
| Data Collection Type | rai:dataCollectionType | Data life cycle |
| Data Collection Missing Data | rai:dataCollectionMissingData | Data life cycle |
| Data Collection Raw Data | rai:dataCollectionRawData | Data life cycle |
| Data Collection Timeframe | rai:dataCollectionTimeframe | Data life cycle |
| Data Imputation Protocol | rai:dataImputationProtocol | Compliance |
| Data Manipulation Protocol | rai:dataManipulationProtocol | Compliance |
| Data Preprocessing Protocol | rai:dataPreprocessingProtocol | Data life cycle |
| Data Annotation Protocol | rai:dataAnnotationProtocol | Data labeling, Compliance |
| Data Annotation Platform | rai:dataAnnotationPlatform | Data labeling, Participatory data |
| Data Annotation Analysis | rai:dataAnnotationAnalysis | Data labeling, Compliance |
| Personal/Sensitive Information | rai:personalSensitiveInformation | Compliance, AI safety and fairness |
| Data Social Impact | rai:dataSocialImpact | AI safety and fairness |
| Annotations Per Item | rai:annotationsPerItem | Data labeling |
| Annotator Demographics | rai:annotatorDemographics | Data labeling, Participatory data, Inclusion |
| Machine Annotation Tools | rai:machineAnnotationTools | Data labeling |
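To check whether a root entity declares the Croissant RAI conformance mentioned at the start of this section, something like the sketch below could be used. The function name `declares_rai_conformance` is hypothetical, and accepting list and `{"@id": ...}` forms in addition to the plain string is an assumption about how JSON-LD values might appear, not documented fairscape-cli behavior.

```python
RAI_SPEC = "http://mlcommons.org/croissant/RAI/1.0"


def declares_rai_conformance(root: dict) -> bool:
    """Return True if dct:conformsTo on the root entity names the RAI 1.0 spec."""
    declared = root.get("dct:conformsTo")
    values = declared if isinstance(declared, list) else [declared]
    for value in values:
        if isinstance(value, dict):
            value = value.get("@id")  # assumption: JSON-LD reference form
        if value == RAI_SPEC:
            return True
    return False


root = {
    "name": "Example dataset",
    "dct:conformsTo": "http://mlcommons.org/croissant/RAI/1.0",
    "rai:dataUseCases": "Training, Validation",
}
print(declares_rai_conformance(root))
```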
Property Details
Intended Use — rai:dataUseCases
Explicit statement of intended uses.
Values: Training, Testing, Validation, Development or Production Use, Fine Tuning, others.
Include usage guidelines and caveats.
Used in: AI-Ready Pre-Model Explainability scoring.
Limitations — rai:dataLimitations
Known limitations of the dataset.
Covers data generalization limits (e.g. related to data distribution or quality) and non-recommended uses. Distinct from biases (systematic errors) and anomalies (data quality issues).
Used in: AI-Ready Pre-Model Explainability scoring.
Prohibited Uses — prohibitedUses
Explicit statement of uses not permitted by license, ethics, or policy. Stronger than discouraged uses.
Fallback: additionalProperty with name "Prohibited Uses".
Used in: AI-Ready Ethics scoring.
Potential Sources of Bias — rai:dataBiases
Known biases in the dataset — systematic errors or prejudices that may affect representativeness or fairness.
Distinct from anomalies (data quality issues) and limitations (scope constraints).
Used in: AI-Ready Characterization scoring.
Maintenance Plan — rai:dataReleaseMaintenancePlan
Will the dataset be updated? If so, how often and by whom?
Cover versioning timeframe, maintainers, how updates are communicated, and deprecation policies.
Used in: AI-Ready Sustainability scoring.
Data Collection — rai:dataCollection
What mechanisms or procedures were used to collect the data?
Examples: Hardware sensors, manual curation, software APIs. Also covers how these mechanisms were validated.
Used in: AI-Ready Ethics scoring.
Data Collection Type — rai:dataCollectionType
List of collection type(s), joined to string.
Recommended values: Surveys, Secondary Data Analysis, Physical Data Collection, Direct Measurement, Document Analysis, Manual Human Curator, Software Collection, Experiments, Web Scraping, Web API, Focus Groups, Self-Reporting, Customer Feedback Data, User-Generated Content Data, Passive Data Collection, Others.
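Several properties in this section are noted as "joined to string": a list value is flattened into one display string. A minimal sketch of that step follows. The helper name `join_values` is hypothetical and the `", "` separator is an assumption; the source does not state which separator the datasheet uses.

```python
def join_values(value, separator=", "):
    """Flatten a list-valued property into a single display string."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return separator.join(str(item) for item in value)
    return str(value)  # a plain string is shown unchanged


root = {"rai:dataCollectionType": ["Surveys", "Direct Measurement", "Web API"]}
print(join_values(root["rai:dataCollectionType"]))
```

In practice this means `rai:dataCollectionType` can be supplied either as a JSON array of the recommended values or as a single string.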
Data Collection Missing Data — rai:dataCollectionMissingData
Document missing data patterns and handling strategies.
Cover pattern types (MCAR, MAR, MNAR), known or suspected causes (e.g. sensor failures, participant dropout, privacy constraints), and strategies used to handle missing values.
Used in: AI-Ready Characterization scoring.
Data Collection Raw Data — rai:dataCollectionRawData
Description of raw data sources before preprocessing or labeling.
Document where the original data comes from and how it can be accessed.
Data Collection Timeframe — rai:dataCollectionTimeframe
Over what timeframe was the data collected? Does this timeframe match the creation timeframe of the underlying data?
Provide start and end dates where possible. List joined to string.
Data Imputation Protocol — rai:dataImputationProtocol
Describe data imputation methodology.
Cover techniques used (e.g. mean/median imputation, forward fill, model-based imputation) and rationale for chosen approaches.
Data Manipulation Protocol — rai:dataManipulationProtocol
Was any cleaning done? Describe the procedures applied.
Examples: Removal of instances, processing of missing values, deduplication, filtering.
Data Preprocessing Protocol — rai:dataPreprocessingProtocol
Was any preprocessing done? Describe steps to make the data ML-ready.
Examples: Discretization or bucketing, tokenization, feature extraction, normalization. List joined to string.
Data Annotation Protocol — rai:dataAnnotationProtocol
Annotation methodology, tasks, and protocols.
Include annotation guidelines, quality control procedures, task definitions, workforce type, annotation characteristics, and label distributions.
Data Annotation Platform — rai:dataAnnotationPlatform
Platform or tool used for annotation.
Examples: Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool. List joined to string.
Data Annotation Analysis — rai:dataAnnotationAnalysis
Analysis of annotation quality and inter-annotator agreement.
Cover metrics used (e.g. Cohen's kappa, Fleiss' kappa), systematic disagreements between annotators of different socio-demographic groups, and how final labels relate to individual annotator responses. List joined to string.
Personal/Sensitive Information — rai:personalSensitiveInformation
Does the dataset contain sensitive data?
Attribute types: Gender, Socio-economic status, Geography, Language, Age, Culture, Experience or Seniority, others. List joined to string.
Used in: AI-Ready Ethics scoring.
Data Social Impact — rai:dataSocialImpact
Describe potential social impacts and mitigation strategies.
Is there anything about the dataset's composition or collection that might impact future uses or create risks/harm (e.g. unfair treatment, legal or financial risks)?
Annotations Per Item — rai:annotationsPerItem
Number of annotations collected per data item.
Multiple annotations per item enable calculation of inter-annotator agreement.
Annotator Demographics — rai:annotatorDemographics
Demographic information about annotators, if available.
Examples: Geographic location, language background, expertise level, age group, gender. List joined to string.
Machine Annotation Tools — rai:machineAnnotationTools
Automated or ML-based annotation tools used.
Examples: NLP pipelines, computer vision models. Format each entry as ToolName version (e.g. spaCy 3.5.0). List joined to string.
Distribution Section
These properties populate the distribution/access panel.
| Datasheet Field | RO-Crate Property | Notes |
|---|---|---|
| License | license | Same property as Overview |
| Publisher | publisher | Same property as Overview |
| DOI | doi | |
| Release Date | datePublished | Same property as Overview |
| Version | version | Same property as Overview |
Summary Section (AI-Ready Score)
The summary section displays the computed AI-Ready score. See AI-Ready Scoring Reference for the full breakdown of which properties affect each scoring criterion.