Dataset Model
The Dataset model describes a data entity, which can be a single file or a logical grouping of files. It includes metadata about authorship, publication, and version, as well as the dataset's relationships with other entities in the provenance graph.
Properties
Property | Type | Description | Required |
---|---|---|---|
guid (alias: @id) | str | The unique, resolvable identifier for the Dataset. Should be an ARK. | Yes |
name | str | A human-readable name for the dataset. | Yes |
author | Union[str, List[str]] | The person, people, or organization that created the dataset. | Yes |
datePublished | str | The date the dataset was published, in ISO 8601 format. | Yes |
description | str | A detailed description of the dataset (min 10 characters). | Yes |
keywords | List[str] | A list of keywords to aid in discovery. | Yes |
format | str | The file format of the dataset (e.g., "CSV", "TSV", "image/jpeg"). Aliased from fileFormat. | Yes |
metadataType | Optional[str] | The schema.org type. Defaults to https://w3id.org/EVI#Dataset. | No |
additionalType | Optional[str] | An additional type identifier. Defaults to "Dataset". | No |
version | str | The version of the dataset. Defaults to "0.1.0". | No |
associatedPublication | Optional[str] | A URL or citation for a publication associated with this dataset. | No |
additionalDocumentation | Optional[str] | A URL for additional documentation. | No |
dataSchema | Optional[IdentifierValue] | A link (by @id) to a Schema object that describes the structure of this dataset. | No |
generatedBy | Optional[Union[IdentifierValue, List[IdentifierValue]]] | Links to the Computation or Experiment that produced this dataset. | No |
derivedFrom | Optional[List[IdentifierValue]] | Links to one or more Dataset entities from which this dataset was derived. | No |
usedByComputation | Optional[List[IdentifierValue]] | Links to Computation entities that used this dataset as an input. | No |
contentUrl | Optional[Union[str, List[str]]] | The URL(s) or relative file path(s) pointing to the actual data file(s). | No |
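The required-property rules in the table can be sketched as a small standalone check. This is an illustration only, not the library's own validation: the function name `check_dataset` and its error messages are hypothetical, it uses only the Python standard library, and the date check accepts only values that begin with a `YYYY-MM-DD` calendar date, which is narrower than full ISO 8601.

```python
from datetime import date
from typing import Any, Dict, List

# Required properties as listed in the table. The identifier is handled
# separately because it may appear as "guid" or under its alias "@id".
REQUIRED = ["name", "author", "datePublished", "description", "keywords", "format"]


def check_dataset(record: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems: List[str] = []
    if "guid" not in record and "@id" not in record:
        problems.append("missing required property: guid (@id)")
    for key in REQUIRED:
        if key not in record:
            problems.append(f"missing required property: {key}")

    # The table states a minimum description length of 10 characters.
    description = record.get("description")
    if isinstance(description, str) and len(description) < 10:
        problems.append("description must be at least 10 characters")

    # Assumption: only the leading YYYY-MM-DD part is checked; other valid
    # ISO 8601 forms (e.g. "2025-06") would be rejected by this sketch.
    published = record.get("datePublished")
    if isinstance(published, str):
        try:
            date.fromisoformat(published[:10])
        except ValueError:
            problems.append("datePublished is not an ISO 8601 date")
    return problems


record = {
    "@id": "ark:59852/dataset-control-1-report",
    "name": "Control Experiment 1: SEC-MS Processed Data (Report.tsv)",
    "author": "Forget A, Obernier K, Krogan N",
    "datePublished": "2025-06-23",
    "description": "Processed SEC-MS data (Report.tsv) for MDA-MB468 cells, control experiment 1.",
    "keywords": ["MDA-MB468", "SEC-MS", "proteomics"],
    "format": "TSV",
}
print(check_dataset(record))
print(check_dataset({"name": "x", "description": "short"}))
```

Returning a list of problems rather than raising on the first one makes it easier to report every missing property in a single pass.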
Example
{
  "@id": "ark:59852/dataset-control-1-report",
  "@type": "https://w3id.org/EVI#Dataset",
  "name": "Control Experiment 1: SEC-MS Processed Data (Report.tsv)",
  "author": "Forget A, Obernier K, Krogan N",
  "datePublished": "2025-06-23",
  "version": "1.0",
  "description": "Processed SEC-MS data (Report.tsv) for MDA-MB468 cells, control experiment 1.",
  "keywords": [
    "MDA-MB468",
    "SEC-MS",
    "proteomics",
    "processed data",
    "control"
  ],
  "format": "TSV",
  "evi:Schema": {
    "@id": "ark:59852/schema-control-1-sec-ms-mda-mb468"
  },
  "generatedBy": [
    {
      "@id": "ark:59852/computation-control-1-sec-ms-mda-mb468"
    }
  ],
  "derivedFrom": [],
  "usedByComputation": [],
  "contentUrl": "ftp://massive-ftp.ucsd.edu/v10/MSV000098237/search/Biosep_MDAMB468_CTRL_1_Report.tsv"
}
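To show how the provenance links in a record like this might be read, here is a minimal sketch that collects the `@id` values from `generatedBy`, `derivedFrom`, and `usedByComputation`. The helper `linked_ids` is hypothetical and not part of any documented API. It assumes each link is an object with an `@id` key, as in the example, and it accepts either a single object or a list, since `generatedBy` allows both.

```python
import json
from typing import Any, Dict, List

# Provenance link properties named in the Properties table.
LINK_PROPERTIES = ["generatedBy", "derivedFrom", "usedByComputation"]


def linked_ids(record: Dict[str, Any], prop: str) -> List[str]:
    """Hypothetical helper: return the @id values stored under one link property."""
    value = record.get(prop)
    if value is None:
        return []
    if isinstance(value, dict):
        # A single link object is treated as a one-element list.
        value = [value]
    return [item["@id"] for item in value if isinstance(item, dict) and "@id" in item]


# A trimmed copy of the example record, keeping only the link properties.
record = json.loads("""
{
  "@id": "ark:59852/dataset-control-1-report",
  "generatedBy": [
    {"@id": "ark:59852/computation-control-1-sec-ms-mda-mb468"}
  ],
  "derivedFrom": [],
  "usedByComputation": []
}
""")

for prop in LINK_PROPERTIES:
    print(prop, linked_ids(record, prop))
```

Normalizing the single-object case up front keeps the calling code free of type checks when it walks the provenance graph.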