RO-Crate


All RO-Crate operations are available through the rocrate sub-command of the fairscape-cli root command.


Create RO-Crate

To create an RO-Crate, use either the create or the init sub-command. create writes the crate to the directory given by the ROCRATE_PATH argument, whereas init creates it in the current working directory. Both sub-commands require five options (name, description, keywords, organization-name, and project-name) and accept an optional guid. Enter fairscape-cli rocrate create --help to list all available options and arguments:

Usage: fairscape-cli rocrate create [OPTIONS] ROCRATE_PATH

  Create an ROCrate in a new path specified by the rocrate-path argument

Options:
  --guid TEXT
  --name TEXT               [required]
  --organization-name TEXT  [required]
  --project-name TEXT       [required]
  --description TEXT        [required]
  --keywords TEXT           [required]
  --help                    Show this message and exit.
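
The options listed above map directly onto an argument list, which can be convenient when crates are created from a script. The following is a minimal Python sketch, not part of fairscape-cli; the function name build_create_command is hypothetical, and it only assembles the command without running it.

```python
def build_create_command(rocrate_path, name, description,
                         organization_name, project_name, keywords,
                         guid=None):
    """Assemble the argument list for `fairscape-cli rocrate create`."""
    command = [
        "fairscape-cli", "rocrate", "create",
        "--name", name,
        "--description", description,
        "--organization-name", organization_name,
        "--project-name", project_name,
    ]
    # --keywords is passed once per keyword.
    for keyword in keywords:
        command += ["--keywords", keyword]
    # --guid is optional, so it is only added when a value is supplied.
    if guid is not None:
        command += ["--guid", guid]
    # ROCRATE_PATH is the positional argument and goes last.
    command.append(rocrate_path)
    return command
```

With fairscape-cli installed, the returned list could be passed to subprocess.run(command, check=True).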

To create an RO-Crate with minimal metadata, use the following command. This will generate a unique identifier and create a ro-crate-metadata.json file at the specified ROCRATE_PATH location.

fairscape-cli rocrate create \
  --name "test rocrate" \
  --description "Example RO Crate for Tests" \
  --organization-name "UVA" \
  --project-name "B2AI"  \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  "./test_rocrate"

Alternatively, use the fairscape-cli rocrate init command to create the same RO-Crate in the current working directory.

fairscape-cli rocrate init \
  --name "test rocrate" \
  --description "Example RO Crate for Tests" \
  --organization-name "UVA" \
  --project-name "B2AI"  \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS"
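
After either command, one quick check is that ro-crate-metadata.json exists in the crate directory and parses as JSON. The Python sketch below does only that; the helper name load_crate_metadata is hypothetical, and it makes no assumption about the keys inside the file.

```python
import json
from pathlib import Path


def load_crate_metadata(crate_path):
    """Return the parsed ro-crate-metadata.json found in crate_path.

    Raises FileNotFoundError if the file is missing and
    json.JSONDecodeError if its content is not valid JSON.
    """
    metadata_file = Path(crate_path) / "ro-crate-metadata.json"
    with metadata_file.open(encoding="utf-8") as handle:
        return json.load(handle)
```

For the create example the call would be load_crate_metadata("./test_rocrate"); for init it would be load_crate_metadata(".").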

Add object and metadata

In the FAIRSCAPE ecosystem, datasets and software are treated as objects that can be added to an RO-Crate using the add sub-command. This command fetches the object and transfers it to the crate. Enter the command fairscape-cli rocrate add --help to display the list of objects to add.

Usage: fairscape-cli rocrate add [OPTIONS] COMMAND [ARGS]...

  Add (transfer) object to RO-Crate and register object metadata.

Options:
  --help  Show this message and exit.

Commands:
  dataset   Add a Dataset file and its metadata to the RO-Crate.
  software  Add a Software and its corresponding metadata.

Dataset object

The add dataset sub-command adds a dataset object to the crate and records its metadata in the ro-crate-metadata.json file. It requires nine options: name, author, version, date-published, description, keywords, data-format, source-filepath, and destination-filepath. The remaining options are optional; if guid is omitted, an identifier is generated to uniquely represent the dataset. When the command runs, the dataset object is transferred to the specified location in ROCRATE_PATH, and its metadata is added to ro-crate-metadata.json. Enter fairscape-cli rocrate add dataset --help to show its use:

Usage: fairscape-cli rocrate add dataset [OPTIONS] ROCRATE_PATH

  Add a Dataset file and its metadata to the RO-Crate.

Options:
  --guid TEXT
  --name TEXT                     [required]
  --url TEXT
  --author TEXT                   [required]
  --version TEXT                  [required]
  --date-published TEXT           [required]
  --description TEXT              [required]
  --keywords TEXT                 [required]
  --data-format TEXT              [required]
  --source-filepath TEXT          [required]
  --destination-filepath TEXT     [required]
  --used-by TEXT
  --derived-from TEXT
  --schema TEXT
  --associated-publication TEXT
  --additional-documentation TEXT
  --help                          Show this message and exit.
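
In the examples in this section, the destination keeps the source file name and sits directly under the crate directory. A small Python sketch of that convention follows; the helper name destination_in_crate is hypothetical, and the layout is an assumption drawn from the examples rather than a rule enforced by the CLI.

```python
from pathlib import Path


def destination_in_crate(source_filepath, crate_path):
    """Return a destination path under crate_path that keeps the file name.

    pathlib normalises the path, so a leading "./" is dropped.
    """
    return (Path(crate_path) / Path(source_filepath).name).as_posix()
```

The result can be passed as the value of --destination-filepath, with the original path as --source-filepath.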

The example below uses only the required options to add a dataset object to the crate and populate the corresponding metadata in the ro-crate-metadata.json file.

fairscape-cli rocrate add dataset \
  --name "AP-MS embeddings" \
  --author "Krogan lab (https://kroganlab.ucsf.edu/krogan-lab)" \
  --version "1.0" \
  --date-published "2021-04-23" \
  --description "Affinity purification mass spectrometer (APMS) embeddings for each protein in the study,  generated by node2vec predict." \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  --data-format "CSV" \
  --source-filepath "./tests/data/APMS_embedding_MUSIC.csv" \
  --destination-filepath "./test_rocrate/APMS_embedding_MUSIC.csv" \
  "./test_rocrate"

The example below performs the same operation utilizing both required and optional parameters:

fairscape-cli rocrate add dataset \
  --guid "ark:5982/UVA/B2AI/example_rocrate/AP-MS_embeddings-Dataset" \
  --name "AP-MS embeddings" \
  --url "https://github.com/idekerlab/MuSIC/blob/master/Examples/APMS_embedding.MuSIC.csv" \
  --author "Krogan lab (https://kroganlab.ucsf.edu/krogan-lab)" \
  --version "1.0" \
  --date-published "2021-04-23" \
  --description "Affinity purification mass spectrometer (APMS) embeddings for each protein in the study,  generated by node2vec predict." \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  --data-format "CSV" \
  --source-filepath "./tests/data/APMS_embedding_MUSIC.csv" \
  --destination-filepath "./test_rocrate/APMS_embedding_MUSIC.csv" \
  --used-by "create labeled training & test sets  random_forest_samples.py" \
  --derived-from "node2vec predict" \
  --associated-publication "Qin, Y. et al. A multi-scale map of cell structure fusing protein images and interactions" \
  --additional-documentation "https://idekerlab.ucsd.edu/music/" \
  "./test_rocrate"

One of the features offered by fairscape-cli is the ability to annotate certain types of dataset objects with schema-level metadata. The examples in Schema Metadata demonstrate how to describe the schema of a dataset object as metadata. This feature includes a mechanism to validate the metadata against the object.
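
The hand-written --guid values in the examples above appear to follow a common shape: an ark: prefix, then organization, project, crate name, and the object name with spaces replaced by underscores, followed by the object type. The Python sketch below reproduces that shape; it is inferred from the examples only and is not a description of how fairscape-cli generates identifiers. The function name example_guid and the default prefix are illustrative assumptions.

```python
def example_guid(organization, project, crate_name, object_name,
                 object_type, prefix="ark:5982"):
    """Compose a GUID shaped like the ones used in the examples."""
    # Spaces in the object name become underscores, as in the examples.
    slug = object_name.replace(" ", "_")
    return f"{prefix}/{organization}/{project}/{crate_name}/{slug}-{object_type}"
```

For instance, example_guid("UVA", "B2AI", "example_rocrate", "AP-MS embeddings", "Dataset") yields the GUID used in the dataset example above.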


Software object

To add a software object, use the add software sub-command, which requires nine options: name, author, version, description, keywords, file-format, source-filepath, destination-filepath, and date-modified. Five further options are optional. Metadata about the software is added to the ro-crate-metadata.json file, and the software object is transferred to the crate at ROCRATE_PATH. Enter fairscape-cli rocrate add software --help to show its use:

Usage: fairscape-cli rocrate add software [OPTIONS] ROCRATE_PATH

  Add a Software and its corresponding metadata.

Options:
  --guid TEXT
  --name TEXT                     [required]
  --author TEXT                   [required]
  --version TEXT                  [required]
  --description TEXT              [required]
  --keywords TEXT                 [required]
  --file-format TEXT              [required]
  --url TEXT
  --source-filepath TEXT          [required]
  --destination-filepath TEXT     [required]
  --date-modified TEXT            [required]
  --used-by-computation TEXT
  --associated-publication TEXT
  --additional-documentation TEXT
  --help                          Show this message and exit.
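
The [required] markers in the help output above can be checked before the command is invoked, which gives a clearer error than a failed CLI call deep inside a script. The following Python sketch assumes the option values are held in a dictionary keyed by option name without the leading dashes; the names ADD_SOFTWARE_REQUIRED and missing_required are hypothetical.

```python
# Options marked [required] in `fairscape-cli rocrate add software --help`.
ADD_SOFTWARE_REQUIRED = (
    "name", "author", "version", "description", "keywords",
    "file-format", "source-filepath", "destination-filepath",
    "date-modified",
)


def missing_required(options, required=ADD_SOFTWARE_REQUIRED):
    """Return the required option names that are absent or empty."""
    return [name for name in required if not options.get(name)]
```

An empty list means every required option has a value; the same function can be reused for other sub-commands by passing a different tuple as required.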

The example below uses only the required options to add a software object to the crate and populate the associated metadata in ro-crate-metadata.json. An identifier is generated automatically to uniquely represent the software.

fairscape-cli rocrate add software \
  --name "calibrate pairwise distance" \
  --author "Qin, Y." \
  --version "1.0" \
  --description "script written in python to calibrate pairwise distance." \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  --file-format "py" \
  --source-filepath "./tests/data/calibrate_pairwise_distance.py" \
  --destination-filepath "./test_rocrate/calibrate_pairwise_distance.py" \
  --date-modified "2021-04-23" \
  "./test_rocrate"

The same operation can be performed using both required and optional parameters with the following command.

fairscape-cli rocrate add software \
  --guid "ark:5982/UVA/B2AI/example_rocrate/calibrate_pairwise_distance-Software" \
  --name "calibrate pairwise distance" \
  --author "Qin, Y." \
  --version "1.0" \
  --description "script written in python to calibrate pairwise distance." \
  --keywords "b2ai" \
  --keywords "U2OS" \
  --file-format "py" \
  --url "https://github.com/idekerlab/MuSIC/blob/master/calibrate_pairwise_distance.py" \
  --source-filepath "./tests/data/calibrate_pairwise_distance.py" \
  --destination-filepath "./test_rocrate/calibrate_pairwise_distance.py" \
  --date-modified "2021-06-20" \
  --used-by-computation "ARK:compute_standard_proximities.1/f9aa5f3f-665a-4ab9-8879-8d0d52f05265" \
  --associated-publication "Qin, Y. et al. A multi-scale map of cell structure fusing protein images and interactions. Nature 600, 536–542 2021" \
  --additional-documentation "https://idekerlab.ucsd.edu/music/" \
  "./test_rocrate"

Register metadata

Registering metadata adds the metadata of an object (dataset or software) or an activity (computation) to ro-crate-metadata.json. The register sub-command does not transfer anything: an object must already be present at the path given by the --filepath option before the command runs. Registering a computation as an activity has no such path requirement.
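
Because nothing is transferred during registration, it can help to confirm that the file is already in place before calling register. The Python sketch below checks that a file exists and lies within the crate directory; the helper name is_inside_crate is hypothetical, and the requirement that the file sits inside the crate is an assumption based on the examples in this section.

```python
from pathlib import Path


def is_inside_crate(filepath, crate_path):
    """Return True if filepath is an existing file located under crate_path."""
    file_path = Path(filepath).resolve()
    crate_dir = Path(crate_path).resolve()
    # parents lists every ancestor directory of the resolved file path.
    return file_path.is_file() and crate_dir in file_path.parents
```

A script could call is_inside_crate("./test_rocrate/APMS_embedding_MUSIC.csv", "./test_rocrate") and only run the register command when it returns True.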

Enter fairscape-cli rocrate register --help to show its use:

Usage: fairscape-cli rocrate register [OPTIONS] COMMAND [ARGS]...

  Add a metadata record to the RO-Crate for a Dataset, Software, or
  Computation

Options:
  --help  Show this message and exit.

Commands:
  computation  Register a Computation with the specified RO-Crate
  dataset      Register Dataset object metadata with the specified RO-Crate
  software     Register a Software metadata record to the specified ROCrate

Computation metadata

To register a computation, use the register computation sub-command. In the FAIRSCAPE ecosystem, computation is considered an activity, unlike datasets and software that are treated as objects. This sub-command requires five mandatory parameters: name, run-by, date-created, description, and keywords, as well as five optional parameters. Once executed, metadata about the computation is added to ro-crate-metadata.json in the ROCRATE_PATH location.

To view all available options and arguments for registering a computation, enter fairscape-cli rocrate register computation --help:

Usage: fairscape-cli rocrate register computation [OPTIONS] ROCRATE_PATH

  Register a Computation with the specified RO-Crate

Options:
  --guid TEXT
  --name TEXT           [required]
  --run-by TEXT         [required]
  --command TEXT
  --date-created TEXT   [required]
  --description TEXT    [required]
  --keywords TEXT       [required]
  --used-software TEXT
  --used-dataset TEXT
  --generated TEXT
  --help                Show this message and exit.
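
The examples in this section pass --used-dataset several times, once per input file. When the file names follow a regular pattern, as the MuSIC random-forest outputs used later in this section do, the repeated options can be generated rather than typed by hand. The following Python sketch assumes plain file names that differ only in the image-embedding index and the fold number; the function name used_dataset_options is hypothetical.

```python
def used_dataset_options(folds=range(1, 6), if_embeddings=(1, 2)):
    """Return repeated --used-dataset options for the fold output files."""
    options = []
    for fold in folds:
        for emd in if_embeddings:
            filename = (
                f"IF_emd_{emd}_APMS_emd_1."
                f"RF_maxDep_30_nEst_1000.fold_{fold}.pkl"
            )
            # Each file gets its own --used-dataset flag.
            options += ["--used-dataset", filename]
    return options
```

The returned list can be spliced into the argument list of a register computation call before the ROCRATE_PATH argument.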

The example below registers a computation in ro-crate-metadata.json using only the required options. A unique identifier is generated automatically to represent the computation.

fairscape-cli rocrate register computation \
  --name "calibrate pairwise distance" \
  --run-by "Qin, Y." \
  --date-created "2021-05-23" \
  --description "Average the predicted proximities" \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  "./test_rocrate"

The same operation can be performed using both required and optional parameters with the following command.

fairscape-cli rocrate register computation \
  --guid "ark:5982/UVA/B2AI/test_rocrate/calibrate_pairwise_distance-Computation" \
  --name "calibrate pairwise distance" \
  --run-by "Qin, Y." \
  --command "some command" \
  --date-created "2021-05-23" \
  --description "Average the predicted proximities" \
  --keywords "b2ai" \
  --keywords "clustering" \
  --used-software "random_forest_output (https://github.com/idekerlab/MuSIC/blob/master/random_forest_output.py)" \
  --used-dataset "IF_emd_1_APMS_emd_1.RF_maxDep_30_nEst_1000.fold_1.pkl" \
  --used-dataset "IF_emd_2_APMS_emd_1.RF_maxDep_30_nEst_1000.fold_1.pkl" \
  --used-dataset "IF_emd_1_APMS_emd_1.RF_maxDep_30_nEst_1000.fold_2.pkl" \
  --used-dataset "IF_emd_2_APMS_emd_1.RF_maxDep_30_nEst_1000.fold_2.pkl" \
  --used-dataset "Fold 1 proximities: IF_emd_1_APMS_emd_1.RF_maxDep_30_nEst_1000.fold_3.pkl" \
  --used-dataset "IF_emd_2_APMS_emd_1.RF_maxDep_30_nEst_1000.fold_3.pkl" \
  --used-dataset "Fold 1 proximities: IF_emd_1_APMS_emd_1.RF_maxDep_30_nEst_1000.fold_4.pkl" \
  --used-dataset "IF_emd_2_APMS_emd_1.RF_maxDep_30_nEst_1000.fold_4.pkl" \
  --used-dataset "Fold 1 proximities: IF_emd_1_APMS_emd_1.RF_maxDep_30_nEst_1000.fold_5.pkl" \
  --used-dataset "IF_emd_2_APMS_emd_1.RF_maxDep_30_nEst_1000.fold_5.pkl" \
  --generated "averages of predicted protein proximities (https://github.com/idekerlab/MuSIC/blob/master/Examples/MuSIC_predicted_proximity.txt)" \
  "./test_rocrate"

Dataset metadata

To register a dataset, use the register dataset sub-command and include the filepath option to specify the source file path. This command adds metadata about the dataset to ro-crate-metadata.json in the ROCRATE_PATH directory.

To view all available options and arguments for registering a dataset, enter fairscape-cli rocrate register dataset --help:

Usage: fairscape-cli rocrate register dataset [OPTIONS] ROCRATE_PATH

  Register Dataset object metadata with the specified RO-Crate

Options:
  --guid TEXT
  --name TEXT                     [required]
  --url TEXT
  --author TEXT                   [required]
  --version TEXT                  [required]
  --date-published TEXT           [required]
  --description TEXT              [required]
  --keywords TEXT                 [required]
  --data-format TEXT              [required]
  --filepath TEXT                 [required]
  --used-by TEXT
  --derived-from TEXT
  --schema TEXT
  --associated-publication TEXT
  --additional-documentation TEXT
  --help                          Show this message and exit.

Execute the following command to register a dataset using all available options and arguments:

fairscape-cli rocrate register dataset \
  --guid "ark:5982/UVA/B2AI/example_rocrate/AP-MS_embeddings-Dataset" \
  --name "AP-MS embeddings" \
  --url "https://github.com/idekerlab/MuSIC/blob/master/Examples/APMS_embedding.MuSIC.csv" \
  --author "Krogan lab (https://kroganlab.ucsf.edu/krogan-lab)" \
  --version "1.0" \
  --date-published "2021-04-23" \
  --description "Affinity purification mass spectrometer (APMS) embeddings for each protein in the study,  generated by node2vec predict." \
  --keywords "apms" \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --data-format "CSV" \
  --filepath "./test_rocrate/APMS_embedding_MUSIC.csv" \
  --used-by "create labeled training & test sets  random_forest_samples.py" \
  --derived-from "node2vec predict" \
  --associated-publication "Qin, Y. et al. A multi-scale map of cell structure fusing protein images and interactions" \
  --additional-documentation "https://idekerlab.ucsd.edu/music/" \
  "./test_rocrate"

Software metadata

To register software, use the register software sub-command with the filepath option pointing at the software file. The command adds metadata about the software to the ro-crate-metadata.json file in the ROCRATE_PATH directory.

To view all available options and arguments for registering software, enter fairscape-cli rocrate register software --help:

Usage: fairscape-cli rocrate register software [OPTIONS] ROCRATE_PATH

  Register a Software metadata record to the specified ROCrate

Options:
  --guid TEXT
  --name TEXT                     [required]
  --author TEXT                   [required]
  --version TEXT                  [required]
  --description TEXT              [required]
  --keywords TEXT                 [required]
  --file-format TEXT              [required]
  --url TEXT
  --date-modified TEXT
  --filepath TEXT
  --used-by-computation TEXT
  --associated-publication TEXT
  --additional-documentation TEXT
  --help                          Show this message and exit.

Execute the following command to register software using all available options and arguments:

fairscape-cli rocrate register software \
  --guid "ark:5982/UVA/B2AI/example_rocrate/calibrate_pairwise_distance-Software" \
  --name "calibrate pairwise distance" \
  --author "Qin, Y." \
  --version "1.0" \
  --description "script written in python to calibrate pairwise distance." \
  --keywords "b2ai" \
  --keywords "U2OS" \
  --file-format "py" \
  --url "https://github.com/idekerlab/MuSIC/blob/master/calibrate_pairwise_distance.py" \
  --filepath "./test_rocrate/calibrate_pairwise_distance.py" \
  --date-modified "2021-06-20" \
  --used-by-computation "ARK:compute_standard_proximities.1/f9aa5f3f-665a-4ab9-8879-8d0d52f05265" \
  --associated-publication "Qin, Y. et al. A multi-scale map of cell structure fusing protein images and interactions. Nature 600, 536–542 2021" \
  --additional-documentation "https://idekerlab.ucsd.edu/music/" \
  "./test_rocrate"