Validation Commands
This document provides detailed information about the validation commands available in fairscape-cli.
Overview
The validate command group provides operations for validating data against schemas. This ensures that datasets conform to their expected structure and constraints.
fairscape-cli validate [COMMAND] [OPTIONS]
Available Commands
- schema: Validate a dataset against a schema definition
Command Details
schema
Validate a dataset against a schema definition.
fairscape-cli validate schema [OPTIONS]
Options:
- --schema TEXT: Path to the schema file or ARK identifier [required]
- --data TEXT: Path to the data file to validate [required]
Example:
fairscape-cli validate schema \
--schema ./schema_apms_music_embedding.json \
--data ./APMS_embedding_MUSIC.csv
When validation succeeds, you'll see:
Validation Success
If validation fails, you'll see a table of errors:
+-----+-----------------+----------------+-------------------------------------------------------+
| row | error_type      | failed_keyword | message                                               |
+-----+-----------------+----------------+-------------------------------------------------------+
| 3   | ParsingError    | None           | ValueError: Failed to Parse Attribute embed for Row 3 |
| 4   | ParsingError    | None           | ValueError: Failed to Parse Attribute embed for Row 4 |
| 0   | ValidationError | pattern        | 'APMS_A' does not match '^APMS_[0-9]*$'               |
+-----+-----------------+----------------+-------------------------------------------------------+
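When several data files share one schema, the command can be wrapped in a small loop. The following is a minimal sketch, not part of fairscape-cli: it assumes success can be detected from the "Validation Success" message shown above, because exit codes are not described in this section. The function name validate_all and the FAIRSCAPE_CLI variable are hypothetical.

```shell
# Sketch: validate several data files against one schema, reporting PASS/FAIL.
# Assumption: success is detected from the "Validation Success" message,
# because the CLI's exit codes are not documented in this section.

# FAIRSCAPE_CLI is a hypothetical variable for swapping in another executable.
FAIRSCAPE_CLI="${FAIRSCAPE_CLI:-fairscape-cli}"

# Usage: validate_all SCHEMA DATA_FILE...
# The body runs in a subshell, so its variables stay local to the function.
validate_all() (
  schema="$1"
  shift
  failed=0
  for data in "$@"; do
    # Capture stdout and stderr so the error table can be shown on failure.
    # "|| true" keeps the loop going even if the command exits non-zero.
    output="$("$FAIRSCAPE_CLI" validate schema --schema "$schema" --data "$data" 2>&1)" || true
    case "$output" in
      *"Validation Success"*) echo "PASS $data" ;;
      *) echo "FAIL $data"; echo "$output"; failed=1 ;;
    esac
  done
  # The function's exit status is 0 only if every file passed.
  [ "$failed" -eq 0 ]
)
```

For example, `validate_all ./schema_apms_music_embedding.json ./APMS_embedding_MUSIC.csv` would print one PASS or FAIL line for the file, followed by the error table on failure.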
Error Types
Errors are categorized into two main types:
- ParsingError: Occurs when the data cannot be parsed according to the schema structure. This often happens when:
    - The number of columns doesn't match the schema
    - A value cannot be converted to the expected datatype
- ValidationError: Occurs when the data can be parsed but fails validation constraints like:
    - String values not matching the specified pattern
    - Numeric values outside the min/max range
    - Array length not within specified bounds
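The difference between the two categories can be illustrated with a two-stage check on a single value. This is only a sketch of the idea, not the logic fairscape-cli uses internally: classify_value is a hypothetical helper that treats a value as an integer with a minimum and maximum.

```shell
# Sketch: classify one raw value the way the two error types are described.
# classify_value is a hypothetical helper, not part of fairscape-cli.
# Usage: classify_value RAW_TEXT MINIMUM MAXIMUM
classify_value() {
  if ! printf '%s\n' "$1" | grep -Eq '^-?[0-9]+$'; then
    echo "ParsingError"      # the text cannot be converted to an integer
  elif [ "$1" -lt "$2" ] || [ "$1" -gt "$3" ]; then
    echo "ValidationError"   # parsed, but outside the allowed range
  else
    echo "ok"
  fi
}

classify_value 7 0 10     # prints: ok
classify_value 42 0 10    # prints: ValidationError
classify_value abc 0 10   # prints: ParsingError
```

Parsing is checked first because range and pattern constraints only make sense once a value has been converted to its expected datatype.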
Working with Different File Types
The validation command automatically detects the file type based on its extension:
- CSV/TSV files: Tabular validation with field separators
- Parquet files: Tabular validation with columnar storage
- HDF5 files: Hierarchical validation with nested structures
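Extension-based detection of this kind can be sketched as a simple lookup. The exact extensions below (.csv, .tsv, .parquet, .h5, .hdf5) are assumptions for illustration, and validation_kind is a hypothetical helper; the set of extensions fairscape-cli actually recognizes may differ.

```shell
# Sketch: map a file name to the validation style listed above.
# Assumption: the extensions here are illustrative; the set recognized
# by fairscape-cli may differ.
validation_kind() {
  case "$1" in
    *.csv|*.tsv) echo "tabular (field separators)" ;;
    *.parquet)   echo "tabular (columnar storage)" ;;
    *.h5|*.hdf5) echo "hierarchical" ;;
    *)           echo "unknown"; return 1 ;;
  esac
}

validation_kind ./APMS_embedding_MUSIC.csv   # prints: tabular (field separators)
```

A pre-flight check like this can be useful in scripts that want to reject unsupported files before invoking the validator.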
Using ARK Identifiers for Schemas
Instead of providing a file path, you can reference a schema by its ARK identifier if it's registered in a FAIRSCAPE repository:
fairscape-cli validate schema \
--schema "ark:59852/schema-cm4ai-image-embedding-image-emd" \
--data "examples/schemas/cm4ai-rocrates/image_embedding/image_emd.tsv"
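Because --schema accepts either form, a wrapper script may want to tell the two apart before calling the CLI, for instance to report a mistyped local path early. The sketch below is an assumption about how such a script could be written, not a description of how fairscape-cli resolves the argument; schema_ref_kind is a hypothetical helper.

```shell
# Sketch: distinguish an ARK identifier from a local schema path.
# schema_ref_kind is a hypothetical helper, not part of fairscape-cli.
# Prints "ark" for an ARK identifier, "file" for an existing local file,
# and "missing" (with a non-zero status) otherwise.
schema_ref_kind() {
  case "$1" in
    ark:*) echo "ark" ;;
    *)
      if [ -f "$1" ]; then
        echo "file"
      else
        echo "missing"
        return 1
      fi
      ;;
  esac
}

schema_ref_kind "ark:59852/schema-cm4ai-image-embedding-image-emd"   # prints: ark
```

The check only inspects the "ark:" prefix and the local filesystem; whether the identifier is actually registered in a FAIRSCAPE repository is left to the CLI.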