Schema Model

The Schema model provides a formal, machine-readable definition for a tabular dataset, based on JSON Schema principles. It is used to validate the structure and content of data files and to provide semantic meaning to their columns.

It inherits from FairscapeEVIBaseModel, which includes properties like keywords, license, and context.

Properties

| Property | Type | Description | Required |
|---|---|---|---|
| guid (alias: @id) | str | The unique, resolvable identifier for the Schema. | Yes |
| name | str | A human-readable name for the schema. | Yes |
| description | str | A detailed description of the data that this schema defines (min 5 characters). | Yes |
| properties | Dict[str, Property] | A dictionary where keys are property names and values are Property objects describing each column or column group. | Yes |
| metadataType (alias: @type) | str | The schema type. Defaults to evi:Schema. | No |
| schemaType (alias: type) | Optional[str] | The JSON Schema type for the root object. Defaults to "object". | No |
| additionalProperties | Optional[bool] | Whether additional columns not defined in properties are allowed. Defaults to True. | No |
| required | Optional[List[str]] | A list of property names that must be present in the data. | No |
| separator | Optional[str] | The character used to separate columns in the data file (e.g., "," or "\t"). Defaults to ",". | No |
| header | Optional[bool] | Whether the data file contains a header row. Defaults to True. | No |
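
The defaults listed above can be illustrated with a small sketch. It uses plain dictionaries keyed by the JSON aliases (@id, @type, type) rather than the library's own model class; the helper name with_defaults and the sample identifier are hypothetical and not part of FAIRSCAPE.

```python
from typing import Any, Dict

# Defaults as documented in the properties table. This helper is
# illustrative only; it is not part of the FAIRSCAPE library.
SCHEMA_DEFAULTS: Dict[str, Any] = {
    "@type": "evi:Schema",
    "type": "object",
    "additionalProperties": True,
    "separator": ",",
    "header": True,
}
REQUIRED_KEYS = ("@id", "name", "description", "properties")


def with_defaults(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `schema` with the documented defaults filled in."""
    missing = [key for key in REQUIRED_KEYS if key not in schema]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if len(schema["description"]) < 5:
        raise ValueError("description must be at least 5 characters")
    # Values supplied by the caller override the defaults.
    return {**SCHEMA_DEFAULTS, **schema}


minimal = {
    "@id": "ark:00000/example-schema",  # hypothetical identifier
    "name": "Example Schema",
    "description": "Two-column example table",
    "properties": {},
}
resolved = with_defaults(minimal)
print(resolved["type"], resolved["header"])  # object True
```

Only the four required fields are supplied here; every other key takes the value documented in the table.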

The Property Object

Each key in the properties dictionary maps to a Property object with the following fields:

| Property | Type | Description | Required |
|---|---|---|---|
| description | str | A human-readable description of the column/property. | Yes |
| index | Union[str, int] | The 0-based index or slice (e.g., 2::, ::5, 2:5) of the column(s). | Yes |
| type | str | The data type for the column (string, number, integer, array, boolean). | Yes |
| value_url | Optional[str] | A URL to an ontology term that formally defines the property's meaning. | No |
| pattern | Optional[str] | For string types, a valid regular expression that the column's values must match. | No |
| items | Optional[Item] | For array types, an Item object describing the elements within the array. It must contain a type field. | No |
| min_items | Optional[int] | For array types, the minimum number of items (written as minItems in the JSON example below). | No |
| max_items | Optional[int] | For array types, the maximum number of items (written as maxItems in the JSON example below). | No |
| unique_items | Optional[bool] | For array types, whether all items must be unique (written as uniqueItems in the JSON example below). | No |
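
As a rough illustration of the index field, the sketch below turns an integer or slice string into a column selector. It assumes slice strings follow Python's start:stop:step convention, which the text above suggests but does not state outright; parse_index and select_columns are hypothetical helper names, not library functions.

```python
from typing import List, Union


def parse_index(index: Union[int, str]) -> Union[int, slice]:
    """Turn a Property index into an int or a Python slice.

    Assumes slice strings use start:stop:step, where an empty part
    means "unbounded" (an assumption, not confirmed by the library).
    """
    if isinstance(index, int):
        return index
    parts = index.split(":")
    if len(parts) == 1:
        return int(parts[0])
    if len(parts) > 3:
        raise ValueError(f"invalid slice: {index!r}")
    bounds = [int(part) if part.strip() else None for part in parts]
    return slice(*bounds)


def select_columns(row: List[str], index: Union[int, str]):
    """Pick one cell (int index) or a list of cells (slice) from a row."""
    return row[parse_index(index)]


row = ["APMS_1", "ABC1", "0.1", "0.2", "0.3"]
print(select_columns(row, 0))      # APMS_1
print(select_columns(row, "2::"))  # ['0.1', '0.2', '0.3']
print(select_columns(row, "2:4"))  # ['0.1', '0.2']
```

An integer index yields a single cell, while a slice yields a list, which is why slices pair naturally with the array type.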

Example

{
  "@context": {
    "@vocab": "https://schema.org/",
    "EVI": "https://w3id.org/EVI#"
  },
  "@id": "ark:59852/schema-apms-music-embedding-izNjXSs",
  "@type": "EVI:Schema",
  "name": "APMS Embedding Schema",
  "description": "Tabular format for APMS music embeddings from PPI networks from the music pipeline from the B2AI Cellmaps for AI project",
  "properties": {
    "Experiment Identifier": {
      "description": "Identifier for the APMS experiment responsible for generating the raw PPI used to create this embedding vector",
      "index": 0,
      "type": "string",
      "pattern": "^APMS_[0-9]*$"
    },
    "Gene Symbol": {
      "description": "Gene Symbol for the APMS bait protein",
      "index": 1,
      "type": "string",
      "pattern": "^[A-Za-z0-9\\-]*$"
    },
    "MUSIC APMS Embedding": {
      "description": "Embedding Vector values for genes determined by running node2vec on APMS PPI networks. Vector has 1024 values for each bait protein",
      "index": "2::",
      "type": "array",
      "maxItems": 1024,
      "minItems": 1024,
      "uniqueItems": false,
      "items": { "type": "number" }
    }
  },
  "type": "object",
  "required": ["Experiment Identifier", "Gene Symbol", "MUSIC APMS Embedding"],
  "separator": ",",
  "header": false
}
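
To show how a schema like this might be applied to data, here is a minimal validation sketch using only the standard library. It is not the library's validator: validate_row and to_selector are hypothetical helpers, the schema is a cut-down copy of the example, and the embedding length is reduced from 1024 to 3 so the sample rows stay readable.

```python
import csv
import io
import re

# Cut-down copy of the example schema; minItems/maxItems are reduced
# from 1024 to 3 purely for illustration.
schema = {
    "properties": {
        "Experiment Identifier": {"index": 0, "type": "string", "pattern": r"^APMS_[0-9]*$"},
        "Gene Symbol": {"index": 1, "type": "string", "pattern": r"^[A-Za-z0-9\-]*$"},
        "MUSIC APMS Embedding": {
            "index": "2::", "type": "array",
            "minItems": 3, "maxItems": 3, "items": {"type": "number"},
        },
    },
    "separator": ",",
    "header": False,
}


def to_selector(index):
    """Convert an int or a 'start:stop:step' string into an int or slice."""
    if isinstance(index, int):
        return index
    if ":" not in index:
        return int(index)
    return slice(*[int(p) if p else None for p in index.split(":")])


def validate_row(row, schema):
    """Return a list of error messages for one row (empty if valid)."""
    errors = []
    for name, prop in schema["properties"].items():
        try:
            value = row[to_selector(prop["index"])]
        except IndexError:
            errors.append(f"{name}: column missing")
            continue
        if prop["type"] == "string":
            pattern = prop.get("pattern")
            if pattern and not re.search(pattern, value):
                errors.append(f"{name}: {value!r} does not match {pattern}")
        elif prop["type"] == "array":
            low = prop.get("minItems", 0)
            high = prop.get("maxItems", len(value))
            if not low <= len(value) <= high:
                errors.append(f"{name}: expected {low}..{high} items, got {len(value)}")
            if prop.get("items", {}).get("type") == "number":
                for item in value:
                    try:
                        float(item)
                    except ValueError:
                        errors.append(f"{name}: {item!r} is not a number")
    return errors


data = "APMS_1,ABC1,0.1,0.2,0.3\nbad id,ABC-2,0.1,x\n"
reader = csv.reader(io.StringIO(data), delimiter=schema["separator"])
rows = list(reader)[1:] if schema["header"] else list(reader)
for row in rows:
    print(validate_row(row, schema))
```

The first sample row passes with no errors. The second reports three problems: an identifier that fails the pattern, an embedding with too few items, and a non-numeric item.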