
Schema Definition

Your CipherStash collection needs a schema in order for it to be searchable.

The basics

A schema definition is a JSON file that contains an object with two top-level keys:

  1. "type", which lists the fields which are expected to appear in all your records, and the type of data those fields will contain. Types of data are things like “this is a string”, “this is an integer”, “this is a date”, and so on.
  2. "indexes", which describes the indexes: the ways in which you can search for the records in your collection.

The top-level structure of the schema JSON therefore looks like this:

{
  "type": { your type definition here },
  "indexes": { your indexes definition here }
}
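As a minimal sketch of the structure described above, the following Python snippet parses a schema and checks that both top-level keys are present. The function name `load_schema` is hypothetical and is not part of any CipherStash tooling; it only illustrates the shape of the file.

```python
import json


def load_schema(text: str) -> dict:
    """Parse schema JSON and check the two top-level keys described above."""
    schema = json.loads(text)
    if not isinstance(schema, dict):
        raise ValueError("schema must be a JSON object")
    # Both "type" and "indexes" are expected at the top level.
    missing = {"type", "indexes"} - set(schema)
    if missing:
        raise ValueError(f"schema is missing top-level keys: {sorted(missing)}")
    return schema


example = '{"type": {"title": "string"}, "indexes": {}}'
schema = load_schema(example)
print(sorted(schema))
```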

Defining a Record Type

A type definition is a JSON object where the keys are field names and the values describe the type of each field. A value is either the name of a scalar type (such as “string”, “float64”, “date”, “uint64”, “boolean”) or a nested object that represents structured data.

The type definition describes the shape of the records that will be stored in your collection.

{
  "type": {
    "title": "string",
    "runningTime": "float64",
    "year": "float64"
  },
  // index definition omitted
}

Additional fields can also be stored in a record that are not included in the “type” definition. They will be persisted faithfully by CipherStash. However, you can only define indexes on fields that are included in the type definition.
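To make the distinction concrete, here is a small Python sketch that separates the fields of a record covered by the type definition from the additional ones. The helper `split_fields`, the sample record, and the `director` field are hypothetical and exist only for illustration.

```python
def split_fields(type_def: dict, record: dict) -> tuple:
    """Split a record into fields named in the type definition and extras."""
    typed = {k: v for k, v in record.items() if k in type_def}
    extra = {k: v for k, v in record.items() if k not in type_def}
    return typed, extra


movie_type = {"title": "string", "runningTime": "float64", "year": "float64"}
record = {
    "title": "Example Movie",
    "runningTime": 120.0,
    "year": 2020.0,
    "director": "A. Director",  # not in the type definition
}

typed, extra = split_fields(movie_type, record)
print(sorted(typed))  # fields that indexes may refer to
print(extra)          # stored with the record, but not indexable
```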

Field Names

See Identifiers for Field Names and Index Names.

Supported Data Types for Fields

For a list of available types (and what index types they support), see Data Types.

In addition, compound types are supported using object syntax like this:

{
  "name": "string",
  "age": "uint64",
  "address": {
    "street": "string",
    "city": "string",
    "postcode": "string"
  }
}

Unsupported Field Types

Arrays

There is currently no way to specify a field with an array type. However, this only means that fields with an array type cannot be indexed. CipherStash will faithfully persist records containing arrays.

Defining Indexes

An index is what makes records in CipherStash searchable. In fact, it is impossible to search records in CipherStash without indexes. This is unlike traditional databases, where you can do a full table scan across unindexed fields.

CipherStash supports the following index kinds: exact, range, match, dynamic-match, and field-dynamic-match.

We define indexes in the schema JSON like so:

{
  "type": {
    "title": "string",
    "runningTime": "float64",
    "year": "float64"
  },
  "indexes": {
    "exactTitle": { "kind": "exact", "field": "title" }
  }
}

It is important to note that indexes can only refer to fields that are defined in the “type”.

In the example above, exactTitle is the name of the index. kind defines the type of the index (in this case, an exact index). field is the name of the field being indexed (in this case, title). Different index kinds have different parameters to configure them. For example:

  • Indexes of the match, dynamic-match, and field-dynamic-match kinds must specify tokenFilters and tokenizer.
  • Indexes of the match kind must specify fields.
  • Indexes of the exact and range kinds only need to specify field.
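The rules in the list above can be sketched as a lookup table from index kind to required parameters. This is an illustrative Python check, not CipherStash's own validation: the names `REQUIRED_KEYS` and `missing_parameters` are hypothetical, and the sketch uses the `tokenizer` key as it is spelled in the schema examples.

```python
# Required parameters per index kind, taken from the list above.
REQUIRED_KEYS = {
    "exact": {"field"},
    "range": {"field"},
    "match": {"fields", "tokenFilters", "tokenizer"},
    "dynamic-match": {"tokenFilters", "tokenizer"},
    "field-dynamic-match": {"tokenFilters", "tokenizer"},
}


def missing_parameters(index_def: dict) -> set:
    """Return the required parameters that an index definition lacks."""
    kind = index_def.get("kind")
    if kind not in REQUIRED_KEYS:
        raise ValueError(f"unknown index kind: {kind!r}")
    return REQUIRED_KEYS[kind] - set(index_def)


ok = missing_parameters({"kind": "exact", "field": "title"})
incomplete = missing_parameters({"kind": "match", "fields": ["title"]})
print(ok)
print(sorted(incomplete))
```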

Index Names

See Identifiers for Field Names and Index Names.

Defining Indexes on Compound Types

In order to refer to a nested field from within an index definition, we need to use dot notation just as if we were referencing a field of a nested object or struct in a programming language.

Given the following type definition:

{
  "name": "string",
  "age": "uint64",
  "address": {
    "street": "string",
    "city": "string",
    "postcode": "string"
  }
}

An index can be defined on “city” like so:

{
  "city": { "kind": "exact", "field": "address.city" }
}
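The following Python sketch shows one way dot notation could be resolved against a type definition, and how that lookup can flag indexes that refer to fields missing from the “type”. The helpers `resolve_field` and `unresolved_index_fields` are hypothetical. Treating a path as valid only when it ends at a scalar type name is an assumption of this sketch, not documented CipherStash behaviour.

```python
def resolve_field(type_def: dict, path: str):
    """Follow a dotted path through a type definition; return the type or None."""
    node = type_def
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def unresolved_index_fields(schema: dict) -> list:
    """List 'indexName: path' for every index field not found in the type."""
    problems = []
    for name, index in schema["indexes"].items():
        paths = list(index.get("fields", []))
        if "field" in index:
            paths.append(index["field"])
        for path in paths:
            # Assumption: an indexable path must end at a scalar type name.
            if not isinstance(resolve_field(schema["type"], path), str):
                problems.append(f"{name}: {path}")
    return problems


schema = {
    "type": {
        "name": "string",
        "age": "uint64",
        "address": {"street": "string", "city": "string", "postcode": "string"},
    },
    "indexes": {
        "city": {"kind": "exact", "field": "address.city"},
        "suburb": {"kind": "exact", "field": "address.suburb"},  # not in type
    },
}

print(resolve_field(schema["type"], "address.city"))
print(unresolved_index_fields(schema))
```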

Example of an Index for Every Index Kind

Here is a more comprehensive example that defines an index for every index type:

{
  "type": {
    "title": "string",
    "runningTime": "float64",
    "year": "float64"
  },
  "indexes": {
    "exactTitle": { "kind": "exact", "field": "title" },
    "runningTime": { "kind": "range", "field": "runningTime" },
    "year": { "kind": "range", "field": "year" },
    "title": {
      "kind": "match",
      "fields": ["title"],
      "tokenFilters": [
        { "kind": "downcase" },
        { "kind": "ngram", "tokenLength": 3 }
      ],
      "tokenizer": { "kind": "standard" }
    },
    "allTextDynamicMatch": {
      "kind": "dynamic-match",
      "tokenFilters": [
        { "kind": "downcase" }
      ],
      "tokenizer": { "kind": "ngram", "tokenLength": 3 }
    },
    "allTextFieldDynamicMatch": {
      "kind": "field-dynamic-match",
      "tokenFilters": [
        { "kind": "downcase" }
      ],
      "tokenizer": { "kind": "ngram", "tokenLength": 3 }
    }
  }
}

Options for “*match” Index Types

The “match”, “dynamic-match” and “field-dynamic-match” index kinds all operate on strings and all support the same token filtering and tokenizer options.

The tokenFilter kind can have one of the following values: upcase, downcase, ngram.

The tokenizer kind can be standard or ngram.

ngram supports the following option: tokenLength.
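To give an intuition for what these options do, here is a conceptual Python sketch of a tokenizer followed by token filters. It does not reproduce CipherStash's implementation. It assumes that the standard tokenizer splits on whitespace, that filters run in order after the tokenizer, and that tokens shorter than tokenLength produce no n-grams. All function names are hypothetical.

```python
def standard_tokenize(text: str) -> list:
    """Assumed behaviour of a 'standard' tokenizer: split on whitespace."""
    return text.split()


def downcase(tokens: list) -> list:
    """The 'downcase' filter: lower-case every token."""
    return [t.lower() for t in tokens]


def ngram(tokens: list, token_length: int) -> list:
    """The 'ngram' filter: every substring of token_length characters."""
    out = []
    for t in tokens:
        out.extend(t[i:i + token_length] for i in range(len(t) - token_length + 1))
    return out


def analyze(text: str) -> list:
    """Mirror a match index with a standard tokenizer, downcase, then ngram(3)."""
    return ngram(downcase(standard_tokenize(text)), 3)


terms = analyze("Star Wars")
print(terms)  # ['sta', 'tar', 'war', 'ars']
```

Because both the stored text and the query would pass through the same steps, a query such as “war” yields a term that also appears in the terms derived from “Star Wars”.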

Identifiers for Field Names and Index Names

Field names and index names are validated using the following regular expression: /^[A-Z_][0-9A-Z_]*$/i.

To unpack this:

  • identifiers MUST be made up only of alphabetical characters (upper or lower case), digits, and underscores.
  • identifiers MUST start with an alphabetical character (upper or lower case) or an underscore.

Identifiers are case-sensitive. For example, “age” and “Age” are treated as different identifiers.
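The same rule can be checked locally before submitting a schema. This Python sketch uses the regular expression quoted above; the helper name `is_valid_identifier` is hypothetical. It uses `fullmatch` instead of the `^`/`$` anchors because Python's `$` also matches just before a trailing newline.

```python
import re

# Same pattern as above; the /i flag becomes re.IGNORECASE.
IDENTIFIER = re.compile(r"[A-Z_][0-9A-Z_]*", re.IGNORECASE)


def is_valid_identifier(name: str) -> bool:
    """Return True if name is acceptable as a field name or index name."""
    return IDENTIFIER.fullmatch(name) is not None


for candidate in ["exactTitle", "_id", "year2", "2fast", "running-time", ""]:
    print(repr(candidate), is_valid_identifier(candidate))

# The pattern ignores case when matching, but identifiers stay case-sensitive:
# "age" and "Age" are both valid and remain two distinct names.
print("age" == "Age")
```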

JSON Type Definition (JTD) for Indexes

This JTD only describes the “indexes” field of the schema and is provided to communicate its structure unambiguously.

The “type” field is not included because the JTD of “type” would have to be recursive in order to support nested types. JTD does not support recursion. CipherStash will publish a complete JSON Schema in the near future, but please contact us if you need something sooner.

{
  "definitions": {
    "indexDef": {
      "discriminator": "kind",
      "mapping": {
        "exact": {
          "properties": {
            "field": { "type": "string" }
          }
        },
        "range": {
          "properties": {
            "field": { "type": "string" }
          }
        },
        "match": {
          "properties": {
            "fields": { "elements": { "type": "string" }},
            "tokenFilters": {
              "elements": { "ref": "tokenFilter" }
            },
            "tokenizer": { "ref": "tokenizer" }
          }
        }
      }
    },
    "tokenFilter": {
      "discriminator": "kind",
      "mapping": {
        "downcase": { "properties": {} },
        "upcase": { "properties": {} },
        "ngram": {
          "optionalProperties": {
            "tokenLength": { "type": "uint8" }
          }
        }
      }
    },
    "tokenizer": {
      "discriminator": "kind",
      "mapping": {
        "standard": { "properties": {} }
      }
    }
  },

  "values": { "ref": "indexDef" }
}