Schema Definition
Your CipherStash collection needs a schema in order for it to be searchable.
The basics
A schema definition is a JSON file that contains an object with two top-level keys:
- "type", which lists the fields that are expected to appear in all your records, and the type of data those fields will contain. Types of data are things like “this is a string”, “this is an integer”, “this is a date”, and so on.
- "indexes", which describes the indexes, which are the ways in which you can search for the records in your collection.
The top-level structure of the schema JSON therefore looks like this:
{
  "type": { your type definition here },
  "indexes": { your indexes definition here }
}
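As a rough illustration of the structure described above, the following Python sketch parses a schema document and checks that both top-level keys are present. The helper name `check_top_level` is hypothetical and is not part of any CipherStash tooling; it only mirrors the rule stated in this section.

```python
import json

# The two top-level keys this section says every schema must contain.
REQUIRED_KEYS = {"type", "indexes"}


def check_top_level(schema_text: str) -> dict:
    """Parse schema JSON and confirm both top-level keys are present.

    Hypothetical helper for illustration only.
    """
    schema = json.loads(schema_text)
    if not isinstance(schema, dict):
        raise ValueError("schema must be a JSON object")
    missing = REQUIRED_KEYS - set(schema)
    if missing:
        raise ValueError(f"schema is missing top-level keys: {sorted(missing)}")
    return schema


example = '{"type": {"title": "string"}, "indexes": {}}'
schema = check_top_level(example)
```

A document that lacks either key raises `ValueError` in this sketch. How CipherStash itself reports such a problem is not covered here.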
Defining a Record Type
A type definition is a JSON object where the keys are field names and the values are names of types. Types can be names of scalar values (such as “string”, “float64”, “date”, “uint64”, “boolean”) or they can represent nested structured data.
The type definition describes the shape of the records that will be stored in your collection.
{
  "type": {
    "title": "string",
    "runningTime": "float64",
    "year": "float64"
  },
  // index definition omitted
}
A record can also contain fields that are not included in the “type” definition. CipherStash will persist them faithfully. However, you can only define indexes on fields that are included in the type definition.
Field Names
See Identifiers for Field Names and Index Names.
Supported Data Types for Fields
For a list of available types (and what index types they support), see Data Types.
In addition, compound types are supported using object syntax like this:
{
  "name": "string",
  "age": "uint64",
  "address": {
    "street": "string",
    "city": "string",
    "postcode": "string"
  }
}
Unsupported Field Types
Arrays
There is currently no way to specify a field with an array type. However, this only means that fields with an array type cannot be indexed. CipherStash will faithfully persist records containing arrays.
Defining Indexes
An index is what makes records in CipherStash searchable. In fact, it is impossible to search records in CipherStash without indexes. This is unlike traditional databases, where you can do a full table scan across unindexed fields.
CipherStash supports the following index types:
- exact
- range
- match
- dynamic-match
- field-dynamic-match
We define indexes in the schema JSON like so:
{
  "type": {
    "title": "string",
    "runningTime": "float64",
    "year": "float64"
  },
  "indexes": {
    "exactTitle": { "kind": "exact", "field": "title" }
  }
}
It is important to note that indexes can only refer to fields that are defined in the “type”.
In the example above, “exactTitle” is the name of the index.
“kind” defines the type of the index (in this case, an Exact index).
“field” is the name of the field being indexed (in this case, “title”).
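The rule that indexes can only refer to fields defined in the “type” can be checked mechanically. The sketch below is a minimal illustration, not CipherStash's own validation: `undefined_index_fields` is a hypothetical helper, the "byDirector" index is invented for the example, and only top-level field names are handled (nested fields are not).

```python
def undefined_index_fields(schema: dict) -> dict:
    """Map each index name to the referenced fields that are absent from "type".

    Hypothetical helper; handles top-level field names only.
    """
    defined = set(schema["type"])
    problems = {}
    for name, index in schema["indexes"].items():
        # "match" indexes use a "fields" list; "exact" and "range" use "field".
        referenced = list(index.get("fields", []))
        if "field" in index:
            referenced.append(index["field"])
        missing = [f for f in referenced if f not in defined]
        if missing:
            problems[name] = missing
    return problems


schema = {
    "type": {"title": "string", "runningTime": "float64", "year": "float64"},
    "indexes": {
        "exactTitle": {"kind": "exact", "field": "title"},
        # Invented index that refers to a field missing from "type".
        "byDirector": {"kind": "exact", "field": "director"},
    },
}

print(undefined_index_fields(schema))  # {'byDirector': ['director']}
```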
Different index kinds have different parameters to configure them.
For example:
- Indexes of the “match”, “dynamic-match”, and “field-dynamic-match” kinds must specify “tokenFilters” and “tokenizer”.
- Indexes of the “match” kind must also specify “fields”.
- Indexes of the “exact” and “range” kinds only need to specify “field”.
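The per-kind requirements above can be sketched as a lookup table. This is an illustration only: `REQUIRED_PARAMS` and `missing_params` are hypothetical names, and the tokenizer key is spelled `tokenizer` to match the schema examples in this document.

```python
# Required parameters per index kind, as described in this section.
REQUIRED_PARAMS = {
    "exact": {"field"},
    "range": {"field"},
    "match": {"fields", "tokenFilters", "tokenizer"},
    "dynamic-match": {"tokenFilters", "tokenizer"},
    "field-dynamic-match": {"tokenFilters", "tokenizer"},
}


def missing_params(index: dict) -> set:
    """Return the required parameters that an index definition lacks."""
    kind = index.get("kind")
    if kind not in REQUIRED_PARAMS:
        raise ValueError(f"unknown index kind: {kind!r}")
    return REQUIRED_PARAMS[kind] - set(index)


complete = missing_params({"kind": "exact", "field": "title"})
incomplete = missing_params({"kind": "match", "fields": ["title"]})
```

Here `complete` is an empty set, while `incomplete` contains "tokenFilters" and "tokenizer".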
Index Names
See Identifiers for Field Names and Index Names.
Defining Indexes on Compound Types
In order to refer to a nested field from within an index definition, we need to use dot notation just as if we were referencing a field of a nested object or struct in a programming language.
Given the following type definition:
{
  "name": "string",
  "age": "uint64",
  "address": {
    "street": "string",
    "city": "string",
    "postcode": "string"
  }
}
An index can be defined on “city” like so:
{
  "city": { "kind": "exact", "field": "address.city" }
}
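To make the dot notation concrete, the following sketch walks a dotted path through a nested type definition and returns the type found at the end. `resolve_field_type` is a hypothetical helper written for this example; it does not describe how CipherStash resolves paths internally.

```python
def resolve_field_type(type_def: dict, path: str):
    """Walk a dotted path such as "address.city" through a nested type definition."""
    current = type_def
    for part in path.split("."):
        # Each step must land on a nested object that contains the next name.
        if not isinstance(current, dict) or part not in current:
            raise KeyError(f"field {path!r} is not defined in the type")
        current = current[part]
    return current


person = {
    "name": "string",
    "age": "uint64",
    "address": {
        "street": "string",
        "city": "string",
        "postcode": "string",
    },
}

city_type = resolve_field_type(person, "address.city")  # "string"
```

A path that does not exist in the type, such as "address.country", raises `KeyError` in this sketch.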
Example of an Index for Every Index Kind
Here is a more comprehensive example that defines an index for every index type:
{
  "type": {
    "title": "string",
    "runningTime": "float64",
    "year": "float64"
  },
  "indexes": {
    "exactTitle": { "kind": "exact", "field": "title" },
    "runningTime": { "kind": "range", "field": "runningTime" },
    "year": { "kind": "range", "field": "year" },
    "title": {
      "kind": "match",
      "fields": ["title"],
      "tokenFilters": [
        { "kind": "downcase" },
        { "kind": "ngram", "tokenLength": 3 }
      ],
      "tokenizer": { "kind": "standard" }
    },
    "allTextDynamicMatch": {
      "kind": "dynamic-match",
      "tokenFilters": [
        { "kind": "downcase" }
      ],
      "tokenizer": { "kind": "ngram", "tokenLength": 3 }
    },
    "allTextFieldDynamicMatch": {
      "kind": "field-dynamic-match",
      "tokenFilters": [
        { "kind": "downcase" }
      ],
      "tokenizer": { "kind": "ngram", "tokenLength": 3 }
    }
  }
}
Options for “*match” Index Types
The “match”, “dynamic-match” and “field-dynamic-match” index kinds all operate on strings and all support the same token filtering and tokenizer options.
The tokenFilter “kind” can have one of the following values: “upcase”, “downcase”, “ngram”.
The tokenizer “kind” can be “standard” or “ngram”.
“ngram” supports the following option: “tokenLength”.
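For readers unfamiliar with these terms, the sketch below shows the general idea behind a “downcase” filter followed by an “ngram” filter with a “tokenLength” of 3. It is not CipherStash's implementation: splitting on whitespace is only a stand-in for the “standard” tokenizer, and dropping tokens shorter than the n-gram length is a choice made for this example.

```python
def ngrams(token: str, token_length: int = 3) -> list:
    """Return every run of token_length consecutive characters in token.

    Tokens shorter than token_length yield no n-grams in this sketch.
    """
    return [token[i:i + token_length] for i in range(len(token) - token_length + 1)]


def analyse(text: str, token_length: int = 3) -> list:
    """Tokenize, downcase, then split each token into character n-grams."""
    tokens = text.split()                 # assumption: stand-in for "standard"
    tokens = [t.lower() for t in tokens]  # the "downcase" filter
    return [gram for t in tokens for gram in ngrams(t, token_length)]


print(analyse("Star Wars"))  # ['sta', 'tar', 'war', 'ars']
```

Because both the stored text and the search term pass through the same steps, a query for "WAR" and a title containing "Wars" share the n-gram "war" in this model.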
Identifiers for Field Names and Index Names
Field names and index names are validated using the following regular expression: /^[A-Z_][0-9A-Z_]*$/i.
To unpack this:
- identifiers MUST contain only alphabetical characters (both upper and lower case), digits, and underscores.
- identifiers MUST start with an alphabetical character (upper or lower case) or an underscore.
Identifiers are case-sensitive. For example, “age” and “Age” are considered different identifiers.
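The identifier rule can be tried out locally with a short Python sketch. The pattern is the one quoted above, adapted to Python's `re` module; `is_valid_identifier` is a hypothetical helper, and the use of `fullmatch` with the `re.ASCII` flag is an adaptation chosen here to keep Python's matching limited to ASCII letters.

```python
import re

# Same character classes as /^[A-Z_][0-9A-Z_]*$/i. fullmatch() anchors both
# ends, and re.ASCII stops IGNORECASE from matching non-ASCII letters.
IDENTIFIER = re.compile(r"[A-Z_][0-9A-Z_]*", re.IGNORECASE | re.ASCII)


def is_valid_identifier(name: str) -> bool:
    """Return True if name satisfies the identifier rule described above."""
    return IDENTIFIER.fullmatch(name) is not None


for candidate in ("runningTime", "_id", "Age", "2fast", "exact-title", ""):
    print(repr(candidate), is_valid_identifier(candidate))
```

The first three candidates are accepted. The last three are rejected because they start with a digit, contain a hyphen, or are empty.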
JSON Type Definition (JTD) for Indexes
This JTD only describes the “indexes” field of the schema and is provided to communicate its structure unambiguously.
The “type” field is not included because the JTD of “type” would have to be recursive in order to support nested types. JTD does not support recursion. CipherStash will publish a complete JSON Schema in the near future, but please contact us if you need something sooner.
{
  "definitions": {
    "indexDef": {
      "discriminator": "kind",
      "mapping": {
        "exact": {
          "properties": {
            "field": { "type": "string" }
          }
        },
        "range": {
          "properties": {
            "field": { "type": "string" }
          }
        },
        "match": {
          "properties": {
            "fields": { "elements": { "type": "string" }},
            "tokenFilters": {
              "elements": { "ref": "tokenFilter" }
            },
            "tokenizer": { "ref": "tokenizer" }
          }
        }
      }
    },
    "tokenFilter": {
      "discriminator": "kind",
      "mapping": {
        "downcase": { "properties": {} },
        "upcase": { "properties": {} },
        "ngram": {
          "optionalProperties": {
            "tokenLength": { "type": "uint8" }
          }
        }
      }
    },
    "tokenizer": {
      "discriminator": "kind",
      "mapping": {
        "standard": { "properties": {} }
      }
    }
  },
  "values": { "ref": "indexDef" }
}