CipherStash Documentation

Create Your First Collection

All sources in CipherStash are stored in a collection. You can think of a collection as loosely analogous to a table in a relational database, and of its sources as the rows.

For this tutorial, we’re going to create a collection that stores information about movies. Then we’ll import a bunch of data, to give you something to search on.

Defining a Collection

A collection in CipherStash is a named group of sources that share a common purpose. Typically, these sources will all have the same (or similar) fields, and will be queried in the same way -- and therefore will have the same indexes. You describe the source fields and indexes in a schema, which you supply when you create the collection.

CipherStash itself does not see any of this information -- the collection name, the fields and their types, and the index definitions are all encrypted before being sent to CipherStash to be stored. Only clients that have access to the decryption key can see anything useful about the collection.

We’ll imaginatively call our collection “movies”, and in it we’ll store each movie’s title, year of release, and running time (in minutes). To facilitate searching for movies based on certain criteria, we’ll need the following indexes:

  • exactTitle, so we can find a movie source if we know its precise title (e.g. “find the movie named ‘Star Trek II: The Wrath of Khan’”);
  • matchTitle, so we can look up movies using partial string matches, when we can only remember part of a movie’s title (e.g. “find movies with ‘Star Trek’ in their name”);
  • year, so we can find movies made in a given year, or movies made over a range of years (e.g. “find all the movies made between 1980 and 1989”);
  • runningTime, so we can find movies whose running time is in a certain range (e.g. “find me all movies that run for at least four hours”).
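
To make the purpose of each index concrete, the four lookups above can be pictured as plain Python filters over a few unencrypted sample sources. This is only a sketch of the kind of question each index lets you ask. It is not the CipherStash client API, and it says nothing about how queries are evaluated over encrypted data. The sample data and all function names are hypothetical.

```python
# Illustrative sample sources; the last entry is made up for the example.
MOVIES = [
    {"title": "Star Trek: The Motion Picture", "year": 1979, "runningTime": 132},
    {"title": "Star Trek II: The Wrath of Khan", "year": 1982, "runningTime": 113},
    {"title": "A Very Long Film", "year": 2001, "runningTime": 250},
]


def exact_title(movies, title):
    """exactTitle: the whole title must match exactly."""
    return [m for m in movies if m["title"] == title]


def match_title(movies, fragment):
    """matchTitle: case-insensitive partial match on the title."""
    return [m for m in movies if fragment.lower() in m["title"].lower()]


def year_between(movies, first, last):
    """year: inclusive range of release years."""
    return [m for m in movies if first <= m["year"] <= last]


def running_time_at_least(movies, minutes):
    """runningTime: running time at or above a threshold, in minutes."""
    return [m for m in movies if m["runningTime"] >= minutes]


if __name__ == "__main__":
    print(exact_title(MOVIES, "Star Trek II: The Wrath of Khan"))
    print(match_title(MOVIES, "star trek"))
    print(year_between(MOVIES, 1980, 1989))
    print(running_time_at_least(MOVIES, 4 * 60))
```

Each function corresponds to one of the example questions in the list: an exact title, a partial title, a range of years, and a minimum running time.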

CipherStash defines collection schemas using a specially-structured JSON object, which describes the source fields (and their types), and the indexes. For our collection of movies, with the fields and indexes listed above, the schema definition looks like this:

{
  "type": {
    "title": "string",
    "year": "number",
    "runningTime": "number"
  },
  "indexes": {
    "exactTitle": { "kind": "exact", "field": "title" },
    "matchTitle": {
      "kind": "fullText",
      "fields": ["title"],
      "tokenFilters": [
        { "kind": "downcase" },
        { "kind": "ngram", "tokenLength": 3 }
      ],
      "tokenizer": { "kind": "standard" }
    },
    "runningTime": { "kind": "range", "field": "runningTime" },
    "year": { "kind": "range", "field": "year" }
  }
}

Copy that into a local file somewhere, say movies_schema.json, ready for the next step. If you’d like to know exactly what the above is doing, the CipherStash schema definition reference contains full details.
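
Before moving on, it can be worth checking that the file you saved parses as JSON and that every index points at a field you actually declared. The following is a minimal local sanity check, not part of CipherStash or the stash CLI. It assumes only the layout described above: a "type" entry listing the fields, and an "indexes" entry keyed by index name, where each index names its field(s) under "field" or "fields". The function names are hypothetical.

```python
import json


def check_schema(schema):
    """Return a list of problems found in a schema dict (empty if none)."""
    fields = schema.get("type")
    indexes = schema.get("indexes")
    if not isinstance(fields, dict) or not isinstance(indexes, dict):
        return ['"type" and "indexes" must both be JSON objects']

    problems = []
    for name, index in indexes.items():
        if not isinstance(index, dict):
            problems.append(f'index "{name}" must be a JSON object')
            continue
        # Collect the field names this index refers to.
        referenced = list(index.get("fields", []))
        if "field" in index:
            referenced.append(index["field"])
        if not referenced:
            problems.append(f'index "{name}" does not name any field')
        for field in referenced:
            if field not in fields:
                problems.append(f'index "{name}" refers to unknown field "{field}"')
    return problems


def check_schema_file(path):
    """Load a schema file and check it; invalid JSON raises json.JSONDecodeError."""
    with open(path, encoding="utf-8") as f:
        return check_schema(json.load(f))
```

Calling check_schema_file("movies_schema.json") returns an empty list when the file is consistent. A stray comma or an unquoted value surfaces as a json.JSONDecodeError that reports the offending line and column.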

Creating the Collection

You create collections using stash, the CipherStash CLI. Creating our movies collection looks like this:

stash collection create --name movies --schema movies_schema.json

After a brief pause to talk to CipherStash, this command should return with the helpful response “Collection ‘movies’ created.” That means everything worked perfectly.

Of course, an empty collection is not very useful. Thus, the next step is to import some sources, and run some test queries.