CipherStash Documentation

Create Your First Collection

All records in CipherStash are stored in a collection. You can think of a collection as loosely analogous to a table in a relational database, and a record as analogous to a row.

For this tutorial, we’re going to create a collection that stores information about movies. Then we’ll import a bunch of data, to give you something to search on.

Creating the Collection

Collections are created using the stash create-collection command from the CipherStash CLI.

The command expects a collection name as well as a schema definition that describes the indexes and field types that are to be stored.

We’ve prepared some example data for your first collection. You can download its example schema from https://cipherstash.com/examples/movies.schema.json, or by running the following command:

curl -Lo movies.schema.json https://cipherstash.com/examples/movies.schema.json

With the file downloaded, let’s use movies.schema.json to create a movies collection:

stash create-collection movies --schema movies.schema.json

After a brief pause to talk to CipherStash, this command should return with the response “Collection ‘movies’ created.” That means everything worked.
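If you script this step, a quick pre-flight check can catch a failed or truncated download before the CLI is invoked. The following is a minimal Python sketch, not part of CipherStash: the helper names `schema_problems`, `create_collection_argv`, and `create_collection` are hypothetical, and the check assumes only that a schema is a JSON object with top-level `type` and `indexes` keys, as in the example schema shown later on this page.

```python
import json
import subprocess


def schema_problems(schema):
    """Return a list of problems with the overall shape of a parsed schema."""
    if not isinstance(schema, dict):
        return ["schema must be a JSON object"]
    # Assumption: a usable schema has at least these two top-level keys.
    return [f"missing top-level key: {key}"
            for key in ("type", "indexes") if key not in schema]


def create_collection_argv(name, schema_path):
    """Build the same command line used above, as an argument list."""
    return ["stash", "create-collection", name, "--schema", schema_path]


def create_collection(name, schema_path):
    """Check the schema file, then hand it to the CipherStash CLI."""
    with open(schema_path, encoding="utf-8") as f:
        schema = json.load(f)  # raises json.JSONDecodeError on a bad download
    problems = schema_problems(schema)
    if problems:
        raise ValueError("; ".join(problems))
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(create_collection_argv(name, schema_path), check=True)


# Example use (requires the CipherStash CLI and the downloaded schema file):
# create_collection("movies", "movies.schema.json")
```

The check is deliberately shallow. It does not replace whatever validation the CLI itself performs on the schema.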

Of course, an empty collection is not very useful. Thus, the next step is to import some records.

Or, if you’d like to know more about collection schemas, read on.

Defining a Collection

A collection in CipherStash is a named group of records that share a common purpose. Typically, these records will all have the same (or similar) fields and will be queried in the same way, so they will have the same indexes. The record fields and indexes are described by a schema, which you supply when you create the collection.

CipherStash itself does not see any of this information: the collection name, the fields and their types, and the index definitions are all encrypted before being sent to CipherStash to be stored. Only clients that have access to the decryption key can see anything useful about the collection.

For the movies collection we created above, we’ll define the types of the movie’s title, year of release, and running time (in minutes). Additional fields can be stored in the record; however, the CipherStash client won’t do any type checking on that data.
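To make the difference between declared and additional fields concrete, here is a minimal Python sketch of a check that looks only at declared fields. It illustrates the idea and is not the CipherStash client's code: the function names, the extra `plot` field, the sample values, and the choice to skip declared fields that are absent from a record are all assumptions.

```python
# Declared field types for the movies collection, as described above.
DECLARED_TYPES = {"title": "string", "year": "uint64", "runningTime": "uint64"}

UINT64_MAX = 2**64 - 1


def matches_type(value, type_name):
    """Return True if value fits the named type."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "uint64":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and 0 <= value <= UINT64_MAX)
    raise ValueError(f"unknown type name: {type_name}")


def check_declared_fields(record, declared=DECLARED_TYPES):
    """List type problems in declared fields; undeclared fields are ignored."""
    problems = []
    for field, type_name in declared.items():
        # Assumption: a declared field that is absent is simply skipped.
        if field in record and not matches_type(record[field], type_name):
            problems.append(
                f"{field}: expected {type_name}, got {record[field]!r}")
    return problems


record = {
    "title": "Star Trek II: The Wrath of Khan",
    "year": 1982,
    "runningTime": 113,  # illustrative value, in minutes
    "plot": ["not", "type", "checked"],  # hypothetical extra field
}
print(check_declared_fields(record))  # []
print(check_declared_fields({"title": "Untitled", "year": "1982"}))
```

The second call reports a problem for `year`, because the value is a string rather than an unsigned integer, while the `plot` field in the first record passes through without being examined.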

To facilitate searching for movies based on certain criteria, we’ll also create the following indexes:

  • exactTitle, so we can find a movie record if we know its precise title (eg “find the movie named ‘Star Trek II: The Wrath of Khan’”);
  • matchTitle, so we can look up movies using partial string matches, when we can only remember part of a movie’s title (eg “find all the movies with ‘Star Trek’ in their name”);
  • year, so we can find movies made in a given year, or movies made over a range of years (eg “find all the movies made between 1980 and 1989”);
  • runningTime, so we can find movies whose running time is in a certain range (eg “find me all movies that run for at least four hours”).
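The four indexes above answer three kinds of question: exact lookups, partial matches, and range queries. The Python sketch below runs a plaintext equivalent of each over a small in-memory list, purely to show what each index is meant to answer. It does not use CipherStash or any encryption, the partial match is a plain case-insensitive substring test, and the helper names and sample records (including one made-up film) are illustrative assumptions.

```python
# Sample records; the values are illustrative and the last film is made up.
MOVIES = [
    {"title": "Star Trek II: The Wrath of Khan", "year": 1982, "runningTime": 113},
    {"title": "Star Trek: The Motion Picture", "year": 1979, "runningTime": 132},
    {"title": "A Very Long Example Film", "year": 1985, "runningTime": 245},
]


def exact_title(movies, title):
    """What exactTitle answers: records whose title equals the query exactly."""
    return [m for m in movies if m["title"] == title]


def match_title(movies, fragment):
    """What matchTitle answers, approximated by a case-insensitive substring test."""
    needle = fragment.lower()
    return [m for m in movies if needle in m["title"].lower()]


def in_range(movies, field, low=None, high=None):
    """What year and runningTime answer: inclusive range, None means unbounded."""
    return [m for m in movies
            if (low is None or m[field] >= low)
            and (high is None or m[field] <= high)]


def titles(movies):
    return [m["title"] for m in movies]


print(titles(exact_title(MOVIES, "Star Trek II: The Wrath of Khan")))
print(titles(match_title(MOVIES, "star trek")))
print(titles(in_range(MOVIES, "year", 1980, 1989)))
print(titles(in_range(MOVIES, "runningTime", low=240)))  # at least four hours
```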

CipherStash defines collection schemas using a specially structured JSON object, which describes the record fields (and their types) and the indexes. For our collection of movies, with the fields and indexes listed above, the schema definition looks like this:

{
  "type": {
    "title": "string",
    "year": "uint64",
    "runningTime": "uint64"
  },
  "indexes": {
    "exactTitle": { "kind": "exact", "field": "title" },
    "matchTitle": {
      "kind": "match",
      "fields": ["title"],
      "tokenFilters": [
        { "kind": "downcase" },
        { "kind": "ngram", "tokenLength": 3 }
      ],
      "tokenizer": { "kind": "standard" }
    },
    "runningTime": { "kind": "range", "field": "runningTime" },
    "year": { "kind": "range", "field": "year" }
  }
}
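To give a feel for what the matchTitle settings might do to a title, here is a rough Python sketch of a tokenizer followed by the two token filters. The exact behaviour is defined by the CipherStash schema reference, not by this code. The sketch assumes that the `standard` tokenizer splits on non-word characters, that filters run in the order listed, that `ngram` with `tokenLength` 3 emits every run of three consecutive characters, that tokens of three characters or fewer are kept whole, and that a query matches when all of its terms appear among the title's terms. All function names are hypothetical.

```python
import re


def standard_tokenize(text):
    # Assumption: split on runs of non-word characters and drop empty pieces.
    return [t for t in re.split(r"\W+", text) if t]


def downcase(tokens):
    return [t.lower() for t in tokens]


def ngram(tokens, token_length):
    out = []
    for t in tokens:
        if len(t) <= token_length:
            out.append(t)  # assumption: short tokens are kept whole
        else:
            # Sliding window of token_length characters over the token.
            out.extend(t[i:i + token_length]
                       for i in range(len(t) - token_length + 1))
    return out


def match_terms(text, token_length=3):
    """Tokenize, then apply the filters in the order the schema lists them."""
    return ngram(downcase(standard_tokenize(text)), token_length)


def could_match(title, query):
    # Assumption: every term derived from the query must occur in the title.
    return set(match_terms(query)) <= set(match_terms(title))


print(match_terms("Star Trek"))  # ['sta', 'tar', 'tre', 'rek']
print(could_match("Star Trek II: The Wrath of Khan", "star tre"))  # True
```

Breaking titles into short overlapping fragments is what lets a partial or differently capitalised query line up with the stored title, at the cost of occasionally matching titles that share fragments without containing the query as typed.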

If you’d like to know all the gory details of what the above JSON means, the CipherStash schema definition reference will explain everything. Otherwise, it’s now time to import some records into our collection.