All data in CipherStash exist inside Collections. You can think of these as loosely analogous to a table in a database or an index in a search system. For example, you might create a collection to store the users in your application. Collections might also be used to store application logs, insurance claims, sales orders or just about anything else you can imagine.
Defining a Collection
Let's imagine we want to create a collection to store company employees. The collection stores records in the form:
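As a rough sketch, an employee record of this kind might be typed as follows in TypeScript. The `Employee` type and its field names are assumptions, inferred from the index names used later on this page (`email`, `salary`, `startDate`, `name`, `jobTitle` and so on). They are not taken from the CipherStash API.

```typescript
// Hypothetical record shape; the field names are assumptions for illustration.
type Employee = {
  id?: string; // optional: may be generated on insert
  name: string;
  jobTitle: string;
  email: string;
  salary: number;
  active: boolean;
  startDate: Date;
};

// An example record that conforms to the type above.
const exampleEmployee: Employee = {
  name: "Ada Lovelace",
  jobTitle: "Chief Executive Officer",
  email: "ada@example.com",
  salary: 250000,
  active: true,
  startDate: new Date("2020-01-15"),
};
```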
In CipherStash, we call a record the source. It is encrypted in the client before being sent to the server and CipherStash never sees any plain-text data. Records are encoded as BSON before being encrypted.
To create a collection for our employees, we first need to create a CollectionSchema by calling CollectionSchema.defineBasic(collectionName), then pass the schema to the createCollection(schema) function. Note that the collection name will be encrypted so that it is hidden from CipherStash.
We can insert records into our collection and they will be fully encrypted in CipherStash. However, a Collection with a schema created via `notIndexed()` only supports retrieval of records by ID. Such a collection behaves like a simple key-value store. To perform useful queries on the data, we must define our collection schema with indexes.
While traditional databases use indexes to improve query performance, CipherStash uses indexes to enable queries over an encrypted collection. You don't need to create an index for every field (in fact a collection can work as a simple key-value store without any indexes at all) but queries on a field are not possible without an index defined.
Let's have another go at creating our collection, but this time with some useful indexes.
If you are following along and need to delete the collection we created earlier before creating it again, you can do so using deleteCollection.
Defining indices with field mappings
Let's break down what we've done so far. In the example code above we defined a CollectionSchema named "employees" and we've also defined some mappings on the fields by calling indexedWith and providing a callback function as an argument. The return value of the callback is an object that describes the indices for our record type. Indices define how we can query our collection.
The keys of the object (in the example above, email, employment, active, salary, startDate, nameAndJobTitle) are the names of the indices. The values of the object define the type of the index which determines the supported query operations.
For example, mapping.Exact("email") defines an index that provides the ability to use an equality clause in queries on that index. mapping.Range("salary") defines the ability to use a comparison clause in a query - e.g. less than, greater than, between etc.
There is also another kind of mapping: Match. An index created with Match can perform full text search queries. The following fragment, for example, enables full text search across two fields (name and jobTitle) first by normalising the text with a downcase filter and then analyzing the text with an ngram tokenizer.
A Match filter with this configuration implements "typeahead". A typeahead is a common pattern when interacting with data in forms.
Match indices in CipherStash work on string fields and allow for free text searches in a similar way to SQL's LIKE. They use a fast encrypted B-Tree and support large collections (using n-grams and boolean search under the hood). If you don't need to do partial string matches, you can just use an Exact index, which only matches exact strings.
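To make the downcase filter and n-gram tokenizer more concrete, here is a minimal plaintext sketch of the idea. It is not CipherStash's implementation: the real index operates on encrypted terms, and the function names (`downcase`, `ngrams`, `analyze`, `matches`) and the default token length of 3 are assumptions chosen for illustration.

```typescript
// Lower-case the input so that matching is case-insensitive (the "downcase" filter).
function downcase(text: string): string {
  return text.toLowerCase();
}

// Split text into overlapping n-grams of a fixed length (the "ngram" tokenizer).
function ngrams(text: string, tokenLength: number): string[] {
  const grams: string[] = [];
  for (let i = 0; i + tokenLength <= text.length; i++) {
    grams.push(text.slice(i, i + tokenLength));
  }
  return grams;
}

// Run the filter, then the tokenizer, as described for the Match mapping.
function analyze(text: string, tokenLength = 3): string[] {
  return ngrams(downcase(text), tokenLength);
}

// Boolean AND search: every n-gram of the query must appear among the
// n-grams of the indexed text. Queries shorter than tokenLength never match.
function matches(indexed: string, query: string, tokenLength = 3): boolean {
  const terms = new Set(analyze(indexed, tokenLength));
  const queryTerms = analyze(query, tokenLength);
  return queryTerms.length > 0 && queryTerms.every((t) => terms.has(t));
}
```

With this sketch, `matches("Ada Lovelace", "LOVE")` is true because both n-grams of the downcased query (`lov`, `ove`) occur in the indexed text. This is the behaviour a typeahead relies on. Note that n-gram matching of this kind can produce false positives for longer queries, since it checks only that the n-grams are present, not their order.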
For a full list of supported index types see Index Types.
Adding Indexes to a Collection
At the moment it isn't possible to add a new index to a collection (you must define all the indexes you need before adding data to the collection). This is something that will be addressed in a future version. However, you can re-index records one at a time if you need to (say, if you have changed the collection settings). See Put, Get and Delete Records.
Every Collection must have a primary key: it can be a field you have defined in a source record, or it can be automatically generated for you. We'll have CipherStash generate IDs for us.
Generated IDs are 128-bit Universally Unique Identifiers (UUID). This is in contrast to integer sequences which are common in traditional data stores and are prone to leaking information about the data (such as insertion order).
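For a sense of what such an identifier looks like, the sketch below generates a random (version 4) UUID using `randomUUID` from the Node.js `crypto` module. This only illustrates the ID format; how the CipherStash client generates IDs internally is not shown here, and the helper name `generateId` is hypothetical.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: returns a random version 4 UUID as a string.
// Unlike an integer sequence, the value reveals nothing about insertion order.
function generateId(): string {
  return randomUUID();
}

const firstId = generateId();
const secondId = generateId();
```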
If we don't provide an ID, the CipherStash client will generate one for us when we insert a record using put.
The returned record will have the auto-generated ID set:
ada.id // 'd8e54fde-3925-429f-bb56-1788dfc59773'
Other Primary Keys
If you have a field in your source record that you want to use as a primary key, you can do so by first converting it into a value that looks like a uniformly random number. Here we'll do that by using the HMAC function in the Node.js API.
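A minimal sketch of this approach, using `createHmac` from the Node.js `crypto` module, is shown below. The helper name `idFromField`, the choice of SHA-256, truncating the digest to 128 bits, and formatting the result in UUID-style groups are all assumptions made for illustration. Check the ID format that your CipherStash client version expects before relying on it.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: derives a deterministic 128-bit ID from a field value.
// The secret key must stay private; without it the ID cannot be recomputed
// from the field value.
function idFromField(value: string, secretKey: string): string {
  // HMAC-SHA256 yields 32 bytes; keep the first 16 bytes (128 bits).
  const digest = createHmac("sha256", secretKey).update(value).digest();
  const hex = digest.subarray(0, 16).toString("hex");
  // Format as 8-4-4-4-12 hex groups. This is UUID-shaped, but the version
  // and variant bits are not set, so it is not a strict RFC 4122 UUID.
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}
```

Because the HMAC is deterministic, calling `idFromField` with the same field value and key always yields the same ID. That lets you look a record up by its natural key (for example an email address) without storing the value in the clear as the ID.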