CipherStash Documentation

What CipherStash can and can’t do

CipherStash is a platform for securely storing and searching your data.

To help you make informed decisions about CipherStash, we want to be clear about what it can and cannot do.

As we continue to grow the CipherStash platform, you can expect the list of things it cannot do to get much shorter.

What the CipherStash platform can do

  • Basic data types (for storing PII)
    • Strings
    • Numbers (Int64, Float64)
    • Dates (and times)
    • Booleans
    • Any compound type composed from the basic data types above
  • Queries
    • Index traits
      • Exact (on all core types)
      • Range (on all core types except Strings and Booleans)
      • Match (only Strings, full text search, typeahead, ngram)
      • Any combination of above index types
    • Aggregations
      • Count
    • Search operations
      • Querying of nested objects with string values (for example {"level1":{"level2":{"level3": "value"}}})
  • Reliability
    • Immediately consistent
    • Manual migrations between CipherStash versions
  • Security
    • Base level security promises
      • All data encrypted client-side under AES
      • Storing the left and right ciphertext
      • Indexes may yield some information
      • Thread-safe nonces
      • Reasonable defence against an attacker
    • Manual key rotation
  • Connectivity
    • Every customer has their own instance of CipherStash
      • Public endpoints under
    • VPC Peering in AWS
  • Performance
    • 1,000 inserts / second, sustained
    • For datasets up to 100M records
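
The index trait rules in the list above can be summarised as a small lookup. The following is a minimal sketch in Python, not the CipherStash API: the names CORE_TYPES and supported_traits are hypothetical and exist only to illustrate which traits apply to which basic data type.

```python
# Illustrative only: hypothetical names, not part of any CipherStash client library.
CORE_TYPES = {"string", "int64", "float64", "date", "boolean"}


def supported_traits(data_type: str) -> set:
    """Return the index traits the list above permits for a basic data type."""
    if data_type not in CORE_TYPES:
        raise ValueError(f"unknown basic data type: {data_type}")
    traits = {"exact"}  # Exact: available on all core types
    if data_type not in {"string", "boolean"}:
        traits.add("range")  # Range: all core types except Strings and Booleans
    if data_type == "string":
        traits.add("match")  # Match: Strings only
    return traits
```

Under these assumptions, a string field could carry Exact and Match indexes, a numeric or date field Exact and Range, and a boolean field Exact only.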

What the CipherStash platform cannot do now, but will be able to do in the future

  • More complex data types
    • UUID
    • IPv4 + IPv6 addresses
  • Queries
    • Index traits
      • Match (edge ngrams)
    • Aggregations
      • Min
      • Max
      • Mean
      • Sum
    • Search operations
      • Select a subset of fields
      • Joins
      • Window functions
      • Querying of nested objects with non-string values (for example {"level1":{"level2":{"level3": [ false, 1 ]}}})
      • Publish-subscribe queries
  • Consistency
    • Unique constraints, beyond UUID
  • Reliability
    • Data migrations (adding or removing fields and indexes)
    • Automated migrations between CipherStash versions
    • Zero downtime deploys on upgrades
    • Replication (for data durability, then high availability)
    • Jepsen testing
  • Security
    • Snapshot security
    • Automated key rotation
  • Connectivity
    • Cut-down local instance of CipherStash, for use in local dev and in CI
    • Accessible via AWS PrivateLink
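
The difference between nested objects that can be queried today (string values only) and those planned for the future (non-string values) can be sketched as a check over a document's leaf values. This is an illustrative Python sketch under stated assumptions: has_only_string_leaves is a hypothetical helper, not a CipherStash function, and a nested object is assumed to be represented as a plain dict.

```python
def has_only_string_leaves(value) -> bool:
    """Return True if every leaf of a nested dict is a string."""
    if isinstance(value, dict):
        # Recurse into nested objects and require every branch to pass.
        return all(has_only_string_leaves(v) for v in value.values())
    # Anything that is not a dict is treated as a leaf value.
    return isinstance(value, str)


# Shapes taken from the two examples in the lists above.
queryable_today = {"level1": {"level2": {"level3": "value"}}}
planned_for_future = {"level1": {"level2": {"level3": [False, 1]}}}
```

Here has_only_string_leaves(queryable_today) holds, while has_only_string_leaves(planned_for_future) does not, because the innermost value is a list rather than a string.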

Areas of active research

These are problems we are actively researching.

We are not yet sure whether they can be implemented efficiently in an encrypted data store.

  • Queries
    • Aggregations
      • Mode
      • Standard deviation
      • Percentiles
    • Search operations
      • Ranked search (cosine similarity)
      • Geospatial
  • Security
    • Fully homomorphic encryption on the source record

What the CipherStash platform will never be able to do