Searchable encryption index types and the query operations they enable on encrypted data

Supported queries

CipherStash provides three core index types, each optimized for specific use cases:

Index Type	Description	Example Operators
Exact	Equality comparisons and unique constraints	`=`, `!=`, `IN`, `NOT IN`
Match	Substring and pattern matching	`LIKE`, `ILIKE`, `~~`, `~~*`
Range	Comparison operations and ordering	`<`, `<=`, `>`, `>=`, `BETWEEN`, `ORDER BY`

You can configure multiple index types on the same field to support different query patterns. For example, a field might have both Exact and Match indexes to support both equality checks and substring searches.

Choosing index types

When designing your schema, consider what queries you'll need to perform:

Query pattern	Required index type
`WHERE email = 'user@example.com'`	Exact
`WHERE name LIKE '%Smith%'`	Match
`WHERE price < 100`	Range
`WHERE status IN ('active', 'pending')`	Exact
`ORDER BY created_at DESC`	Range
`UNIQUE` constraint	Exact or Range

CipherCell storage

Each index type stores its encrypted search terms in a specific key within the CipherCell structure:

hm: Exact index terms (HMAC)
bf: Match index terms (Bloom filter)
ob: Range/Order index terms (ORE blocks)

A single encrypted field can contain multiple index types in its CipherCell, allowing it to support multiple query patterns simultaneously.

Exact match

Exact indexes use HMAC-SHA256 to enable equality comparisons on encrypted data.

Supported operators

`=` — Equality

Returns true if both operands are equal.

SELECT *
FROM users
WHERE email = $1::eql_v2_encrypted;

`!=` — Negated equality

Returns true if operands are not equal.

SELECT *
FROM users
WHERE email != $1::eql_v2_encrypted;

`IN` — Array comparison

Returns true if the left-hand expression's result is equal to any of the right-hand expressions.

SELECT *
FROM users
WHERE state IN ($1::eql_v2_encrypted, $2::eql_v2_encrypted);

NOT IN, ANY and SOME are also supported.

`JOIN` — Table joins on encrypted columns

SELECT
  orders.id,
  orders.amount,
  users.name
FROM orders
JOIN users ON orders.user_email = users.email
WHERE orders.amount > 100;

`GROUP BY` — Grouping by encrypted columns

SELECT
  state,
  COUNT(*) as user_count
FROM users
GROUP BY state;

CipherCell key

Exact indexes are stored in the hm key of a CipherCell.

Database indexes

PostgreSQL B-tree indexes work with Exact.

CREATE INDEX
ON users (
    encrypted_column eql_v2.encrypted_operator_class
);

Unique indexes

UNIQUE indexes require either Exact or Range searchable encrypted indexes to be available on the CipherCell.

CREATE UNIQUE INDEX
ON users (
    encrypted_column eql_v2.encrypted_operator_class
);

Cryptography

Exact terms are generated using HMAC, specifically HMAC-SHA-256 which uses SHA with an output size of 32-bytes as the underlying hash function.

Match (pattern)

Match indexes use trigrams (three-character sequences) to enable substring matching on encrypted data. When a value is encrypted, trigrams are extracted and stored in an encrypted Bloom filter, allowing for efficient pattern matching without decrypting the data.

Trigrams work best for matching patterns that are at least 3 characters long. Very short search patterns may produce more false positives.

For example, the string "laptop" produces the following trigrams:

l (note the leading spaces)
la
lap
apt
pto
top
op (note the trailing space)

When searching for "apt" in an encrypted field, the system checks if the encrypted Bloom filter contains the apt trigram.

Text processing pipeline

Before text is indexed in an encrypted Bloom filter, it passes through a configurable processing pipeline on the client side. This pipeline determines how text is tokenized and normalized, which directly affects query behavior.

Tokenizers

Tokenizers break text into searchable units:

Standard (Unicode whitespace): Splits text on whitespace boundaries.

"Hello World" → ["Hello", "World"]

Ngram (default: trigram): Generates character n-grams from text.

"laptop" → ["  l", " la", "lap", "apt", "pto", "top", "op "]

EdgeGram: Generates n-grams from the beginning of text, useful for prefix matching. Accepts min and max gram size.

"search" with min=1, max=6 → ["s", "se", "sea", "sear", "searc", "search"]

Token filters

Token filters normalize tokens after tokenization:

Upcase: Converts all text to uppercase
Downcase: Converts all text to lowercase
Stemmer: Reduces words to their root form ("running" → "run")
Stop Word Removal: Filters out common words like "a", "an", "the", "is", "it", etc.

Pipeline impact on queries

If a downcase token filter is configured in the pipeline, both LIKE and ILIKE will perform case-insensitive searches. The distinction between LIKE (case-sensitive) and ILIKE (case-insensitive) only applies when no case-normalizing filters are used.

Supported operators

`LIKE` — Pattern matching

SELECT *
FROM products
WHERE name LIKE $1::eql_v2_encrypted;

`ILIKE` — Case-insensitive pattern matching

SELECT *
FROM products
WHERE description ILIKE $1::eql_v2_encrypted;

`~~` — Equivalent to `LIKE`

SELECT *
FROM documents
WHERE title ~~ $1::eql_v2_encrypted;

`~~*` — Equivalent to `ILIKE`

SELECT *
FROM documents
WHERE title ~~* $1::eql_v2_encrypted;

LIKE and ILIKE mapping

Encrypted Bloom filters don't directly implement the SQL LIKE or ILIKE operators. Instead, they perform token-based matching using the configured text processing pipeline.

CipherStash Proxy and the Encryption SDK automatically map LIKE and ILIKE queries to encrypted Bloom filter searches.

This mapping works by:

Extracting the search pattern from the LIKE expression (ignoring % wildcards)
Processing the pattern through the same text pipeline (tokenizer + filters)
Checking if the resulting tokens exist in the encrypted Bloom filter

Limitations

Very short patterns (1-2 characters): Patterns shorter than 3 characters may produce different results because trigrams require at least 3 characters
Wildcard-only patterns: Patterns consisting only of wildcards cannot be mapped
Character class patterns: SQL LIKE supports _ (single character) wildcards, but encrypted Bloom filters don't
Position-specific matching: While prefix and suffix searches generally work, the behavior is trigram-based rather than position-exact

CipherCell key

Match indexes are stored in the bf key of a CipherCell.

Database indexes

PostgreSQL GIN (Generalized Inverted Index) indexes work with Match.

CREATE INDEX
products_encrypted_gin_index
ON products USING GIN (
    eql_v2.bloom_filter(encrypted_column)
);

Match indexes use GIN rather than B-tree indexes. GIN indexes are specifically designed for indexing composite values and are ideal for full-text and pattern matching operations.

UNIQUE indexes are not supported for Match indexes, as Bloom filters are probabilistic data structures that cannot guarantee uniqueness. Use Exact or Range indexes if you need unique constraints.

Cryptography

Match terms are generated using encrypted Bloom filters, which enable boolean retrieval operations over encrypted data. The system extracts trigrams from the plaintext, encrypts each trigram using a keyed pseudorandom function, and stores them in a Bloom filter structure.

Academic references

Cryptographically Secure Bloom-Filters (Nojima, Kadobayashi 2009)
A New Bloom Filter Structure for Searchable Encryption Schemes (Chum, Zhang 2017)

Range / Order

Range indexes use Order Revealing Encryption (ORE), which allows comparison operations on encrypted data while preserving order relationships.

Supported operators

`<` — Less than

SELECT *
FROM products
WHERE price < $1::eql_v2_encrypted;

`<=` — Less than or equal

SELECT *
FROM products
WHERE price <= $1::eql_v2_encrypted;

`>` — Greater than

SELECT *
FROM orders
WHERE order_date > $1::eql_v2_encrypted;

`>=` — Greater than or equal

SELECT *
FROM orders
WHERE order_date >= $1::eql_v2_encrypted;

`BETWEEN` — Range comparison

SELECT *
FROM products
WHERE price BETWEEN $1::eql_v2_encrypted AND $2::eql_v2_encrypted;

`ORDER BY` — Sort results

If applied to a string type, ordering is applied lexicographically (i.e. ordered by comparing elements left-to-right, like words in a dictionary).

SELECT *
FROM products
ORDER BY price ASC;

CipherCell key

Order / Range indexes are stored in the ob key of a CipherCell.

Database indexes

PostgreSQL B-tree indexes work with Order / Range.

CREATE INDEX
ON products (
    encrypted_column eql_v2.encrypted_operator_class
);

UNIQUE indexes require either Exact or Range searchable encrypted indexes to be available on the CipherCell.

Cryptography

Order / Range terms are generated using Order Revealing Encryption (ORE), which allows comparison operations on encrypted data while preserving order relationships. This enables efficient range queries and sorting on encrypted columns without decrypting the data.

Academic references

The specific scheme used is based on Block ORE (Lewi-Wu 2016) and incorporates security enhancements identified by Bogatov et al 2018. The implementation is available on GitHub.

Supported queries

On this page