CipherStash Documentation

Querying on a Field-Dynamic Match Index

The field-dynamic match index is a bit of a tricky one to wrap your head around. It indexes all string fields in your records, like the dynamic match index, but it also remembers which field the text is in. To query it, you need to specify both the text to match and the field in which the text should appear.

This is most easily demonstrated with an example.

Let’s say you create a collection of people, with this schema:

stash.create_collection("people", {
  "types" => {},
  "indexes" => {
    "allText" => {
      "kind" => "dynamic-match",
      "tokenizer" => "standard",
      "tokenFilters" => [
        { "kind" => "ngram", "tokenLength" => 3 }
      ]
    },
    "allTextWithFields" => {
      "kind" => "field-dynamic-match",
      "tokenizer" => "standard",
      "tokenFilters" => [
        { "kind" => "ngram", "tokenLength" => 3 }
      ]
    }
  }
})

collection = stash.collection("people")

Then, you load in these records:

[
  {
    name: "Alice Angleton",
    address: "42 Ambling St",
  },
  {
    name: "Bob Blessington",
    address: "61 Bowery Ln",
  },
  {
    name: "Charlene Chaise",
    address: "197 Cryston Rd",
  },
  {
    name: "David Drury",
    address: "8 Downer Cl",
    notes: "Doesn't like Bob",
  },
].each { |r| collection.insert(r) }
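The sketch below makes the difference between the two index kinds concrete. It is a plaintext model, not the CipherStash client: the real indexes encrypt their terms, which this skips. It shows the terms each index might derive from David's record above.

It rests on a few assumptions. The "standard" tokenizer is taken to lowercase text and split on non-alphanumeric characters. The 3-gram filter is taken to emit every three-character window of each token. The helper names `tokenize`, `ngrams`, `terms_for`, `dynamic_terms` and `field_dynamic_terms` are hypothetical.

```ruby
# Plaintext model only: real CipherStash indexes encrypt their terms.
# Assumed tokenizer: lowercase, then split on non-alphanumeric characters.
def tokenize(text)
  text.downcase.split(/[^[:alnum:]]+/).reject(&:empty?)
end

# Assumed 3-gram filter: every 3-character window (shorter tokens yield none).
def ngrams(token, length = 3)
  token.chars.each_cons(length).map(&:join)
end

def terms_for(text)
  tokenize(text).flat_map { |token| ngrams(token) }
end

# dynamic-match: one pool of terms per record; field names are discarded.
def dynamic_terms(record)
  record.values.grep(String).flat_map { |text| terms_for(text) }.uniq
end

# field-dynamic-match: each term is paired with the field it came from.
def field_dynamic_terms(record)
  record.select { |_field, value| value.is_a?(String) }
        .flat_map { |field, text| terms_for(text).map { |term| [field.to_s, term] } }
        .uniq
end

david = { name: "David Drury", address: "8 Downer Cl", notes: "Doesn't like Bob" }

p dynamic_terms(david).include?("bob")                  # => true
p field_dynamic_terms(david).include?(["notes", "bob"]) # => true
p field_dynamic_terms(david).include?(["name", "bob"])  # => false
```

In this model, the dynamic index only knows that "bob" appears somewhere in David's record. The field-dynamic index knows it appears in `notes` and not in `name`.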

If we search for the string "Bob" in the allText index, we’ll get back two records:

collection.query { |p| p.allText.match("Bob") }.records.length  # => 2

That's because the string "Bob" appears both in Bob’s name and in David’s notes.

If we only want to find someone named “Bob”, we can use the allTextWithFields index, which is a field-dynamic-match index, and tell it to search only in the name field, like this:

collection.query { |p| p.allTextWithFields.match("name", "Bob") }.records.length  # => 1

Notice that the match operator for a field-dynamic-match index takes two arguments: the field name and the string to search for. This is unusual for a constraint operator, so it’s worth keeping in mind.
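The two queries above can be modelled in plain Ruby to show why they return different counts. This is a hypothetical sketch, not the CipherStash client.

It assumes a lowercase, split-on-non-alphanumeric tokenizer followed by a 3-gram filter. It treats a record as matching when every query trigram appears among the record's terms, restricted to a single field in the field-dynamic case. `dynamic_match` and `field_dynamic_match` are illustrative names.

```ruby
# Hypothetical plaintext model, not CipherStash's implementation.
# Assumed pipeline: lowercase, split on non-alphanumerics, then 3-grams.
def terms_for(text)
  text.downcase.split(/[^[:alnum:]]+/).reject(&:empty?)
      .flat_map { |token| token.chars.each_cons(3).map(&:join) }
end

RECORDS = [
  { name: "Alice Angleton", address: "42 Ambling St" },
  { name: "Bob Blessington", address: "61 Bowery Ln" },
  { name: "Charlene Chaise", address: "197 Cryston Rd" },
  { name: "David Drury", address: "8 Downer Cl", notes: "Doesn't like Bob" },
].freeze

# dynamic-match: query terms may come from any string field of the record.
def dynamic_match(records, text)
  wanted = terms_for(text)
  records.select do |record|
    available = record.values.grep(String).flat_map { |value| terms_for(value) }
    (wanted - available).empty?
  end
end

# field-dynamic-match: query terms must all come from the named field.
def field_dynamic_match(records, field, text)
  wanted = terms_for(text)
  records.select do |record|
    value = record[field.to_sym]
    value.is_a?(String) && (wanted - terms_for(value)).empty?
  end
end

p dynamic_match(RECORDS, "Bob").length                              # => 2
p field_dynamic_match(RECORDS, "name", "Bob").length                # => 1
p field_dynamic_match(RECORDS, "notes", "Bob").map { |r| r[:name] } # => ["David Drury"]
```

In this model, the extra `field` argument is what lets the second query ignore David's notes.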

Field-Dynamic Match vs Match Indexes

Given how powerful field-dynamic match indexes are, you might wonder why we would ever use the match index type. Why not just use field-dynamic match indexes all the time?

There are a couple of reasons why a match index is usually preferred over a field-dynamic match index:

  • A match index can search across more than one field simultaneously. When you define a match index, you can list multiple fields to index using the fields parameter, and every query then automatically applies across all of those fields. With a field-dynamic-match index, you can only search one field at a time.

  • A field-dynamic-match index is slower to update and query. Because all text in the record is indexed, every one of those terms has to be generated, encrypted, and sent to the server. The resulting index is larger, and a larger index is slower to query than a smaller one.
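To get a rough feel for the size difference, the sketch below counts the distinct terms each index would store for the four example records. It compares a hypothetical match index over just `name` against a field-dynamic-match index over every string field.

This is a plaintext model under stated assumptions, not CipherStash's implementation or schema syntax. It assumes a lowercase, split-on-non-alphanumeric tokenizer with a 3-gram filter. It ignores encryption overhead entirely.

```ruby
# Hypothetical plaintext model: counts index terms, ignores encryption.
# Assumed pipeline: lowercase, split on non-alphanumerics, then 3-grams.
def terms_for(text)
  text.downcase.split(/[^[:alnum:]]+/).reject(&:empty?)
      .flat_map { |token| token.chars.each_cons(3).map(&:join) }
end

RECORDS = [
  { name: "Alice Angleton", address: "42 Ambling St" },
  { name: "Bob Blessington", address: "61 Bowery Ln" },
  { name: "Charlene Chaise", address: "197 Cryston Rd" },
  { name: "David Drury", address: "8 Downer Cl", notes: "Doesn't like Bob" },
].freeze

# A match index only covers the fields listed when it is defined.
def match_index_terms(record, fields)
  fields.flat_map { |field| terms_for(record.fetch(field, "")) }.uniq
end

# A field-dynamic-match index covers every string field, tagged by field name.
def field_dynamic_terms(record)
  record.flat_map do |field, value|
    value.is_a?(String) ? terms_for(value).map { |term| [field, term] } : []
  end.uniq
end

name_only  = RECORDS.sum { |record| match_index_terms(record, [:name]).size }
all_fields = RECORDS.sum { |record| field_dynamic_terms(record).size }

puts "match index on name only:  #{name_only} terms"
puts "field-dynamic-match index: #{all_fields} terms"
```

Under these assumptions, the name-only match index holds 34 terms and the field-dynamic-match index holds 59. In the real system, each of those terms would also have to be encrypted and sent to the server.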

When you know which field(s) you wish to query at the time the collection is created, you should always prefer the match index type. Only use the field-dynamic-match index type when you need to figure out what field to match against at runtime.