CipherStash Documentation

Match Query

A match query performs a full-text search.

In this example, the collection schema defines a match index on the title field.

This allows us to use match queries on title.

import {
  Stash,
  StashRecord,
} from "@cipherstash/stashjs"

interface Movie extends StashRecord {
  title: string;
  runningTime: number;
  year: number;
};

const matchQuery = async () => {
    try {
        const stash = await Stash.connect()
        const movies = await stash.loadCollection<Movie>("movies")

        const queryResult = await movies.query(movie => movie.title.match("odyssey"))

        console.log(queryResult)
    } catch (err) {
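        // JSON.stringify on an Error instance yields "{}", so prefer its message when available
        console.error(`Could not query collection! Reason: ${err instanceof Error ? err.message : JSON.stringify(err)}`)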
    }
}

matchQuery()

The above query returns the following result:

{
  took: 0.143832083,
  documents: [
    {
      titleType: 'movie',
      primaryTitle: 'The Odyssey',
      originalTitle: 'Al-Oudyssa',
      startYear: 2004,
      runtimeMinutes: 92,
      genres: 'Crime,Drama,Thriller',
      year: 2004,
      title: 'The Odyssey',
      runningTime: 92
    }
  ],
  aggregates: []
}

A match query also returns records based on partial words.

import {
  Stash,
  StashRecord,
} from "@cipherstash/stashjs"

interface Movie extends StashRecord {
  title: string;
  runningTime: number;
  year: number;
};

const matchQuery = async () => {
    try {
        const stash = await Stash.connect()
        const movies = await stash.loadCollection<Movie>("movies")

        const queryResult = await movies.query(movie => movie.title.match("ody"))

        console.log(queryResult)
    } catch (err) {
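        // JSON.stringify on an Error instance yields "{}", so prefer its message when available
        console.error(`Could not query collection! Reason: ${err instanceof Error ? err.message : JSON.stringify(err)}`)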
    }
}

matchQuery()

This query matches two records:

{
  took: 0.128204,
  documents: [
    {
      titleType: 'video',
      primaryTitle: 'A Bloody Show: John Wesley Harding & Friends Live at Bumbershoot 2005',
      originalTitle: 'A Bloody Show: John Wesley Harding & Friends Live at Bumbershoot 2005',
      startYear: 2006,
      runtimeMinutes: 89,
      genres: 'Music',
      year: 2006,
      title: 'A Bloody Show: John Wesley Harding & Friends Live at Bumbershoot 2005',
      runningTime: 89
    },
    {
      titleType: 'movie',
      primaryTitle: 'The Odyssey',
      originalTitle: 'Al-Oudyssa',
      startYear: 2004,
      runtimeMinutes: 92,
      genres: 'Crime,Drama,Thriller',
      year: 2004,
      title: 'The Odyssey',
      runningTime: 92
    }
  ],
  aggregates: []
}

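This happens because, when the schema was defined, the title field was indexed with a standard tokenizer and an ngram filter. A partial term such as "ody" therefore matches any title containing a word with that fragment, including "Bloody" and "Odyssey".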
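The effect of a standard tokenizer followed by an ngram filter can be approximated with a small sketch. This is not the CipherStash implementation: the helper names (wordTokens, ngrams, indexTerms) are hypothetical, the word-splitting rule is a simplification, and the ngram length of 3 is an assumption made for illustration.

```typescript
// Illustrative model only; the real indexing pipeline may differ.

// Rough stand-in for a standard tokenizer: lowercase, then split on non-alphanumerics.
function wordTokens(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
}

// Rough stand-in for an ngram filter: every substring of length n within one word.
function ngrams(word: string, n: number): string[] {
  const out: string[] = [];
  for (let i = 0; i + n <= word.length; i++) {
    out.push(word.slice(i, i + n));
  }
  return out;
}

// Collect the ngrams of every word in the text into a set of index terms.
function indexTerms(text: string, n: number): Set<string> {
  const terms = new Set<string>();
  for (const w of wordTokens(text)) {
    for (const g of ngrams(w, n)) {
      terms.add(g);
    }
  }
  return terms;
}

const bloody = indexTerms(
  "A Bloody Show: John Wesley Harding & Friends Live at Bumbershoot 2005",
  3
);
const odyssey = indexTerms("The Odyssey", 3);

// Both titles contain a word that yields the fragment "ody".
console.log(bloody.has("ody"), odyssey.has("ody")); // true true
```

Under these assumptions, "bloody" yields "blo", "loo", "ood", "ody" and "odyssey" yields "ody", "dys", "yss", "sse", "sey", which is why both records appear for the search term "ody".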

Fuzzy matching

When doing a full-text search, the records returned will include the terms that were searched for.

However, because of the way full-text search is implemented, the result can also include records that do not contain the search terms themselves but happen to contain all of the ngram tokens for the search text.

Consider the following example, in which the title field has been indexed with these options:

{
  "title": {
    "kind": "match",
    "fields": ["title"],
    "tokenFilters": [
      { "kind": "downcase" }
    ],
    "tokenizer": { "kind": "ngram", "tokenLength": 3 }
  }
}

Using the match query:

const queryResult = await movies.query(
    (movie) => movie.title.match("there")
);

This returns the following records:

{
  took: 0.087420167,
  documents: [
    {
      titleType: 'movie',
      primaryTitle: 'There she goes',
      originalTitle: 'There she goes',
      startYear: 2019,
      runtimeMinutes: 120,
      genres: 'Comedy',
      year: 2019,
      title: 'There she goes',
      runningTime: 120
    },
    {
      titleType: 'movie',
      primaryTitle: 'The Heretics',
      originalTitle: 'The Heretics',
      startYear: 2021,
      runtimeMinutes: 120,
      genres: 'Drama',
      year: 2021,
      title: 'The Heretics',
      runningTime: 120
    }
  ],
  aggregates: []
}

Two records are returned. The first contains the search term there.

The second record is returned because it happens to contain every ngram token of the search term, even though it does not contain the word itself.

With a token length of 3, the search term there becomes ["the", "her", "ere"], and “The Heretics”, when indexed with the ngram tokenizer, becomes ["the", "he ", "e h", "her", "ere", ...etc], which includes all three.
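The fuzzy behaviour described here can be modelled with a short sketch. It assumes a record is a candidate whenever every ngram of the downcased search term appears among the ngrams of the downcased title. The function names (charNgrams, couldMatch) are hypothetical and this is not the CipherStash matching code.

```typescript
// Illustrative model only; it mirrors the "downcase" filter and the
// ngram tokenizer with tokenLength 3 from the index options.

// Character-level ngrams over the whole lowercased string, spaces included.
function charNgrams(text: string, n: number): string[] {
  const lower = text.toLowerCase();
  const out: string[] = [];
  for (let i = 0; i + n <= lower.length; i++) {
    out.push(lower.slice(i, i + n));
  }
  return out;
}

// Assumed rule: a title matches when it contains every ngram of the search term.
function couldMatch(title: string, term: string, n = 3): boolean {
  const titleTokens = new Set(charNgrams(title, n));
  return charNgrams(term, n).every((t) => titleTokens.has(t));
}

console.log(charNgrams("there", 3)); // the, her, ere
console.log(couldMatch("There she goes", "there")); // true: contains the word itself
console.log(couldMatch("The Heretics", "there")); // true: "the", "her", "ere" all occur
console.log(couldMatch("The Odyssey", "there")); // false: "her" and "ere" are missing
```

In this model the order and position of the tokens are ignored, which is what allows "The Heretics" to match a term it does not literally contain.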