Securing AI and RAG pipelines
Protect sensitive data in AI retrieval-augmented generation pipelines with encrypted vector storage and searchable encryption
Retrieval-Augmented Generation (RAG) pipelines commonly store sensitive documents alongside vector embeddings. Without encryption, this data is exposed at rest and during retrieval — creating a significant attack surface. CipherStash lets you encrypt sensitive content while preserving the ability to search and retrieve it.
The problem
RAG architectures typically store:
- Document chunks — the original text, often containing PII, financial data, or confidential business information
- Metadata — source references, user associations, access tags
- Vector embeddings — numeric representations used for similarity search
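If any of this data is exfiltrated through the database, for example via SQL injection or stolen credentials, the plaintext content is immediately exposed. Encryption at rest does not help in that case: it protects the storage media, but the database decrypts data transparently for every query.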
Encrypting RAG context data
Use the Encryption SDK to encrypt sensitive fields before storing them alongside your embeddings.
Define a schema for your documents
import { encryptedTable, encryptedColumn } from "@cipherstash/stack/schema"

export const documents = encryptedTable("documents", {
  content: encryptedColumn("content")
    .freeTextSearch(),
  source: encryptedColumn("source")
    .equality(),
  userId: encryptedColumn("user_id")
    .equality(),
})

Encrypt before storage
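The insert statement in this section writes to a documents table with three encrypted columns and one embedding column. As a minimal sketch of a matching table definition, the helper below builds the DDL. It assumes pgvector provides the vector type, that encrypted payloads are stored as jsonb (consistent with the ::jsonb casts in the insert), and that the embedding dimension matches your model. The function name buildDocumentsDdl and the id column are illustrative, and searchable columns may need additional EQL setup that is not shown here.

```typescript
// Builds DDL for a documents table like the one used in this guide.
// Assumptions (not from the original guide): pgvector is installed,
// encrypted payloads are stored as jsonb, and `id` is an illustrative key.
function buildDocumentsDdl(dimensions: number): string {
  if (!Number.isInteger(dimensions) || dimensions < 1) {
    throw new RangeError(`dimensions must be a positive integer, got ${dimensions}`)
  }
  return [
    "CREATE EXTENSION IF NOT EXISTS vector;",
    "CREATE TABLE IF NOT EXISTS documents (",
    "  id        bigserial PRIMARY KEY,",
    "  content   jsonb NOT NULL,  -- encrypted payload",
    "  source    jsonb NOT NULL,  -- encrypted payload",
    "  user_id   jsonb NOT NULL,  -- encrypted payload",
    `  embedding vector(${dimensions}) NOT NULL  -- stored unencrypted for similarity search`,
    ");",
  ].join("\n")
}
```

For example, buildDocumentsDdl(1536) yields a definition with a vector(1536) column; the result can be run once through your migration tool before the first ingest.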
import { Pool } from "pg"
import { Encryption } from "@cipherstash/stack"
import { documents } from "./schema"

// Assumption: db is a node-postgres Pool configured through the standard PG* environment variables
const db = new Pool()
const client = await Encryption({ schemas: [documents] })

async function ingestDocument(doc: { content: string; source: string; userId: string; embedding: number[] }) {
  // Encrypt each sensitive field against its column definition
  const encryptedContent = await client.encrypt(doc.content, {
    column: documents.content,
    table: documents,
  })
  const encryptedSource = await client.encrypt(doc.source, {
    column: documents.source,
    table: documents,
  })
  const encryptedUserId = await client.encrypt(doc.userId, {
    column: documents.userId,
    table: documents,
  })

  // Abort before writing anything if any field failed to encrypt
  if (encryptedContent.failure || encryptedSource.failure || encryptedUserId.failure) {
    throw new Error("Encryption failed")
  }

  // Store encrypted fields alongside the vector embedding
  await db.query(
    `INSERT INTO documents (content, source, user_id, embedding)
     VALUES ($1::jsonb, $2::jsonb, $3::jsonb, $4)`,
    [encryptedContent.data, encryptedSource.data, encryptedUserId.data, JSON.stringify(doc.embedding)]
  )
}

Decrypt retrieved context
After vector similarity search retrieves relevant documents, decrypt the content before passing it to the LLM:
async function retrieveContext(queryEmbedding: number[], topK: number = 5) {
  // Vector similarity search returns encrypted rows
  const results = await db.query(
    `SELECT content, source FROM documents
     ORDER BY embedding <-> $1
     LIMIT $2`,
    [JSON.stringify(queryEmbedding), topK]
  )

  // Decrypt the content for each result
  const decryptedDocs = await Promise.all(
    results.rows.map(async (row) => {
      const content = await client.decrypt(row.content)
      const source = await client.decrypt(row.source)
      return {
        content: content.failure ? null : content.data,
        source: source.failure ? null : source.data,
      }
    })
  )

  return decryptedDocs.filter((doc) => doc.content !== null)
}

Searchable encrypted retrieval
When you need to filter documents by metadata before or alongside vector search, use searchable encryption with EQL:
-- Find documents for a specific user using encrypted equality search
-- $1 is the encrypted search term for the user ID (not the plaintext value)
-- $2 is the query embedding
SELECT content, source, embedding
FROM documents
WHERE eql_v2.eq(user_id, $1)
ORDER BY embedding <-> $2
LIMIT 10;

This combines encrypted metadata filtering with vector similarity search, without ever decrypting the metadata in the database.
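To show how the filtered query might be called from application code, here is a minimal sketch. It is written against two small interfaces rather than a specific client: QueryRunner mirrors the query(sql, params) shape used elsewhere in this guide, and encryptSearchTerm is a hypothetical callback standing in for whatever SDK call produces the encrypted search term for your version. The function name retrieveForUser is also illustrative.

```typescript
// Minimal shape of a database client such as a node-postgres Pool (assumption).
interface QueryRunner {
  query(sql: string, params: unknown[]): Promise<{ rows: Record<string, unknown>[] }>
}

// Hypothetical callback: turns a plaintext user ID into the encrypted
// search term that eql_v2.eq compares against. The real SDK call may differ.
type EncryptSearchTerm = (plaintextUserId: string) => Promise<unknown>

const FILTERED_SEARCH_SQL = `
  SELECT content, source
  FROM documents
  WHERE eql_v2.eq(user_id, $1)
  ORDER BY embedding <-> $2
  LIMIT $3`

async function retrieveForUser(
  db: QueryRunner,
  encryptSearchTerm: EncryptSearchTerm,
  userId: string,
  queryEmbedding: number[],
  topK: number = 5,
): Promise<Record<string, unknown>[]> {
  if (!Number.isInteger(topK) || topK < 1) {
    throw new RangeError(`topK must be a positive integer, got ${topK}`)
  }
  // The plaintext user ID never reaches the database; only the encrypted term does.
  const term = await encryptSearchTerm(userId)
  const result = await db.query(FILTERED_SEARCH_SQL, [term, JSON.stringify(queryEmbedding), topK])
  // Rows are still encrypted here; decrypt them before building the prompt.
  return result.rows
}
```

Passing the database client and the term encryptor as arguments keeps the function easy to test with fakes and keeps the SDK-specific call in one place.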
Benefits for AI pipelines
- Sensitive context stays encrypted: document chunks containing PII or confidential data are never stored in plaintext
- Compliance support: field-level encryption helps meet data-protection requirements under GDPR, HIPAA, and SOC 2
- Selective decryption: decrypt only what the LLM needs, reducing the exposure surface
- Audit trail: identity-aware encryption lets you track who retrieved which documents, and when
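The selective decryption point can be made concrete with a small sketch: decrypt ranked results one at a time and stop once a context budget is filled, so lower-ranked chunks are never decrypted. The result shape mirrors the failure and data fields used in the retrieval example; the function name decryptWithinBudget, the Decrypt callback type, and the character-based budget are assumptions for illustration, and a token-based budget would work the same way.

```typescript
// Result shape modeled on the `failure` / `data` fields used in this guide.
interface DecryptResult {
  data?: string
  failure?: unknown
}

// Stand-in for a decrypt call such as the client.decrypt used earlier (assumption).
type Decrypt = (payload: unknown) => Promise<DecryptResult>

// Decrypts payloads in rank order and stops as soon as the budget is used up,
// so chunks that would not fit in the prompt stay encrypted.
async function decryptWithinBudget(
  rankedPayloads: unknown[],
  decrypt: Decrypt,
  maxChars: number,
): Promise<string[]> {
  const context: string[] = []
  let used = 0
  for (const payload of rankedPayloads) {
    if (used >= maxChars) break // budget already full: skip decryption entirely
    const result = await decrypt(payload)
    if (result.failure || result.data === undefined) continue // skip undecryptable rows
    if (used + result.data.length > maxChars) break // next chunk does not fit
    context.push(result.data)
    used += result.data.length
  }
  return context
}
```

Decrypting sequentially trades some latency for lower exposure compared with decrypting every retrieved row in parallel; which side to favor depends on how sensitive the lower-ranked chunks are.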