@aicacia/local-embeddings
ts-local-embeddings

    Browser-first local embedding utilities for:

    • Loading a local ONNX embedding model with fallbacks
    • Embedding documents in-process or via Web Worker
    • Storing vectors in IndexedDB with similarity search and MMR
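Maximal marginal relevance (MMR) re-ranks results to balance relevance against diversity. As a rough sketch of the idea (not the library's internal implementation; `cosine` and `mmr` here are illustrative stand-ins operating on raw vectors):

```ts
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedy MMR: repeatedly pick the candidate maximizing
// lambda * relevance - (1 - lambda) * (max similarity to already-picked docs).
function mmr(query: number[], docs: number[][], k: number, lambda = 0.5): number[] {
  const picked: number[] = [];
  const remaining = docs.map((_, i) => i);
  while (picked.length < k && remaining.length > 0) {
    let best = -1, bestScore = -Infinity;
    for (const i of remaining) {
      const relevance = cosine(query, docs[i]);
      const redundancy = picked.length
        ? Math.max(...picked.map((j) => cosine(docs[i], docs[j])))
        : 0;
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) { bestScore = score; best = i; }
    }
    picked.push(best);
    remaining.splice(remaining.indexOf(best), 1);
  }
  return picked; // indices into docs, in selection order
}
```

With a low lambda, MMR prefers a dissimilar document over a near-duplicate of one already selected.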

    Hosted API docs: https://aicacia.github.io/ts-local-embeddings/docs/

```sh
pnpm add @aicacia/local-embeddings @huggingface/transformers @langchain/core
```
```ts
import { IndexedDBVectorStore, WorkerEmbeddings } from '@aicacia/local-embeddings';
import { Document } from '@langchain/core/documents';

const embeddings = new WorkerEmbeddings({
  runtime: {
    modelPath: '/models/'
  }
});

const store = new IndexedDBVectorStore(embeddings);

await store.addDocuments([
  new Document({ pageContent: 'TypeScript is strongly typed JavaScript' }),
  new Document({ pageContent: 'Transformers can run in the browser with ONNX' })
]);

const matches = await store.similaritySearchWithScore('browser embeddings', 3);
console.log(matches);
```

When finished, terminate worker resources:

```ts
embeddings.terminate();
```

    loadEmbeddingRuntime and WorkerEmbeddings runtime options support:

    • modelId: model repo id (default: onnx-community/embeddinggemma-300m-ONNX)
    • modelFallbacks: preferred dtype / file fallback order
    • allowRemoteModels: when false, require local files only
    • modelPath: path passed to transformers as cache_dir; in browser apps this is typically a served model base path like '/models/' or '/my-base/models/'
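Putting those options together, a fully-local setup might look like the following. The `RuntimeOptions` interface here is a hand-written approximation of the documented fields, not the library's exported type:

```ts
// Approximation of the documented runtime option fields (illustrative only).
interface ModelFallback {
  dtype: string;
  model_file_name?: string;
}

interface RuntimeOptions {
  modelId?: string;           // default: 'onnx-community/embeddinggemma-300m-ONNX'
  modelFallbacks?: ModelFallback[];
  allowRemoteModels?: boolean; // false => require local files only
  modelPath?: string;          // passed to transformers as cache_dir
}

// Fully-local configuration: never fetch remote models, serve assets from /models/.
const runtime: RuntimeOptions = {
  modelId: 'onnx-community/embeddinggemma-300m-ONNX',
  allowRemoteModels: false,
  modelPath: '/models/',
  modelFallbacks: [{ dtype: 'q4' }, { dtype: 'fp16' }],
};
```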

    WorkerEmbeddings also supports:

    • onProgress: callback fired during embedDocuments batching with processedAfterBatch and totalDocuments values for UI progress indicators.
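For a UI progress bar, the callback payload can be reduced to a percentage. A small helper, assuming the payload shape described above (the `EmbedProgress` type and `progressPercent` helper are illustrative, not part of the library):

```ts
// Shape of the onProgress payload as described above (assumed, not the exported type).
interface EmbedProgress {
  processedAfterBatch: number;
  totalDocuments: number;
}

// Convert a progress event into a clamped integer percentage for a progress bar.
function progressPercent({ processedAfterBatch, totalDocuments }: EmbedProgress): number {
  if (totalDocuments <= 0) return 100;
  return Math.min(100, Math.round((processedAfterBatch / totalDocuments) * 100));
}

// Wired up roughly as:
// new WorkerEmbeddings({ onProgress: (p) => renderBar(progressPercent(p)) });
```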

    To enable internal debug logs while diagnosing runtime/model loading issues, set:

```ts
(globalThis as { __LOCAL_EMBEDDINGS_DEBUG__?: boolean }).__LOCAL_EMBEDDINGS_DEBUG__ = true;
```

    Example modelFallbacks:

```ts
const embeddings = new WorkerEmbeddings({
  runtime: {
    modelFallbacks: [
      { dtype: 'q4', model_file_name: 'model_no_gather' },
      { dtype: 'q4' },
      { dtype: 'fp16' }
    ]
  }
});
```
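The fallback list is tried in order until a variant loads. The selection logic amounts to something like this sketch (`loadVariant` and the error handling are stand-ins, not the library's internals):

```ts
interface ModelFallback {
  dtype: string;
  model_file_name?: string;
}

// Try each fallback in order; return the first variant that loads successfully.
// If every variant fails, surface all of the accumulated errors.
async function loadWithFallbacks<T>(
  fallbacks: ModelFallback[],
  loadVariant: (f: ModelFallback) => Promise<T>, // stand-in for the real loader
): Promise<T> {
  const errors: unknown[] = [];
  for (const fallback of fallbacks) {
    try {
      return await loadVariant(fallback);
    } catch (err) {
      errors.push(err); // remember why this variant failed, keep trying
    }
  }
  throw new Error(`all ${fallbacks.length} model fallbacks failed: ${errors.join('; ')}`);
}
```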

Internally, the library is split into boundary modules with clear ownership; the public API is unchanged:

    • embeddingPipeline: owns token limit resolution, adaptive batching, tokenizer/model invocation compatibility, and embedding output validation.
    • workerChannel: owns worker request ids, pending promise lifecycle, timeout handling, and failure fan-out semantics.
    • vectorWritePipeline: owns write-path dedup policy, cache reuse, deterministic guard behavior, and record mapping.
    • indexedDbStoreGateway: owns IndexedDB lifecycle/open-upgrade flow, schema checks, and low-level read/write/query operations.
    • runtimePolicy + runtimeLoader: separates fallback-selection policy from the Hugging Face runtime adapter.
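The workerChannel responsibilities (request ids, pending-promise lifecycle, timeouts, failure fan-out) boil down to a correlation map. An illustrative sketch, not the actual module:

```ts
interface Pending {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

// Correlates worker responses with requests via incrementing ids,
// rejecting callers whose requests time out or when the channel fails.
class RequestChannel {
  private nextId = 0;
  private pending = new Map<number, Pending>();

  constructor(private post: (id: number, payload: unknown) => void) {}

  send(payload: unknown, timeoutMs = 30_000): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(id);
        reject(new Error(`request ${id} timed out after ${timeoutMs}ms`));
      }, timeoutMs);
      this.pending.set(id, { resolve, reject, timer });
      this.post(id, payload);
    });
  }

  // Called when a response message arrives from the worker.
  handleResponse(id: number, value: unknown): void {
    const entry = this.pending.get(id);
    if (!entry) return; // already timed out or unknown id
    clearTimeout(entry.timer);
    this.pending.delete(id);
    entry.resolve(value);
  }

  // Failure fan-out: reject every in-flight request (e.g. worker crashed).
  fail(reason: Error): void {
    for (const [, entry] of this.pending) {
      clearTimeout(entry.timer);
      entry.reject(reason);
    }
    this.pending.clear();
  }
}
```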

    If you run fully local, host model assets under your configured modelPath, typically:

```
<public-or-static-root>/models/onnx-community/<model-id>/onnx/*
```

    For SvelteKit, files in static/ are served at the app base path.
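When the app is served under a non-root base path, the modelPath needs to include that base. A tiny helper to build it (the `modelPathFor` name is illustrative, not a library export):

```ts
// Join an app base path and a models directory into a modelPath value,
// normalizing trailing slashes so '' and '/my-base/' both work.
function modelPathFor(base: string, dir = 'models'): string {
  const trimmedBase = base.replace(/\/+$/, '');
  return `${trimmedBase}/${dir}/`;
}
```

In SvelteKit this pairs naturally with the `base` value from `$app/paths`.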

```sh
pnpm install
pnpm build
pnpm lint
pnpm coverage
pnpm github-pages:dev
```