MemTurbo

A Precog Labs App

Beta

Store, search, and retrieve agent memories with industry-leading vector compression. Cut storage costs by up to 95% while maintaining semantic accuracy.

MemTurbo uses TurboQuant compression to reduce vector storage by up to 95% while preserving semantic search quality. Built for AI agents that need persistent, searchable memory across conversations.

How It Works

Three steps from raw text to searchable, compressed memory

Step 1

Store Memories

Send text content via REST API, Python/TypeScript SDK, or MCP server. MemTurbo ingests it with metadata like agent ID, tags, and session context.
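As a rough illustration of the store call, here is a thin typed client sketch. The base URL, endpoint path, field names, and auth header below are assumptions for illustration only; consult the SDK docs for the real surface.

```python
# Hypothetical client sketch -- endpoint, fields, and auth header
# are assumptions, not MemTurbo's documented API.
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    content: str
    agent_id: str
    session_id: str = ""
    tags: list = field(default_factory=list)

@dataclass
class MemTurboClient:
    api_key: str
    base_url: str = "https://api.example.com/v1"  # placeholder host

    def _headers(self) -> dict:
        return {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

    def store_request(self, record: MemoryRecord) -> tuple:
        # Builds (method, url, headers, body); hand the result to any
        # HTTP library (requests, httpx) to actually send it.
        body = {
            "content": record.content,
            "agent_id": record.agent_id,
            "session_id": record.session_id,
            "tags": record.tags,
        }
        return ("POST", f"{self.base_url}/memories", self._headers(), body)
```

Separating request construction from transport keeps the sketch testable without a live endpoint; the shipped SDKs wrap the full send-and-parse cycle.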

Step 2

Compress & Embed

TurboQuant generates embeddings and compresses vectors using Lloyd-Max quantization with QJL residual correction for 95% storage reduction.
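TurboQuant's internals are not spelled out here, but its first stage names a textbook technique. A minimal sketch of 1-D Lloyd-Max quantization (the scalar analogue of k-means) shows where the storage savings come from; the actual codebook layout and the QJL residual step are beyond this illustration.

```python
# Sketch only: per-dimension Lloyd-Max quantization of one embedding.
# 3-bit codes (8 levels) shrink a float32 vector by roughly 10x,
# before any codebook overhead or residual correction.
import numpy as np

def lloyd_max(samples, levels=8, iters=50):
    """Alternate nearest-centroid assignment and centroid updates."""
    centroids = np.linspace(samples.min(), samples.max(), levels)
    for _ in range(iters):
        idx = np.abs(samples[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(levels):
            members = samples[idx == k]
            if members.size:
                centroids[k] = members.mean()
    return centroids

rng = np.random.default_rng(0)
vec = rng.standard_normal(1024).astype(np.float32)   # one embedding
codebook = lloyd_max(vec, levels=8)                   # 8 levels = 3 bits
codes = np.abs(vec[:, None] - codebook[None, :]).argmin(axis=1)

raw_bytes = vec.nbytes                # 1024 dims * 4 bytes = 4096
packed_bytes = (len(codes) * 3) // 8  # 3 bits per dim = 384
```

Quantizing alone gets roughly 90% reduction here; the residual-correction stage is what recovers search accuracy at these bit rates.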

Step 3

Search & Retrieve

Query memories with semantic search powered by pgvector HNSW indexing. Results are ranked by cosine similarity score and can be filtered by tag, agent, or session.
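The ranking itself is plain cosine similarity; in the service it runs inside Postgres via pgvector's HNSW index, but a small NumPy sketch shows the math behind the scores.

```python
# Illustration of cosine-similarity top-k over an in-memory array.
# The real system does this via a pgvector HNSW index, not NumPy.
import numpy as np

def cosine_top_k(query, memory_vecs, k=3):
    q = query / np.linalg.norm(query)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    scores = m @ q                    # dot of unit vectors == cosine
    order = np.argsort(-scores)[:k]   # highest similarity first
    return [(int(i), float(scores[i])) for i in order]

memories = np.array([[1.0, 0.0],     # toy 2-D stand-ins for embeddings
                     [0.0, 1.0],
                     [1.0, 1.0]])
query = np.array([1.0, 0.1])
hits = cosine_top_k(query, memories, k=2)
```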

Features

Everything you need for AI agent memory management

TurboQuant Compression

Lloyd-Max quantization with QJL residual correction reduces vector storage by up to 95% while maintaining cosine similarity accuracy.

Semantic Search

Find memories by meaning, not just keywords. Cosine similarity search across compressed vectors with relevance scoring via pgvector HNSW.

Multi-Tenant Isolation

Row-level security ensures complete data isolation between organizations and projects. Built for production multi-tenancy from day one.

Agent Memory

Tag memories by agent, user, and session. Build persistent context for AI agents that remember across conversations and interactions.

Version History

Track memory changes over time with automatic versioning. Every update creates a new version, preserving the complete history.
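The append-only model described above (every update creates a new version, nothing is overwritten) fits in a few lines. The dict store and `update_memory` helper below are illustrative, not the service's API.

```python
# Sketch of append-only versioning: each update appends a new
# version record instead of mutating the previous one.
versions = {}  # memory_id -> list of version records

def update_memory(memory_id, content):
    history = versions.setdefault(memory_id, [])
    history.append({"version": len(history) + 1, "content": content})
    return history[-1]
```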

REST API & SDKs

Full REST API with TypeScript and Python SDKs. Store, search, update, and delete memories programmatically with typed clients.

MCP Server

Model Context Protocol server for direct integration with AI assistants like Claude. Expose memories as tools for LLM agents to use natively.

Background Processing

Async embedding pipeline with configurable worker pools. Non-blocking ingestion keeps your API fast while heavy compute runs in the background.
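The pattern behind this is a bounded queue feeding a fixed pool of async workers: ingestion enqueues and returns, workers drain the queue. Names here (`embed`, `WORKERS`) are illustrative stand-ins, not MemTurbo's pipeline.

```python
# Minimal asyncio sketch of a bounded-queue worker pool.
import asyncio

WORKERS = 4
results = []

async def embed(text):         # stand-in for the heavy embedding call
    await asyncio.sleep(0)     # yield control, as a real model call would
    return f"vec({text})"

async def worker(queue):
    while True:
        text = await queue.get()
        results.append(await embed(text))
        queue.task_done()

async def main(texts):
    queue = asyncio.Queue(maxsize=100)  # bound gives back-pressure
    pool = [asyncio.create_task(worker(queue)) for _ in range(WORKERS)]
    for t in texts:
        await queue.put(t)              # fast, non-blocking ingestion path
    await queue.join()                  # wait for in-flight work to finish
    for task in pool:
        task.cancel()

asyncio.run(main(["a", "b", "c"]))
```

The `maxsize` bound is the knob that keeps a burst of ingestion from overwhelming the workers: once the queue is full, producers wait instead of piling up work.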

API Key Authentication

Simple, secure API key auth with per-key rate limiting. Create multiple keys with different permissions and rate limits per project.
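Per-key rate limiting is commonly built as one token bucket per API key; the sketch below shows that shape. The rates, capacities, and `check` helper are assumptions for illustration, not MemTurbo's actual limiter.

```python
# Illustrative per-key token-bucket rate limiter.
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity   # tokens/sec, max burst
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # one bucket per API key

def check(api_key, rate=5, capacity=5):
    bucket = buckets.setdefault(api_key, TokenBucket(rate, capacity))
    return bucket.allow()
```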

What It Provides

A complete memory infrastructure for AI applications

Memory Storage

Persistent storage with metadata, tags, agent IDs, and session tracking

Vector Search

Semantic similarity search with cosine distance scoring via pgvector

95% Compression

TurboQuant 3-bit compression with PolarQuant and QJL residuals

Multi-Platform

REST API, TypeScript SDK, Python SDK, MCP server, and web dashboard

Simple, Transparent Pricing

Start free, upgrade when you need more storage or collaboration

Free: $0/mo
  • 1,000 memories
  • Basic embedding model
  • REST API access

Pro (Most Popular): $29/mo
  • 50,000 memories
  • Advanced models (BGE-M3)
  • Priority embedding queue

Team: $49/user/mo
  • Unlimited memories
  • Organizations & RBAC
  • MCP server access

Enterprise: Custom pricing
  • On-prem deployment
  • Bring your own embeddings
  • SLA & uptime guarantees

Ready to add memory to your AI agents?

Get started with the dashboard or integrate directly via the REST API. No credit card required for the free tier.

Launch Dashboard