A dead simple vector database that lives on your GPU. Now with a sidecar payload field — every record is (vector, label, ≤100KB blob). The MySQL sidekick is gone; one exe replaces both.
Store millions of embeddings in VRAM. Find the nearest ones in milliseconds. Single exe, no dependencies, no configuration.
# Start a database
vec mydb 1024
# That's it. It's listening on port 1920.No GPU? No problem. vec-cpu does the same thing using RAM.
vec-cpu mydb 1024Upgrading from 1.x? See
MIGRATION.md. Server protocol is a clean break — old SDKs won't work..tensorsand.metafiles load unchanged;.datais a new sidecar that appears on first save once you store payloads.
Binary only. Every request and every response is a length-prefixed binary frame over TCP, named pipe, or Unix socket.
request: F0 <2B ns_len> [ns] <CMD> <2B label_len> [label] <4B body_len> [body]
response: <1B status> <4B body_len> [body] ; status 0=ok, 1=err
15 commands, one envelope. See PROTOCOL-2.0.md for the byte-exact spec or sdk/README.md for the quick reference.
Use the SDK libraries — there is no text interface.
- PUSH — store a vector, optionally with a label and a ≤100KB data payload (data requires label). Returns the slot index.
- QUERY — nearest-neighbor search; metric byte selects L2 or cosine.
- QID — like QUERY but the query is an existing stored vector (by index or label).
- GET — retrieve records by index, by label (may yield multiple), or batch by index list.
- SET_DATA / GET_DATA — manage the per-slot payload independently of the vector.
- UPDATE — overwrite a vector in place (label and data untouched).
- LABEL — set or clear a slot's label.
- DELETE — tombstone a slot (also frees its label and data).
- UNDO — remove the last PUSH (also frees its label and data).
- CLUSTER — DBSCAN over the full set.
- DISTINCT — farthest-point sampling (k most spread-out vectors).
- REPRESENT — one most-distinct member per cluster.
- INFO — database metadata (dim, count, format, CRC, name, protocol version).
- SAVE — flush to disk; returns saved count + CRC.
QUERY/QID/GET take a 1-byte shape mask that controls what each result record carries:
0x01vector0x02label0x04data0x07all three (default)
Skip what you don't need to keep responses lean.
- L2 (squared Euclidean) — "how far apart?" Use for vision models (DINOv2, ArcFace).
- Cosine — "looking the same direction?" Use for text models (BGE, MiniLM, CLIP).
QUERY and QID take a metric byte (0=L2 default, 1=cosine). CLUSTER/DISTINCT/REPRESENT take the same byte.
Run them behind a router:
# Start databases without TCP (pipe/socket only)
vec --notcp tools 1024
vec --notcp conversations 1024
# Route them all through one port
vec --route 1920Or let deploy mode do it all:
# Auto-discover all .tensors files
vec deploy
# Custom port
vec deploy 1920
# Explicit schema
vec --deploy=tools:1024,conversations:1024,faces:512:f16 1920The SDK sets a namespace on the client. The router strips it from the frame and forwards to the correct backend.
# Delete an entire database
vec face --delete
# Check file integrity (dry run)
vec mydb --check
# Repair a corrupt database
vec mydb --repair
# Auto-detect .tensors in current directory
vec
# Load a specific file
vec mydb.tensors
# fp16 mode — half the VRAM, double the capacity (GPU only)
vec mydb 1024:f16NVIDIA GeForce RTX 3060 (12.0 GB)
===================================================================
database mydb
format f32, 1024 dim
records 4.2m total, 4.1m active, 100.0k deleted
file size 16.1 GB
modified 12 minutes ago (14:20)
capacity 2.9m max, 1.7m remaining (58.6%)
checksum NOMITOPO (0xA3F291B7) ok
===================================================================
1024-dim vectors:
GPU (RTX 3060) CPU
10K ~0.2 ms ~2 ms
100K ~1.5 ms ~20 ms
1M ~14 ms ~200 ms
fp32, 1024 dimensions:
8 GB VRAM 1.9M vectors
12 GB VRAM 2.9M vectors
24 GB VRAM 5.8M vectors
Lower dimensions or fp16 = more capacity.
RTX 2000 series and newer (Turing, Ampere, Ada Lovelace). RTX 2060 through 4090, plus T4, A100, L40.
GTX 1000 series and older won't work. AMD/Intel GPUs won't work. Use vec-cpu instead.
# Windows
build.bat
# Linux
./build.shRequires NVIDIA CUDA Toolkit 12.x for vec. vec-cpu just needs a C++ compiler.
- Brute force. Every query scans every vector. Exact results, zero approximation.
- GPU top-K. Above 100K vectors, a CUDA kernel finds the top results on GPU.
- All RAM. Vectors in VRAM (or RAM for vec-cpu), labels and data alongside them. No mmap, no lazy paging — disk is touched only on startup load and explicit SAVE.
- Indices are permanent. Slot 42 is always slot 42. Deletes are tombstones.
- Labels are clean. ≤2048 bytes, no spaces, no
: * ? " < > | ,. URI-style paths likedocs/file.pdfare fine. - Data is opaque. ≤100KB per slot. VEC stores the bytes verbatim — sniff the mime on the client if you need to.
- Same file format across builds. vec and vec-cpu read/write the same
.tensors,.meta,.datafiles. - CRC32 on save. Pronounceable checksum word for eyeball integrity checks.
- Read-only mode. If the file isn't writable, queries work, writes are rejected.
- Disk space check. Saves skipped if insufficient space.
- File repair.
--checkverifies,--repairfixes.
.tensors [4B dim][4B count][4B deleted][1B fmt][count×1B alive][vectors][4B CRC32]
.meta [4B count][per slot: 4B len + label bytes]
.data [4B count][count×1B alive mask][per present slot: 4B len + bytes][4B CRC32]
.data is new in 2.0 and only created on save when at least one slot has a payload.
C++ (vec_client.h), Python (vec_client.py), Node.js (vec_client.js), Delphi (vec_client.pas).
All in sdk/. Quick protocol reference in sdk/README.md. Byte-exact spec in PROTOCOL-2.0.md.
DINOv2 (1024d), BGE-large (1024d), ArcFace (512d), MiniLM-L12 (384d).
Curated by @PsyChip - April 2026