Documentation

Quick Start

Prerequisites

  • Rust 1.70+ (install via rustup)

Build and Run

# Clone the repository
git clone https://github.com/overclockdb/overclockdb.git
cd overclockdb

# Build in release mode
cargo build --release

# Run the server
cargo run --release

The server starts on http://localhost:8108 by default.

Environment Variables

Variable                  Default   Description
OVERCLOCKDB_PORT          8108      HTTP server port
OVERCLOCKDB_DATA_DIR      ./data    Directory for WAL and snapshots
OVERCLOCKDB_PERSISTENCE   true      Enable/disable persistence

PostgreSQL Sync Variables

Variable               Required   Description
POSTGRES_URL           Yes        PostgreSQL connection string
POSTGRES_TABLE         Yes        Source table name
POSTGRES_COLLECTION    Yes        Target OverclockDB collection
POSTGRES_PRIMARY_KEY   No         Primary key column (default: id)
POSTGRES_BATCH_SIZE    No         Batch size for initial sync (default: 1000)
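
The defaults and required flags in the two tables above can be read as a small configuration loader. The following is an illustrative Python sketch, not OverclockDB's actual startup code: load_settings is a hypothetical helper, and treating "false" or "0" as disabling persistence is an assumption about how the boolean is parsed.

```python
import os

# Defaults taken from the tables above.
SERVER_DEFAULTS = {
    "OVERCLOCKDB_PORT": "8108",
    "OVERCLOCKDB_DATA_DIR": "./data",
    "OVERCLOCKDB_PERSISTENCE": "true",
}
SYNC_REQUIRED = ("POSTGRES_URL", "POSTGRES_TABLE", "POSTGRES_COLLECTION")
SYNC_DEFAULTS = {"POSTGRES_PRIMARY_KEY": "id", "POSTGRES_BATCH_SIZE": "1000"}


def load_settings(env=None):
    """Resolve settings from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in SERVER_DEFAULTS.items()}
    settings["OVERCLOCKDB_PORT"] = int(settings["OVERCLOCKDB_PORT"])
    # Assumption: "false" and "0" disable persistence; anything else enables it.
    settings["OVERCLOCKDB_PERSISTENCE"] = (
        settings["OVERCLOCKDB_PERSISTENCE"].strip().lower() not in ("false", "0")
    )

    # Sync settings are resolved only when some POSTGRES_* variable is present.
    if any(name.startswith("POSTGRES_") for name in env):
        missing = [name for name in SYNC_REQUIRED if not env.get(name)]
        if missing:
            raise ValueError("missing required variables: " + ", ".join(missing))
        for name in SYNC_REQUIRED:
            settings[name] = env[name]
        for name, default in SYNC_DEFAULTS.items():
            settings[name] = env.get(name, default)
        settings["POSTGRES_BATCH_SIZE"] = int(settings["POSTGRES_BATCH_SIZE"])
    return settings
```

Calling load_settings({}) yields only the server defaults; supplying a partial set of POSTGRES_* variables raises an error naming the missing ones.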

Health Check

# Liveness check
GET /health

# Readiness check
GET /ready
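
A deployment script will typically wait for the readiness endpoint before sending traffic. The sketch below polls /ready using only the Python standard library. It assumes that a ready server answers with HTTP 200, which the text above does not state; wait_until_ready and http_status are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8108"  # default address from Quick Start


def http_status(url, timeout=2.0):
    """Return the HTTP status code for a GET, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(base_url=BASE_URL, attempts=10, delay=0.5, fetch=http_status):
    """Poll /ready until it returns 200 (assumed success code) or attempts run out."""
    for attempt in range(attempts):
        if fetch(base_url + "/ready") == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The fetch parameter exists so the polling loop can be exercised without a running server.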

Collections

# Create a collection
POST /api/v1/collections
Content-Type: application/json

{
  "name": "products",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "description", "type": "string"},
    {"name": "price", "type": "float", "sort": true},
    {"name": "category", "type": "string", "facet": true}
  ],
  "enable_stemming": true,
  "stem_language": "english",
  "enable_stop_words": true,
  "stop_words_language": "english"
}

# List all collections
GET /api/v1/collections

# Get collection info
GET /api/v1/collections/:name

# Delete a collection
DELETE /api/v1/collections/:name
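
As a client-side illustration, the create request above can be assembled with Python's standard library. This is a sketch only: field and create_collection_request are hypothetical helpers, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8108"  # default address from Quick Start


def field(name, type_, **options):
    """One schema field; options such as facet=True or sort=True are passed through."""
    return {"name": name, "type": type_, **options}


def create_collection_request(name, fields, base_url=BASE_URL, **collection_options):
    """Build (but do not send) the POST request that creates a collection."""
    body = {"name": name, "fields": fields, **collection_options}
    return urllib.request.Request(
        base_url + "/api/v1/collections",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = create_collection_request(
    "products",
    [
        field("title", "string"),
        field("price", "float", sort=True),
        field("category", "string", facet=True),
    ],
    enable_stemming=True,
    stem_language="english",
)
# Send with urllib.request.urlopen(req) once a server is running.
```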

Documents

# Create/update a document
POST /api/v1/collections/:name/docs
Content-Type: application/json

{
  "id": "prod_1",
  "title": "Wireless Headphones",
  "description": "Premium noise-canceling headphones",
  "price": 299.99,
  "category": "electronics"
}

# Batch import documents
POST /api/v1/collections/:name/docs/batch
Content-Type: application/json

{
  "documents": [
    {"id": "1", "title": "Product 1", "price": 99.99},
    {"id": "2", "title": "Product 2", "price": 149.99}
  ]
}

# Get a document by ID
GET /api/v1/collections/:name/docs/:id

# Replace a document
PUT /api/v1/collections/:name/docs/:id

# Delete a document
DELETE /api/v1/collections/:name/docs/:id
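
For large imports, a client may want to split documents into several batch requests. The sketch below produces request bodies in the shape shown for the batch endpoint. The batch size is a client-side choice here, not a documented server limit, and batch_payloads is a hypothetical helper.

```python
import json


def batch_payloads(documents, batch_size=1000):
    """Yield JSON bodies for POST /api/v1/collections/:name/docs/batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(documents), batch_size):
        # Each body wraps one slice of the input in a "documents" array.
        chunk = documents[start:start + batch_size]
        yield json.dumps({"documents": chunk})
```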

Field Types

Type         Description
string       Text field (indexed by default)
string[]     Array of strings (filterable, facetable)
int32        32-bit integer
int32[]      Array of 32-bit integers (filterable)
int64        64-bit integer
int64[]      Array of 64-bit integers (filterable)
float        64-bit floating point
float[]      Array of 64-bit floats (filterable)
bool         Boolean
hierarchy    Hierarchical category with "/" separator
attributes   Dynamic key-value pairs for flexible faceting

Note: Numeric array filters use ANY semantics: a document matches if any element in the array satisfies the filter condition.
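
The ANY semantics in the note can be modeled in a few lines. This is a sketch of the matching rule only, not the engine's implementation; array_matches is a hypothetical name, and the operators are the comparison operators listed under Filtering.

```python
import operator

# Comparison operators from the filter syntax, mapped to Python functions.
OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


def array_matches(values, op, target):
    """ANY semantics: true if at least one element satisfies the condition."""
    compare = OPS[op]
    return any(compare(v, target) for v in values)
```

Under this rule, a document with sizes [10, 50, 200] matches a "greater than 100" condition because one element (200) satisfies it, and an empty array never matches.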

Field Options

Option     Default   Description
index      true      Enable full-text search
facet      false     Enable facet counting
sort       false     Enable sorting
optional   false     Allow missing values

Collection Options

Option                Default     Description
enable_stemming       false       Enable word stemming
stem_language         "english"   Stemming language: "english" or "russian"
enable_stop_words     false       Filter common words
stop_words_language   "english"   "english", "russian", or "none"
enable_vectors        false       Enable semantic/vector search
vector_fields         all text    Fields to generate embeddings from
num_shards            null        Number of shards for parallel processing

Filtering

field:=value       # Equals
field:!=value      # Not equals
field:>value       # Greater than
field:>=value      # Greater than or equal
field:=100 AND category:^Electronics   # Combine conditions with AND (:^ matches a hierarchy prefix)
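
Filter strings like these are easy to assemble programmatically. The helper below is a hypothetical client-side sketch covering only the operators shown above; it does no quoting or escaping of values, since the escaping rules are not described here.

```python
# Operators that appear in the filter examples above.
KNOWN_OPERATORS = ("=", "!=", ">", ">=", "^")


def condition(field, op, value):
    """Format one condition as field:<op><value>."""
    if op not in KNOWN_OPERATORS:
        raise ValueError(f"operator not listed above: {op}")
    return f"{field}:{op}{value}"


def all_of(*conditions):
    """Join conditions with AND."""
    return " AND ".join(conditions)
```

For example, all_of(condition("price", ">=", 100), condition("category", "^", "Electronics")) produces a price filter combined with a hierarchy prefix filter.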

Typo Tolerance

OverclockDB supports typo-tolerant search using the SymSpell algorithm:

POST /api/v1/collections/products/search
{
  "q": "laptp",
  "typo_tolerance": 2
}

With typo_tolerance: 2, the query "laptp" will match documents containing "laptop".

Value       Description
0 or null   Disabled (exact matching only)
1           Allow 1 character difference
2           Allow up to 2 character differences
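
To make "character difference" concrete, the sketch below computes plain Levenshtein edit distance. It illustrates the tolerance threshold only and is not SymSpell itself: SymSpell is a faster way of finding candidate terms within a given distance, and the engine may count transpositions differently. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def within_tolerance(query, term, typo_tolerance):
    """True if term is within the allowed number of edits (0 or None means exact)."""
    return edit_distance(query, term) <= (typo_tolerance or 0)
```

By this measure "laptp" is one edit away from "laptop" (a single inserted "o"), so it falls within a tolerance of 1 as well as 2.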

Stemming

Stemming reduces words to their root form for better recall:

  • "running", "runs" → "run"
  • "computers", "computing" → "comput"

POST /api/v1/collections
{
  "name": "articles",
  "fields": [{"name": "content", "type": "string"}],
  "enable_stemming": true,
  "stem_language": "english"
}

Language   API Value   Notes
English    "english"   Porter/Snowball stemmer (default)
Russian    "russian"   Snowball Russian stemmer

Stop Words

Stop words are common words like "the", "a", "is" that are filtered out:

POST /api/v1/collections
{
  "name": "articles",
  "fields": [{"name": "content", "type": "string"}],
  "enable_stop_words": true,
  "stop_words_language": "english"
}

Language   API Value   Notes
English    "english"   ~120 common English words
Russian    "russian"   ~70 common Russian words
None       "none"      Disable stop words filtering
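
The effect of stop word filtering on indexed terms can be sketched as follows. The three-word list is only a sample taken from the examples above, not the real ~120-word English list, and the tokenizer (lowercase, split on non-word characters) is an assumption for illustration.

```python
import re

# Tiny illustrative list; the built-in lists are much longer.
SAMPLE_STOP_WORDS = frozenset({"the", "a", "is"})


def index_terms(text, stop_words=SAMPLE_STOP_WORDS):
    """Lowercase, split on non-word characters, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]
```

Passing an empty set for stop_words models the "none" setting, where every token is kept.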

Hierarchical Categories

The hierarchy field type enables tree-structured categories with automatic ancestor indexing.

Schema Definition

POST /api/v1/collections
{
  "name": "products",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "category", "type": "hierarchy", "facet": true},
    {"name": "price", "type": "float", "sort": true}
  ]
}

Document Examples

// Single category
{"id": "laptop-1", "title": "MacBook Pro", "category": "Electronics/Computers/Laptops"}

// Multiple categories
{"id": "organizer-1", "title": "Desk Organizer", "category": ["Office/Supplies", "Home/Storage"]}
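
Ancestor indexing means a document filed under a deep path is also reachable through every prefix of that path. The sketch below shows one way to derive those prefixes. It illustrates the idea only and is not the engine's indexing code; both function names are hypothetical.

```python
def ancestor_paths(path, separator="/"):
    """Expand "A/B/C" into ["A", "A/B", "A/B/C"]."""
    parts = [p for p in path.split(separator) if p]
    return [separator.join(parts[:i]) for i in range(1, len(parts) + 1)]


def indexed_paths(category):
    """Accept a single path or a list of paths, as in the documents above."""
    paths = [category] if isinstance(category, str) else category
    seen = []
    for path in paths:
        for ancestor in ancestor_paths(path):
            if ancestor not in seen:
                seen.append(ancestor)
    return seen
```

With these prefixes indexed, a filter on "Electronics" can match the MacBook Pro document even though its stored category is "Electronics/Computers/Laptops".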

Searching Hierarchies

# All electronics (matches Laptops, Desktops, Phones, etc.)
{"q": "*", "filter": "category:^Electronics", "facets": ["category"]}

# Only computers
{"q": "*", "filter": "category:^Electronics/Computers"}

# Multiple hierarchies (OR)
{"q": "*", "filter": "category:^[Electronics,Clothing]"}

Drill-Down Navigation

# Get children of Electronics
{"q": "*", "facets": ["category"], "hierarchy_parent": "Electronics"}

Response includes hierarchy_facets:

{
  "hierarchy_facets": {
    "category": [
      {"path": "Electronics/Computers", "name": "Computers", "count": 60, "depth": 1, "has_children": true},
      {"path": "Electronics/Phones", "name": "Phones", "count": 40, "depth": 1, "has_children": false}
    ]
  }
}
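
The shape of this response can be reproduced from raw category paths. The sketch below counts documents under each direct child of a parent. It assumes one path per document and that depth is counted from the root (root level = 0), which is inferred from the example rather than stated; child_facets is a hypothetical name.

```python
from collections import Counter


def child_facets(doc_paths, parent, separator="/"):
    """Count documents under each direct child of parent."""
    prefix = parent + separator
    counts = Counter()
    has_children = {}
    for path in doc_paths:
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):].split(separator)
        child = prefix + rest[0]
        counts[child] += 1
        # A child has children if any document sits deeper than the child itself.
        has_children[child] = has_children.get(child, False) or len(rest) > 1
    return [
        {
            "path": child,
            "name": child.rsplit(separator, 1)[-1],
            "count": count,
            "depth": child.count(separator),  # assumed: root level is depth 0
            "has_children": has_children[child],
        }
        for child, count in sorted(counts.items())
    ]
```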

Internationalization (i18n)

OverclockDB supports translated facet labels for multilingual product catalogs.

Set Translations

PUT /api/v1/collections/products/translations
{
  "field": "category",
  "translations": [
    {
      "value": "Electronics",
      "labels": {"en": "Electronics", "es": "Electrónicos", "de": "Elektronik"}
    },
    {
      "value": "Electronics/Computers",
      "labels": {"en": "Computers", "es": "Computadoras", "de": "Computer"}
    }
  ]
}

Search with Language

POST /api/v1/collections/products/search
{
  "q": "laptop",
  "facets": ["category", "brand"],
  "language": "es"
}

Response includes translated labels:

{
  "facets": {
    "brand": [
      {"value": "apple", "label": "Apple Inc.", "count": 50}
    ]
  },
  "hierarchy_facets": {
    "category": [
      {"path": "Electronics", "label": "Electrónicos", "count": 100}
    ]
  }
}

Language Fallback

When a translation is not found, labels fall back in this order:

  1. Requested language (e.g., "es")
  2. English ("en")
  3. Raw field value
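
The fallback order above amounts to a short lookup chain. A minimal sketch, assuming labels are stored as a language-to-label mapping like the "labels" objects in the translations request; resolve_label is a hypothetical name.

```python
def resolve_label(value, labels, language, default_language="en"):
    """Pick a label: requested language, then English, then the raw value."""
    for lang in (language, default_language):
        label = labels.get(lang)
        if label:
            return label
    return value
```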

Sharding

Hash-based document sharding runs searches in parallel across shards, giving roughly a 2.7-3x speedup in the benchmarks below.

Create a Sharded Collection

POST /api/v1/collections
{
  "name": "products",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "brand", "type": "string", "facet": true},
    {"name": "price", "type": "float"}
  ],
  "num_shards": 4
}

How It Works

  1. Document Routing: Documents distributed via hash of document ID
  2. Parallel Search: Queries run on all shards concurrently
  3. Result Merging: K-way heap merge with global BM25 statistics
  4. Facet Aggregation: Counts summed across all shards
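
The routing, merging, and aggregation steps above can be sketched in Python. This is an illustration under stated assumptions, not OverclockDB's implementation: the hash function (SHA-256 here) is an arbitrary stable choice, hits are modeled as (score, doc_id) tuples, and all function names are hypothetical.

```python
import hashlib
import heapq
import itertools


def shard_for(doc_id, num_shards):
    """Route a document by a stable hash of its ID (hash choice is an assumption)."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def merge_top_k(shard_results, k):
    """K-way merge of per-shard hits, each list sorted by descending score.

    Scores are assumed comparable across shards, which is what computing
    BM25 with global statistics provides.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(itertools.islice(merged, k))


def merge_facets(shard_facets):
    """Sum facet counts across shards."""
    totals = {}
    for facets in shard_facets:
        for value, count in facets.items():
            totals[value] = totals.get(value, 0) + count
    return totals
```

Because heapq.merge is lazy, taking the first k items touches only as many hits as needed from each shard's sorted list.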

Performance

Collection Size   Regular   4 Shards   Speedup
50K documents     949 µs    352 µs     2.7x faster
100K documents    1.17 ms   386 µs     3x faster

Recommendations

Collection Size   Recommended Shards
< 50K docs        1 (no sharding)
50K - 500K        4 shards
500K - 2M         8 shards
> 2M docs         8-16 shards

Context-Aware Overlays

Runtime value resolution for context-specific data like customer pricing:

POST /api/v1/collections/products/search
{
  "q": "laptop",
  "overlay": {
    "context_key": "customer_123",
    "base_field": "default_price",
    "strategy": "min"
  },
  "sort_by": "effective_value:asc"
}

Merge Strategies

Strategy   Description
min        Use minimum of base and overlay value (default)
override   Overlay value replaces base if present
max        Use maximum of base and overlay value
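
The three strategies reduce to a small function. In this sketch an overlay of None stands for "this context has no value for the document", and falling back to the base value in that case for every strategy is an assumption; effective_value here is a hypothetical helper named after the sort field in the request above.

```python
def effective_value(base, overlay, strategy="min"):
    """Resolve the value used for sorting under a merge strategy."""
    if overlay is None:
        return base  # assumed: no overlay entry means the base value applies
    if strategy == "min":
        return min(base, overlay)
    if strategy == "max":
        return max(base, overlay)
    if strategy == "override":
        return overlay
    raise ValueError(f"unknown strategy: {strategy}")
```

With a base price of 299.99 and a customer price of 349.99, "min" keeps 299.99 while "override" and "max" both return 349.99.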

PostgreSQL Synchronization

Sync data from PostgreSQL to combine its ACID transactions with OverclockDB's fast search.

Setup

export POSTGRES_URL="postgres://user:pass@localhost/mydb"
export POSTGRES_TABLE="products"
export POSTGRES_COLLECTION="products"

Sync API Endpoints

# Get sync status
GET /api/v1/sync/status

# Incremental sync (upsert-based)
POST /api/v1/sync/initial

# Atomic sync (recommended - handles deletes)
POST /api/v1/sync/atomic

# Start polling-based sync
POST /api/v1/sync/start
{"interval_secs": 60}

# Stop sync
POST /api/v1/sync/stop

Sync Modes

Endpoint        Description                      Handles Deletes
/sync/initial   Incremental upsert sync          No
/sync/atomic    Atomic swap sync (recommended)   Yes

Note: PostgreSQL sync is one-way (PostgreSQL → OverclockDB). Data modifications should be made in PostgreSQL.
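
The difference in delete handling follows from how each mode treats existing documents. The sketch below models a collection as a dictionary keyed by primary key. It mirrors the behavior described in the table, not the actual sync code, and both function names are hypothetical.

```python
def upsert_sync(collection, source_rows, primary_key="id"):
    """Incremental sync: insert or update every source row; nothing is removed."""
    for row in source_rows:
        collection[str(row[primary_key])] = row
    return collection


def atomic_sync(source_rows, primary_key="id"):
    """Atomic swap: build a fresh copy from the source to replace the old one."""
    return {str(row[primary_key]): row for row in source_rows}


# Row 2 has been deleted in PostgreSQL since the last sync.
index = {"1": {"id": 1, "title": "Product 1"}, "2": {"id": 2, "title": "Product 2"}}
rows = [{"id": 1, "title": "Product 1 (updated)"}]

after_upsert = upsert_sync(dict(index), rows)  # still contains stale document "2"
after_atomic = atomic_sync(rows)               # contains only what PostgreSQL has
```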