Documentation

Quick Start

Prerequisites

Rust 1.70+ (install via rustup)

Build and Run

# Clone the repository
git clone https://github.com/overclockdb/overclockdb.git
cd overclockdb

# Build in release mode
cargo build --release

# Run the server
cargo run --release

The server starts on http://localhost:8190 by default.

Environment Variables

Variable	Default	Description
`OVERCLOCKDB_PORT`	8190	HTTP server port
`OVERCLOCKDB_DATA_DIR`	./data	Directory for WAL and snapshots
`OVERCLOCKDB_PERSISTENCE`	true	Enable/disable persistence
`OVERCLOCKDB_BODY_LIMIT_MB`	50	Max request body size in MB (for batch imports)
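
As a rough illustration of how the defaults in the table combine, the sketch below reads the four variables from a mapping and falls back to the documented defaults. The function name `load_settings`, the accepted boolean spellings, and the 1 MB = 1024 * 1024 bytes conversion are assumptions made for this example, not confirmed server behavior.

```python
import os


def load_settings(env=os.environ):
    """Read the documented variables, using the documented defaults when unset."""
    return {
        "port": int(env.get("OVERCLOCKDB_PORT", "8190")),
        "data_dir": env.get("OVERCLOCKDB_DATA_DIR", "./data"),
        # Assumption: "true"/"false" in any letter case are the accepted values.
        "persistence": env.get("OVERCLOCKDB_PERSISTENCE", "true").strip().lower() == "true",
        # Assumption: 1 MB is treated as 1024 * 1024 bytes.
        "body_limit_bytes": int(env.get("OVERCLOCKDB_BODY_LIMIT_MB", "50")) * 1024 * 1024,
    }
```

Calling `load_settings({})` returns the defaults from the table, which can be handy when scripting a launcher or a test harness around the server.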

Health Check

# Liveness check
GET /health

# Readiness check
GET /ready

Collections

# Create a collection
POST /api/v1/collections
Content-Type: application/json

{
  "name": "products",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "description", "type": "string"},
    {"name": "price", "type": "float", "sort": true},
    {"name": "category", "type": "string", "facet": true}
  ],
  "enable_stemming": true,
  "stem_language": "english",
  "enable_stop_words": true,
  "stop_words_language": "english"
}

# List all collections
GET /api/v1/collections

# Get collection info
GET /api/v1/collections/:name

# Delete a collection
DELETE /api/v1/collections/:name

Collection Management

Rename Collection

PUT /api/v1/collections/:name/rename
Content-Type: application/json

{
  "new_name": "products_v2"
}

Update Schema

Modify field options (index, facet, sort, optional) on existing fields:

PUT /api/v1/collections/:name/schema
Content-Type: application/json

{
  "field_modifications": [
    {"name": "price", "sort": true, "facet": true},
    {"name": "description", "index": false}
  ]
}

Response:
{
  "name": "products",
  "documents_reindexed": 5000,
  "message": "Schema updated and 5000 documents reindexed"
}

Reshard Collection

Redistribute documents across a new number of shards:

POST /api/v1/collections/:name/reshard
Content-Type: application/json

{
  "num_shards": 8
}

Note: Resharding requires exclusive access and temporarily blocks queries while documents are redistributed.

Documents

# Create/update a document
POST /api/v1/collections/:name/docs
Content-Type: application/json

{
  "id": "prod_1",
  "title": "Wireless Headphones",
  "description": "Premium noise-canceling headphones",
  "price": 299.99,
  "category": "electronics"
}

# Batch import documents
POST /api/v1/collections/:name/docs/batch
Content-Type: application/json

{
  "documents": [
    {"id": "1", "title": "Product 1", "price": 99.99},
    {"id": "2", "title": "Product 2", "price": 149.99}
  ]
}

# Batch import response
{
  "imported": 995,
  "errors": [
    {"index": 42, "error": "Missing required field 'title'"},
    {"index": 108, "error": "Invalid price value: expected float"}
  ]
}
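
A client will usually want to know which documents failed. The sketch below pairs each reported error with the document that was sent, assuming that `index` is the zero-based position in the submitted `documents` array (the response format above does not state this explicitly). `failed_documents` is a hypothetical helper name.

```python
def failed_documents(documents, response):
    """Return (document, error message) pairs for every error in a batch response.

    Assumes error["index"] is the zero-based position in the submitted batch.
    """
    failures = []
    for err in response.get("errors", []):
        idx = err["index"]
        if 0 <= idx < len(documents):  # ignore indexes outside the batch
            failures.append((documents[idx], err["error"]))
    return failures
```

The returned pairs can be logged or corrected and resubmitted in a follow-up batch.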

# Get a document by ID
GET /api/v1/collections/:name/docs/:id

# Replace a document
PUT /api/v1/collections/:name/docs/:id

# Delete a document
DELETE /api/v1/collections/:name/docs/:id

Search

POST /api/v1/collections/:name/search
Content-Type: application/json

{
  "q": "wireless headphones",
  "query_by": ["title", "description"],
  "filter": "price:>=100 AND category:=electronics",
  "facets": ["category"],
  "sort_by": "price:asc",
  "limit": 20,
  "offset": 0
}
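
For readers calling the endpoint from code, here is a minimal client sketch using only the Python standard library. The base URL is the default from Quick Start; the helper names `build_search_request` and `search` are illustrative and not part of any official client.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8190"  # default port from Quick Start


def build_search_request(collection, body, base_url=BASE_URL):
    """Build (but do not send) a POST request for the search endpoint."""
    url = f"{base_url}/api/v1/collections/{quote(collection, safe='')}/search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(collection, body, base_url=BASE_URL):
    """Send the search request and decode the JSON response."""
    with urllib.request.urlopen(build_search_request(collection, body, base_url)) as resp:
        return json.load(resp)
```

With a server running locally, `search("products", {"q": "wireless headphones", "query_by": ["title"]})` would return the decoded response body.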

Search Parameters

Parameter	Type	Description
`q`	string	Search query
`query_by`	string[]	Fields to search in
`filter`	string	Filter expression
`facets`	string[]	Fields to compute facet counts
`sort_by`	string	Sort expression (e.g., price:asc,rating:desc)
`limit`	number	Maximum results (default: 10)
`offset`	number	Pagination offset (default: 0)
`typo_tolerance`	number	Enable typo tolerance with max edit distance (1-2)
`vector_search`	boolean	Enable hybrid semantic search (default: false)
`hybrid_alpha`	number	Vector weight: 0.0 = pure BM25, 1.0 = pure vector
`language`	string	ISO 639-1 language code for facet labels

Search Response

{
  "found": 150,
  "took_ms": 5,
  "hits": [
    {
      "id": "prod_1",
      "score": 0.95,
      "doc": {
        "id": "prod_1",
        "title": "Wireless Headphones",
        "price": 299.99,
        "category": "electronics"
      }
    }
  ],
  "facets": {
    "category": [
      {"value": "electronics", "count": 45},
      {"value": "accessories", "count": 23}
    ]
  }
}

Suggestions API

Autocomplete and query suggestions based on indexed terms.

Query Suggestions

Get term suggestions based on a prefix:

GET /api/v1/collections/products/suggest?prefix=lap&limit=5

Response:
{
  "suggestions": [
    {"term": "laptop", "score": 1500},
    {"term": "laptops", "score": 850},
    {"term": "laptop-case", "score": 120}
  ],
  "took_ms": 2
}

Parameters

Parameter	Type	Description
`prefix`	string	The prefix to match (required)
`limit`	number	Maximum suggestions to return (default: 10)

Facet Suggestions

Get suggestions for facet values:

GET /api/v1/collections/products/suggest-facets?prefix=Elec&facet=category&limit=5

Response:
{
  "suggestions": [
    {"value": "Electronics", "count": 450},
    {"value": "Electronics/Computers", "count": 200},
    {"value": "Electronics/Phones", "count": 180}
  ]
}

Field Types

Type	Description
`string`	Text field (indexed by default)
`string[]`	Array of strings (filterable, facetable)
`int32`	32-bit integer
`int32[]`	Array of 32-bit integers (filterable)
`int64`	64-bit integer
`int64[]`	Array of 64-bit integers (filterable)
`float`	64-bit floating point
`float[]`	Array of 64-bit floats (filterable)
`bool`	Boolean
`hierarchy`	Hierarchical category with "/" separator
`attributes`	Dynamic key-value pairs for flexible faceting

Note: Numeric array filters use ANY semantics - a document matches if any element in the array satisfies the filter condition.
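
The ANY semantics can be pictured with a few lines of Python. This is only a model of the matching rule described in the note, not the engine's implementation; `array_matches` and the operator table are names chosen for the example.

```python
import operator

# Comparison operators keyed by the symbol used after the colon in a filter.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
}


def array_matches(values, op, operand):
    """True if at least one array element satisfies the comparison."""
    compare = OPS[op]
    return any(compare(v, operand) for v in values)
```

For example, a document with `sizes: [5, 50, 500]` matches `sizes:>=100` because one element (500) satisfies the condition, while an empty array never matches.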

Field Options

Option	Default	Description
`index`	true	Enable full-text search
`facet`	false	Enable facet counting
`sort`	false	Enable sorting
`optional`	false	Allow missing values
`merge`	false	Enable merge index for Collection Merge and Aggregation queries

Collection Options

Option	Default	Description
`enable_stemming`	false	Enable word stemming
`stem_language`	"english"	Stemming language (see Supported Languages)
`enable_stop_words`	false	Filter common words
`stop_words_language`	"english"	Stop words language (see Supported Languages)
`enable_vectors`	false	Enable semantic/vector search
`vector_fields`	all text	Fields to generate embeddings from
`num_shards`	null	Number of shards for parallel processing

Filtering

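field:=value       # Equals
field:!=value      # Not equals
field:>value       # Greater than
field:>=value      # Greater than or equal
field:<value       # Less than
field:^path        # Hierarchy prefix match (see Hierarchical Categories)

# Combined example
price:>=100 AND category:^Electronics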

Typo Tolerance

OverclockDB supports typo-tolerant search using the SymSpell algorithm:

POST /api/v1/collections/products/search
{
  "q": "laptp",
  "typo_tolerance": 2
}

With typo_tolerance: 2, the query "laptp" will match documents containing "laptop".

Value	Description
0 or null	Disabled (exact matching only)
1	Allow 1 character difference
2	Allow up to 2 character differences
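
To make the table concrete, the sketch below computes a plain Levenshtein edit distance and applies the tolerance threshold. It is not the SymSpell algorithm (which relies on precomputed deletions for speed, and whose exact distance metric in OverclockDB is not stated here); it only illustrates what "character difference" means. Function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def within_tolerance(query, term, typo_tolerance):
    """Apply the table above: 0 or None means exact matching only."""
    if not typo_tolerance:
        return query == term
    return edit_distance(query, term) <= typo_tolerance
```

"laptp" is one insertion away from "laptop", so it matches with a tolerance of 1 or 2 but not with tolerance disabled.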

Stemming

Stemming reduces words to their root form for better recall:

"running", "runs" → "run"
"computers", "computing" → "comput"
"лаптопи" → "лаптоп" (Bulgarian)

POST /api/v1/collections
{
  "name": "articles",
  "fields": [{"name": "content", "type": "string"}],
  "enable_stemming": true,
  "stem_language": "english"
}

Supported Languages (19)

Language	API Value	Stemmer
Arabic	`"arabic"`	Snowball
Bulgarian	`"bulgarian"`	BulStem (128K rules)
Danish	`"danish"`	Snowball
Dutch	`"dutch"`	Snowball
English	`"english"`	Snowball (default)
Finnish	`"finnish"`	Snowball
French	`"french"`	Snowball
German	`"german"`	Snowball
Greek	`"greek"`	Snowball
Hungarian	`"hungarian"`	Snowball
Italian	`"italian"`	Snowball
Norwegian	`"norwegian"`	Snowball
Portuguese	`"portuguese"`	Snowball
Romanian	`"romanian"`	Snowball
Russian	`"russian"`	Snowball
Spanish	`"spanish"`	Snowball
Swedish	`"swedish"`	Snowball
Tamil	`"tamil"`	Snowball
Turkish	`"turkish"`	Snowball

Note: Bulgarian uses the custom BulStem algorithm with ~128,000 stemming rules, lazy-loaded for optimal performance.

Stop Words

Stop words are common words like "the", "a", "is" that are filtered out:

POST /api/v1/collections
{
  "name": "articles",
  "fields": [{"name": "content", "type": "string"}],
  "enable_stop_words": true,
  "stop_words_language": "english"
}

Supported Languages (19)

Stop words are available for all 19 supported languages. Use "none" to disable filtering.

Language	API Value	Word Count
Arabic	`"arabic"`	~120 words
Bulgarian	`"bulgarian"`	~260 words
Danish	`"danish"`	~100 words
Dutch	`"dutch"`	~100 words
English	`"english"`	~120 words
Finnish	`"finnish"`	~230 words
French	`"french"`	~160 words
German	`"german"`	~230 words
Greek	`"greek"`	~75 words
Hungarian	`"hungarian"`	~200 words
Italian	`"italian"`	~280 words
Norwegian	`"norwegian"`	~175 words
Portuguese	`"portuguese"`	~200 words
Romanian	`"romanian"`	~230 words
Russian	`"russian"`	~70 words
Spanish	`"spanish"`	~350 words
Swedish	`"swedish"`	~115 words
Tamil	`"tamil"`	~100 words
Turkish	`"turkish"`	~115 words
None	`"none"`	Disabled

Note: Stop word lists sourced from Alir3z4/stop-words (CC-BY-4.0).

Hierarchical Categories

The hierarchy field type enables tree-structured categories with automatic ancestor indexing.

Schema Definition

POST /api/v1/collections
{
  "name": "products",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "category", "type": "hierarchy", "facet": true},
    {"name": "price", "type": "float", "sort": true}
  ]
}

Document Examples

// Single category
{"id": "laptop-1", "title": "MacBook Pro", "category": "Electronics/Computers/Laptops"}

// Multiple categories
{"id": "organizer-1", "title": "Desk Organizer", "category": ["Office/Supplies", "Home/Storage"]}

Searching Hierarchies

# All electronics (matches Laptops, Desktops, Phones, etc.)
{"q": "*", "filter": "category:^Electronics", "facets": ["category"]}

# Only computers
{"q": "*", "filter": "category:^Electronics/Computers"}

# Multiple hierarchies (OR)
{"q": "*", "filter": "category:^[Electronics,Clothing]"}
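
One way to picture automatic ancestor indexing: each path is expanded into all of its ancestor paths at index time, so a `:^` filter becomes a simple membership test. The sketch below models that idea; it is an assumption about the mechanism, and `ancestor_paths` and `matches_prefix_filter` are hypothetical names.

```python
def ancestor_paths(path, sep="/"):
    """Expand "A/B/C" into ["A", "A/B", "A/B/C"]."""
    parts = path.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


def matches_prefix_filter(category, prefixes):
    """Model of category:^[p1,p2]: true if any prefix is an indexed ancestor.

    `category` may be a single path or a list of paths, as in the
    document examples above.
    """
    values = category if isinstance(category, list) else [category]
    indexed = {p for value in values for p in ancestor_paths(value)}
    return any(prefix in indexed for prefix in prefixes)
```

Under this model, `category:^Electronics` matches a document stored under `Electronics/Computers/Laptops` because `Electronics` is one of its indexed ancestors.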

Drill-Down Navigation

# Get children of Electronics
{"q": "*", "facets": ["category"], "hierarchy_parent": "Electronics"}

Response includes hierarchy_facets:

{
  "hierarchy_facets": {
    "category": [
      {"path": "Electronics/Computers", "name": "Computers", "count": 60, "depth": 1, "has_children": true},
      {"path": "Electronics/Phones", "name": "Phones", "count": 40, "depth": 1, "has_children": false}
    ]
  }
}

Internationalization (i18n)

OverclockDB supports translated facet labels for multilingual product catalogs.

Set Translations

PUT /api/v1/collections/products/translations
{
  "field": "category",
  "translations": [
    {
      "value": "Electronics",
      "labels": {"en": "Electronics", "es": "Electrónicos", "de": "Elektronik"}
    },
    {
      "value": "Electronics/Computers",
      "labels": {"en": "Computers", "es": "Computadoras", "de": "Computer"}
    }
  ]
}

Search with Language

POST /api/v1/collections/products/search
{
  "q": "laptop",
  "facets": ["category", "brand"],
  "language": "es"
}

Response includes translated labels:

{
  "facets": {
    "brand": [
      {"value": "apple", "label": "Apple Inc.", "count": 50}
    ]
  },
  "hierarchy_facets": {
    "category": [
      {"path": "Electronics", "label": "Electrónicos", "count": 100}
    ]
  }
}

Language Fallback

When a translation is not found, the label falls back in this order:

Requested language (e.g., "es")
English ("en")
Raw field value
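
The fallback chain is short enough to express directly. The function below is a sketch of that lookup order; `resolve_label` is a hypothetical name, and `labels` is the per-value mapping shown under Set Translations.

```python
def resolve_label(raw_value, labels, language):
    """Return the label for `language`, then English, then the raw value."""
    for lang in (language, "en"):
        if lang and lang in labels:
            return labels[lang]
    return raw_value
```

For instance, a request with `"language": "fr"` against a value that only has `en`, `es`, and `de` labels would get the English label, and a value with no translations at all is returned unchanged.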

Semantic Search

Hybrid semantic search combines BM25 text matching with vector similarity.

Setup

POST /api/v1/collections
{
  "name": "articles",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "content", "type": "string"}
  ],
  "enable_vectors": true,
  "vector_fields": ["title", "content"]
}

Hybrid Search

POST /api/v1/collections/articles/search
{
  "q": "machine learning applications",
  "vector_search": true,
  "hybrid_alpha": 0.5
}

hybrid_alpha	Description
0.0	Pure BM25 (keyword matching only)
0.5	Balanced hybrid (default)
1.0	Pure vector (semantic similarity only)

Response with Scores

{
  "hits": [
    {
      "id": "doc_1",
      "score": 0.85,
      "text_score": 0.72,
      "vector_score": 0.98,
      "doc": { ... }
    }
  ]
}
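
The sample values are consistent with a linear blend of the two scores: 0.5 × 0.72 + 0.5 × 0.98 = 0.85. The sketch below assumes that formula; the source does not state the exact combination or any score normalization, so treat it as an interpretation of the `hybrid_alpha` table rather than the engine's definition.

```python
def hybrid_score(text_score, vector_score, hybrid_alpha=0.5):
    """Linear blend: alpha = 0.0 is pure BM25, alpha = 1.0 is pure vector.

    Assumption: both scores are already normalized to a comparable range.
    """
    if not 0.0 <= hybrid_alpha <= 1.0:
        raise ValueError("hybrid_alpha must be between 0.0 and 1.0")
    return (1.0 - hybrid_alpha) * text_score + hybrid_alpha * vector_score
```

With the values from the response above, `hybrid_score(0.72, 0.98, 0.5)` evaluates to approximately 0.85.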

Sharding

Hash-based document sharding for parallel search, with a 2.7x to 3x speedup at 4 shards in the benchmarks below.

Create a Sharded Collection

POST /api/v1/collections
{
  "name": "products",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "brand", "type": "string", "facet": true},
    {"name": "price", "type": "float"}
  ],
  "num_shards": 4
}

How It Works

Document Routing: Documents distributed via hash of document ID
Parallel Search: Queries run on all shards concurrently
Result Merging: K-way heap merge with global BM25 statistics
Facet Aggregation: Counts summed across all shards
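
The four steps above can be sketched in a few lines. The hash function (CRC32), the data shapes, and all function names are assumptions for illustration; the sketch also assumes that scores from different shards are directly comparable, which is what the global BM25 statistics are meant to guarantee.

```python
import heapq
import itertools
import zlib
from collections import Counter


def shard_for(doc_id, num_shards):
    """Document routing: stable hash of the document ID modulo shard count.

    CRC32 is a stand-in; the real hash function is not documented here.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def merge_hits(per_shard_hits, limit):
    """Result merging: k-way heap merge of per-shard lists.

    Each input list must already be sorted by descending score.
    """
    merged = heapq.merge(*per_shard_hits, key=lambda hit: -hit["score"])
    return list(itertools.islice(merged, limit))


def merge_facets(per_shard_counts):
    """Facet aggregation: sum value counts across shards."""
    total = Counter()
    for counts in per_shard_counts:
        total.update(counts)
    return dict(total)
```

Because `heapq.merge` is lazy, only the first `limit` hits are pulled from the shard lists, which is the main appeal of a heap merge over concatenating and re-sorting.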

Performance

Collection Size	Regular	4 Shards	Speedup
50K documents	949 µs	352 µs	2.7x faster
100K documents	1.17 ms	386 µs	3x faster

Recommendations

Collection Size	Recommended Shards
< 50K docs	1 (no sharding)
50K - 500K	4 shards
500K - 2M	8 shards
> 2M docs	8-16 shards

Aggregation Framework

A powerful multi-source data merging system with computed fields, pattern expansion, and priority-based selection. Ideal for B2B pricing, inventory aggregation, and rating computation.

Overview

The Aggregation Framework enables:

Multi-source loading - Load data from multiple collections with priority-based fallback
Pattern expansion - Dynamic collection names using context variables (e.g., prices_customer_{customer_id})
Priority merging - Select best value using configurable strategies
Computed fields - Calculate derived fields using expressions
Search integration - Filter and sort by aggregated/computed fields

Aggregation Config API

# Create an aggregation config
POST /api/v1/aggregations
{
  "name": "b2b_pricing",
  "merge_key": "product_id",
  "sources": [...],
  "priority_strategy": {...},
  "computed_fields": [...]
}

# List all configs
GET /api/v1/aggregations

# Get a specific config
GET /api/v1/aggregations/:name

# Update a config
PUT /api/v1/aggregations/:name

# Delete a config
DELETE /api/v1/aggregations/:name

Complete B2B Pricing Example

This example implements customer-specific pricing with group fallbacks and computed discounts.

Step 1: Create Collections

# Products collection (base data)
POST /api/v1/collections
{
  "name": "products",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "category", "type": "hierarchy", "facet": true},
    {"name": "brand", "type": "string", "facet": true},
    {"name": "msrp", "type": "float"}
  ]
}

# Customer-specific prices (merge-enabled)
POST /api/v1/collections
{
  "name": "prices_customer_vip123",
  "fields": [
    {"name": "price", "type": "float", "merge": true, "sort": true},
    {"name": "discount_percent", "type": "float", "merge": true},
    {"name": "allow_discount", "type": "bool", "merge": true}
  ]
}

# Group prices (wholesale, retail, etc.)
POST /api/v1/collections
{
  "name": "prices_group_wholesale",
  "fields": [
    {"name": "price", "type": "float", "merge": true, "sort": true},
    {"name": "discount_percent", "type": "float", "merge": true},
    {"name": "allow_discount", "type": "bool", "merge": true}
  ]
}

# Default prices (fallback)
POST /api/v1/collections
{
  "name": "prices_default",
  "fields": [
    {"name": "price", "type": "float", "merge": true, "sort": true},
    {"name": "discount_percent", "type": "float", "merge": true},
    {"name": "allow_discount", "type": "bool", "merge": true}
  ]
}

Step 2: Add Sample Data

# Add products
POST /api/v1/collections/products/docs/batch
{
  "documents": [
    {"id": "laptop_1", "title": "Gaming Laptop Pro", "category": "Electronics/Computers", "brand": "TechBrand", "msrp": 1500.00},
    {"id": "laptop_2", "title": "Business Laptop", "category": "Electronics/Computers", "brand": "WorkPro", "msrp": 1200.00}
  ]
}

# Customer-specific prices (VIP gets 20% discount)
POST /api/v1/collections/prices_customer_vip123/docs/batch
{
  "documents": [
    {"id": "laptop_1", "price": 1200.00, "discount_percent": 20, "allow_discount": true}
  ]
}

# Wholesale group prices
POST /api/v1/collections/prices_group_wholesale/docs/batch
{
  "documents": [
    {"id": "laptop_1", "price": 1350.00, "discount_percent": 10, "allow_discount": true},
    {"id": "laptop_2", "price": 1080.00, "discount_percent": 10, "allow_discount": true}
  ]
}

# Default prices
POST /api/v1/collections/prices_default/docs/batch
{
  "documents": [
    {"id": "laptop_1", "price": 1500.00, "discount_percent": 0, "allow_discount": false},
    {"id": "laptop_2", "price": 1200.00, "discount_percent": 0, "allow_discount": false}
  ]
}

Step 3: Create Aggregation Config

POST /api/v1/aggregations
{
  "name": "b2b_pricing",
  "merge_key": "product_id",
  "sources": [
    {
      "pattern": "prices_customer_{customer_id}",
      "priority": 1,
      "exact": true
    },
    {
      "collection": "prices_group_wholesale",
      "priority": 2
    },
    {
      "collection": "prices_default",
      "priority": 3
    }
  ],
  "priority_strategy": {
    "type": "by_priority",
    "prefer_exact": true
  },
  "computed_fields": [
    {
      "name": "final_price",
      "expression": "if(allow_discount, price * (1 - discount_percent / 100), price)"
    },
    {
      "name": "savings",
      "expression": "price - final_price"
    }
  ]
}

Step 4: Search with Pricing Context

POST /api/v1/collections/products/search
{
  "q": "laptop",
  "query_by": ["title"],
  "aggregation": {
    "config_name": "b2b_pricing",
    "context": {
      "customer_id": "vip123"
    }
  },
  "filter": "final_price:<1400",
  "sort_by": "final_price:asc",
  "facets": ["category", "brand"],
  "limit": 10
}

Response

{
  "found": 2,
  "took_ms": 8,
  "hits": [
    {
      "id": "laptop_1",
      "title": "Gaming Laptop Pro",
      "category": "Electronics/Computers",
      "brand": "TechBrand",
      "score": 0.95,
      "price": 1200.00,
      "discount_percent": 20,
      "allow_discount": true,
      "final_price": 960.00,
      "savings": 240.00
    },
    {
      "id": "laptop_2",
      "title": "Business Laptop",
      "category": "Electronics/Computers",
      "brand": "WorkPro",
      "score": 0.85,
      "price": 1080.00,
      "discount_percent": 10,
      "allow_discount": true,
      "final_price": 972.00,
      "savings": 108.00
    }
  ],
  "facets": {
    "category": [{"value": "Electronics/Computers", "count": 2}],
    "brand": [{"value": "TechBrand", "count": 1}, {"value": "WorkPro", "count": 1}]
  }
}
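
The numbers in this response can be reproduced by hand. The sketch below resolves each product against the sample data from Step 2 in priority order and then applies the two computed-field expressions. It assumes the wholesale group collection sits between the customer-specific and default sources, which is what the laptop_2 values in the response imply; `resolve_pricing` and the `SOURCES` layout are illustrative only.

```python
# (priority, records keyed by product id), using the Step 2 sample data.
SOURCES = [
    (1, {"laptop_1": {"price": 1200.00, "discount_percent": 20, "allow_discount": True}}),
    (2, {"laptop_1": {"price": 1350.00, "discount_percent": 10, "allow_discount": True},
         "laptop_2": {"price": 1080.00, "discount_percent": 10, "allow_discount": True}}),
    (3, {"laptop_1": {"price": 1500.00, "discount_percent": 0, "allow_discount": False},
         "laptop_2": {"price": 1200.00, "discount_percent": 0, "allow_discount": False}}),
]


def resolve_pricing(product_id, sources=SOURCES):
    """Pick the record from the lowest-numbered source, then add computed fields."""
    for _, records in sorted(sources, key=lambda source: source[0]):
        if product_id in records:
            rec = dict(records[product_id])
            break
    else:
        return None  # no source has a price for this product

    # final_price = if(allow_discount, price * (1 - discount_percent / 100), price)
    if rec["allow_discount"]:
        rec["final_price"] = rec["price"] * (1 - rec["discount_percent"] / 100)
    else:
        rec["final_price"] = rec["price"]
    # savings = price - final_price
    rec["savings"] = rec["price"] - rec["final_price"]
    return rec
```

For laptop_1 the customer-specific record wins (1200 with 20% off gives 960, savings 240). For laptop_2 there is no customer-specific record, so the wholesale record is used (1080 with 10% off gives 972, savings 108).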

Config Reference

{
  "name": "config_name",
  "merge_key": "product_id",
  "sources": [
    {
      "collection": "static_collection_name",
      "pattern": "dynamic_collection_{variable}",
      "priority": 1,
      "exact": false,
      "shard_by": "shard_field",
      "fields": { "source_field": "target_field" }
    }
  ],
  "priority_strategy": {
    "type": "by_priority",
    "prefer_exact": true
  },
  "computed_fields": [
    {
      "name": "computed_field_name",
      "expression": "price * (1 - discount / 100)"
    }
  ]
}

Source Options

Field	Type	Description
`collection`	string	Static collection name (mutually exclusive with pattern)
`pattern`	string	Dynamic collection pattern with `{variable}` placeholders
`priority`	number	Priority level (lower = higher priority, default: 0)
`exact`	boolean	Mark as "exact" match for prefer_exact strategies
`shard_by`	string	Field to shard by for shard-keyed collections
`fields`	object	Field name mappings: {"source": "target"}

Priority Strategies

Strategy	JSON	Description
By Priority	`{"type": "by_priority", "prefer_exact": true}`	Lower priority number wins. If prefer_exact is true, sources marked exact are preferred.
Min Value	`{"type": "min_value", "field": "price"}`	Select record with minimum value of specified field
Max Value	`{"type": "max_value", "field": "rating"}`	Select record with maximum value of specified field
First Match	`{"type": "first_match"}`	Take first match by priority order
All	`{"type": "all"}`	Return all matches (no merging)
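
The strategies in the table can be modeled as a selection function over candidate records. This is one possible reading, not the engine's code: in particular, the sketch assumes that with `prefer_exact` an exact source outranks a non-exact one regardless of priority number, which the table does not spell out. The candidate shape (`priority`, `exact`, `record`) and the name `select_record` are invented for the example.

```python
def select_record(candidates, strategy):
    """Choose among candidates: dicts with 'priority', 'exact', and 'record'."""
    if not candidates:
        return None
    kind = strategy["type"]
    if kind == "all":
        return [c["record"] for c in candidates]  # no merging
    if kind == "min_value":
        return min(candidates, key=lambda c: c["record"][strategy["field"]])["record"]
    if kind == "max_value":
        return max(candidates, key=lambda c: c["record"][strategy["field"]])["record"]
    if kind in ("by_priority", "first_match"):
        prefer_exact = kind == "by_priority" and strategy.get("prefer_exact", False)

        def rank(c):
            # Assumption: exact sources sort ahead of non-exact ones when preferred.
            exact_rank = 0 if (prefer_exact and c.get("exact")) else 1
            return (exact_rank, c["priority"])

        return min(candidates, key=rank)["record"]
    raise ValueError(f"unknown strategy type: {kind}")
```

With two candidates, one exact at priority 2 and one non-exact at priority 1, `by_priority` with `prefer_exact` picks the exact one under this reading, while `first_match` picks the priority 1 record.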

Expression Language

Computed fields use a simple expression language for calculations.

Arithmetic Operators

price + tax                    # Addition
price - discount               # Subtraction
price * quantity               # Multiplication
total / count                  # Division

Comparison Operators

price > 100                    # Greater than
price >= 100                   # Greater than or equal
price < 100                    # Less than
price <= 100                   # Less than or equal
status == "active"             # Equal
status != "inactive"           # Not equal

Logical Operators

in_stock && price < 100        # AND
is_sale || is_clearance        # OR
!is_discontinued               # NOT

Conditional Expression

if(condition, then_value, else_value)

# Examples:
if(allow_discount, price * 0.9, price)
if(quantity > 10, price * 0.95, price)
if(is_member && total > 100, total * 0.85, total)

Built-in Functions

min(a, b)                      # Minimum of two values
max(a, b)                      # Maximum of two values
min(price, sale_price, promo)  # Minimum of multiple values
max(rating1, rating2, rating3) # Maximum of multiple values

Pattern Expansion

Dynamic collection names are resolved at query time using context variables.

Simple Variable Expansion

Pattern: "prices_customer_{customer_id}"
Context: {"customer_id": "vip123"}
Result:  "prices_customer_vip123"
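
A minimal sketch of the substitution, assuming placeholders are `{name}` tokens made of word characters. What the server does when a variable is missing from the context is not documented here, so raising an error is a choice made for this example; `expand_pattern` is a hypothetical name.

```python
import re

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand_pattern(pattern, context):
    """Replace each {variable} in the pattern with its value from the context."""
    def substitute(match):
        name = match.group(1)
        if name not in context:
            # Assumption: a missing variable is treated as an error.
            raise KeyError(f"missing context variable: {name}")
        return str(context[name])

    return _PLACEHOLDER.sub(substitute, pattern)
```

`expand_pattern("prices_customer_{customer_id}", {"customer_id": "vip123"})` returns `"prices_customer_vip123"`, matching the example above.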

Querying Multiple Shard Values

For shard-keyed collections, use the shard_values parameter directly:

POST /api/v1/collections/prices/search
{
  "q": "*",
  "shard_values": ["wholesale", "retail", "vip"]
}

Use Case Examples

Inventory Aggregation

{
  "name": "inventory_aggregation",
  "merge_key": "sku",
  "sources": [
    { "pattern": "inventory_warehouse_{warehouse_id}", "priority": 1 },
    { "collection": "inventory_central", "priority": 2 }
  ],
  "priority_strategy": { "type": "max_value", "field": "quantity" },
  "computed_fields": [
    { "name": "in_stock", "expression": "quantity > 0" },
    { "name": "low_stock", "expression": "quantity > 0 && quantity < 10" }
  ]
}

Rating Aggregation

{
  "name": "product_ratings",
  "merge_key": "product_id",
  "sources": [
    { "collection": "internal_reviews", "priority": 1, "fields": {"rating": "internal_rating", "count": "internal_count"} },
    { "collection": "external_reviews", "priority": 2, "fields": {"rating": "external_rating", "count": "external_count"} }
  ],
  "priority_strategy": { "type": "all" },
  "computed_fields": [
    { "name": "avg_rating", "expression": "(internal_rating + external_rating) / 2" },
    { "name": "total_reviews", "expression": "internal_count + external_count" }
  ]
}

Performance

Metric	Value
Products tested	40K+
Source collections per query	10+
Query time (p99)	<50ms for top 100 results
Pattern expansion	~500ns per variable
Expression evaluation	~200ns per expression

Collection Merge

Join separate collections at query time (similar to SQL JOIN). Useful for context-specific pricing, customer overrides, or any scenario requiring data from multiple collections merged into search results.

Note: For complex multi-source scenarios with computed fields, use the Aggregation Framework instead.

Create Merge-Enabled Collection

Set merge: true on fields to enable merge indexing:

POST /api/v1/collections
{
  "name": "prices_store_123",
  "fields": [
    {"name": "price", "type": "float", "merge": true, "sort": true},
    {"name": "discount", "type": "float", "merge": true}
  ]
}

# Add prices, one request per document (id = product ID from base collection)
POST /api/v1/collections/prices_store_123/docs
{"id": "prod_456", "price": 99.99, "discount": 10}

POST /api/v1/collections/prices_store_123/docs
{"id": "prod_789", "price": 149.99, "discount": 0}

Search with Merge

POST /api/v1/collections/products/search
{
  "q": "laptop",
  "merge": {
    "collections": ["prices_customer_vip", "prices_store_123"],
    "priority_collection": "prices_customer_vip",
    "comparison_field": "price",
    "strategy": "min",
    "return_fields": ["price", "discount"]
  },
  "sort_by": "price:asc",
  "limit": 10
}

Merge Configuration

Parameter	Type	Description
`collections`	string[]	Collections to merge into results (required)
`priority_collection`	string	If this collection has a value, use it directly
`comparison_field`	string	Field for min/max strategy (default: first merge field)
`strategy`	string	How to combine values: "min" or "max" (default: "min")
`return_fields`	string[]	Which fields to include in response

Resolution Logic

Query all merge collections in parallel for matching document IDs
If priority_collection is set AND has a value → use it directly
Otherwise → apply strategy (min/max) across all collections
Documents without ANY matching merge entry → excluded from results
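
The resolution steps can be modeled for a single document as follows. Merge collections are represented as plain dictionaries, `comparison_field` is required for simplicity (the documented default of "first merge field" is not modeled), and `resolve_merge` is a hypothetical name.

```python
def resolve_merge(doc_id, collections, merge):
    """Return the merged fields for one document, or None if it is excluded.

    collections: {collection_name: {doc_id: {field: value}}}
    merge: the "merge" object from the search request.
    """
    entries = {
        name: collections[name][doc_id]
        for name in merge["collections"]
        if doc_id in collections.get(name, {})
    }
    if not entries:
        return None  # no matching merge entry: excluded from results

    priority = merge.get("priority_collection")
    if priority in entries:
        chosen = entries[priority]  # priority collection wins outright
    else:
        field = merge["comparison_field"]
        pick = max if merge.get("strategy", "min") == "max" else min
        chosen = pick(entries.values(), key=lambda entry: entry[field])

    fields = merge.get("return_fields") or list(chosen)
    return {f: chosen[f] for f in fields if f in chosen}
```

If a customer-specific collection holds an entry for a product, that entry is returned even when another collection has a lower price; the min/max strategy applies only to products the priority collection does not cover.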

Response Format

Collection merge uses a flat response format with all fields at the root level:

{
  "found": 150,
  "took_ms": 3,
  "hits": [
    {
      "id": "prod_456",
      "title": "Gaming Laptop",
      "category": "Electronics",
      "score": 0.95,
      "price": 89.99,
      "discount": 10
    }
  ]
}

Note: Reserved fields (id, score, text_score, vector_score) cannot be overwritten by merge fields.

Performance

Indexed sorting: O(offset + limit) for single-collection sorted merge queries
Parallel lookups: Multiple merge collections queried concurrently
Benchmark: 170x-11,700x faster than manual sort for sorted iteration

Attribute Facets

Dynamic key-value faceting for flexible product attributes. Unlike regular facets, attribute facets support arbitrary keys that vary per document.

Schema Definition

POST /api/v1/collections
{
  "name": "products",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "specs", "type": "attributes", "facet": true}
  ]
}

Document Examples

// Laptop with CPU, RAM, Storage specs
{
  "id": "laptop-1",
  "title": "MacBook Pro 16",
  "specs": {
    "cpu": "M3 Pro",
    "ram": "18GB",
    "storage": "512GB SSD"
  }
}

// Phone with different specs
{
  "id": "phone-1",
  "title": "iPhone 15 Pro",
  "specs": {
    "cpu": "A17 Pro",
    "storage": "256GB",
    "display": "6.1 inch"
  }
}

Search with Attribute Facets

POST /api/v1/collections/products/search
{
  "q": "*",
  "facets": ["specs"],
  "max_attribute_types": 10,
  "max_attribute_values": 5
}

Response Format

{
  "found": 100,
  "hits": [...],
  "attribute_facets": {
    "specs": {
      "types": ["cpu", "ram", "storage", "display"],
      "values": {
        "cpu": [
          {"value": "M3 Pro", "count": 15},
          {"value": "A17 Pro", "count": 12},
          {"value": "i7-13700", "count": 8}
        ],
        "ram": [
          {"value": "18GB", "count": 20},
          {"value": "16GB", "count": 18}
        ],
        "storage": [
          {"value": "512GB SSD", "count": 25},
          {"value": "256GB", "count": 22}
        ]
      }
    }
  }
}

Filtering on Attributes

# Filter by specific attribute value
"filter": "specs.cpu:=M3 Pro"

# Combine multiple attribute filters
"filter": "specs.cpu:=M3 Pro AND specs.ram:=18GB"

Shard-Keyed Collections

Route documents to shards based on a field value (e.g., customer_id) instead of document ID. This enables single-shard queries when the shard key is known.

Create Shard-Keyed Collection

POST /api/v1/collections
{
  "name": "prices",
  "fields": [
    {"name": "customer_id", "type": "string"},
    {"name": "product_id", "type": "string"},
    {"name": "price", "type": "float", "merge": true}
  ],
  "shard_config": {
    "shard_key": "customer_id",
    "num_shards": 8
  }
}

How It Works

Document routing: Documents are distributed to shards based on hash of the shard_key field value
Single-shard queries: When shard key is provided, query hits only one shard (O(1) routing)
Scatter-gather: Without shard key, query searches all shards in parallel

Query with Shard Key

# Fast path - single shard lookup
POST /api/v1/collections/prices/search
{
  "q": "*",
  "filter": "customer_id:=vip123",
  "limit": 100
}
# Routes directly to shard containing customer "vip123"

Use Cases

Per-customer pricing: Shard by customer_id for fast customer-specific price lookups
Multi-tenant data: Shard by tenant_id for data isolation
Geographic data: Shard by region for localized queries

vs. Regular Sharding

Feature	Regular Sharding	Shard-Keyed
Routing basis	Document ID hash	Field value hash
Single-shard queries	By document ID only	By shard key value
Best for	Even distribution	Key-based access patterns

Design Considerations

When managing data for many contexts (e.g., thousands of customers), you have two architectural choices:

Option 1: Separate Collections per Context

prices_customer_001
prices_customer_002
prices_customer_003
... (thousands of collections)

Pros: Complete physical isolation, independent retention policies, different schemas per customer
Cons: Management overhead (thousands of collections), memory overhead per collection (indexes, metadata)

Option 2: Shard-Keyed Single Collection

prices (shard_key: customer_id, num_shards: 8)
└── Contains all customer data, partitioned by hash

Pros: Single collection to manage, efficient memory usage, cross-customer analytics possible
Cons: Data mixed at shard level (but filtered at query time)

Why Data Mixing is Acceptable

When you query with filter: "customer_id:=vip123":

System calculates hash("vip123") % 8 → routes to single shard
The customer_id index on that shard narrows the candidates to that customer's documents (for example, ~100 docs rather than the 100,000 stored on the shard)
Other customers' data on the same shard is never returned - filtered out by the index

Result: Query-time isolation with the efficiency of a single collection.
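
The routing and filtering described above can be simulated in memory. CRC32 stands in for the undocumented hash function, and the list-of-lists shard layout and all function names are assumptions made for the sketch.

```python
import zlib


def shard_for_key(shard_key_value, num_shards=8):
    """Route by hash of the shard key value (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(shard_key_value).encode("utf-8")) % num_shards


def shards_to_query(customer_id, num_shards=8):
    """Single shard when the key is known, otherwise scatter-gather over all."""
    if customer_id is None:
        return list(range(num_shards))
    return [shard_for_key(customer_id, num_shards)]


def build_shards(docs, num_shards=8):
    """Place each document on the shard chosen by its customer_id."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for_key(doc["customer_id"], num_shards)].append(doc)
    return shards


def query_customer(shards, customer_id):
    """Visit one shard, then filter out other customers that share it."""
    shard = shards[shard_for_key(customer_id, len(shards))]
    return [doc for doc in shard if doc["customer_id"] == customer_id]
```

Every document for a given customer lands on the same shard, so a query that carries the shard key touches one shard and returns only that customer's rows, even when other customers hash to the same shard.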

When to Use Each Approach

Use Case	Recommendation
Same schema for all contexts	Shard-keyed single collection
Compliance requires physical isolation	Separate collections
Different retention policies per context	Separate collections
Need cross-context analytics	Shard-keyed single collection
Thousands of contexts	Shard-keyed single collection
Contexts have different schemas	Separate collections