Token Budgeting

When you have a fixed context window (e.g., 128K tokens), every token matters. Laconic helps you fit more documents into the same budget without lossy truncation.

The Math

Say you’re building a prompt with retrieved context:

  • System prompt: 2,000 tokens
  • User query: 500 tokens
  • Available for context: 125,500 tokens (128,000 − 2,000 − 500)
  • Retrieved docs: 150,000 tokens, which is 24,500 over budget

Option A: Truncate. Lose information.

Option B: Compress with Laconic. If your docs are structure-heavy (tables, HTML, badges), compression typically removes 15–50% of their tokens. This set needs to shrink by about 16.3% (24,500 of 150,000 tokens) to fit.
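The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names `required_savings` and `compressed_size` are illustrative and not part of Laconic, and the 128,000-token window is the example figure from the introduction.

```python
def required_savings(doc_tokens: int, window: int, system: int, query: int) -> float:
    """Fraction of doc tokens that must be removed for the docs to fit."""
    available = window - system - query
    overflow = max(0, doc_tokens - available)
    return overflow / doc_tokens


def compressed_size(doc_tokens: int, savings_pct: int) -> int:
    """Token count left after removing savings_pct percent (integer math)."""
    return doc_tokens * (100 - savings_pct) // 100


needed = required_savings(150_000, 128_000, 2_000, 500)
print(f"Savings needed: {needed:.1%}")

# Both ends of the quoted 15-50% range, against the 125,500 tokens available
for pct in (15, 50):
    size = compressed_size(150_000, pct)
    print(f"{pct}% savings -> {size:,} tokens, fits: {size <= 125_500}")
```

At the low end of the range (15%), the compressed docs come to 127,500 tokens and still overflow, so measure the actual savings on your corpus rather than assuming the midpoint.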

Budget-Aware Pipeline

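#!/bin/bash
# Emit compressed docs, separated by "---", until the token budget is used up.
BUDGET=125000  # just under the 125,500 available, leaving headroom for separators
USED=0

for doc in retrieved_docs/*.md; do
  # Compress once; the JSON output carries both the text and its token count
  if ! stats=$(laconic compress -j "$doc" 2>/dev/null); then
    echo "Compression failed for $doc. Skipping." >&2
    continue
  fi
  tokens=$(echo "$stats" | jq -r '.compressed_tokens')

  # Guard against a missing or non-numeric count before doing arithmetic
  if ! [[ "$tokens" =~ ^[0-9]+$ ]]; then
    echo "No token count for $doc. Skipping." >&2
    continue
  fi

  NEXT=$((USED + tokens))
  if [ "$NEXT" -gt "$BUDGET" ]; then
    echo "Budget full at $USED tokens. Skipping remaining docs." >&2
    break
  fi

  # Output compressed text
  echo "$stats" | jq -r '.text'
  echo "---"
  USED=$NEXT
done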

Python Example

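import json
import subprocess

def compress_and_budget(docs: list[str], budget: int) -> str:
    """Compress docs in order and join as many as fit within the token budget."""
    context_parts = []
    used = 0

    for doc in docs:
        # Pass the document on stdin ("-") and request JSON output ("-j")
        result = subprocess.run(
            ["laconic", "compress", "-j", "-"],
            input=doc, capture_output=True, text=True,
            check=True,  # raise CalledProcessError instead of parsing empty output
        )
        data = json.loads(result.stdout)
        tokens = data["compressed_tokens"]

        # Stop at the first document that would exceed the budget
        if used + tokens > budget:
            break

        context_parts.append(data["text"])
        used += tokens

    return "\n---\n".join(context_parts)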
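
Both pipelines stop at the first document that overflows the budget, even when a smaller document later in the list would still fit. A possible variant, sketched below, skips oversized documents instead of stopping. It assumes the token counts have already been obtained (for example from the `compressed_tokens` field). The function name `pack_within_budget`, the `(text, tokens)` tuples, and the `separator_tokens` parameter are hypothetical, not part of Laconic.

```python
def pack_within_budget(
    items: list[tuple[str, int]],
    budget: int,
    separator_tokens: int = 0,
) -> tuple[list[str], int]:
    """Greedily keep documents that fit; skip (rather than stop at) ones that do not."""
    kept: list[str] = []
    used = 0
    for text, tokens in items:
        # Every document after the first also pays for one separator
        cost = tokens + (separator_tokens if kept else 0)
        if used + cost > budget:
            continue  # too big for the remaining space; a later doc may still fit
        kept.append(text)
        used += cost
    return kept, used


docs = [("doc a", 60), ("doc b", 50), ("doc c", 30)]
kept, used = pack_within_budget(docs, budget=100)
print(kept, used)
```

Skipping preserves retrieval order among the kept documents but can drop a highly ranked document in favor of a lower-ranked one, so stopping at the first overflow may still be the better choice when rank matters more than coverage.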

Fast Mode for Large Batches

If you’re processing hundreds of docs and just need the compressed text (not token counts), use fast mode to skip the BPE tokenizer entirely:

# Compress 500 docs in under a second
mkdir -p compressed  # the redirect below fails if the output directory is missing
for doc in corpus/*.md; do
  laconic compress -f "$doc" > "compressed/$(basename "$doc")"
done

You can then count tokens separately for just the documents you actually select, or use your LLM provider’s tokenizer.

Stacking with Other Optimizations

Laconic compresses the structure. You can stack it with other techniques:

| Technique | What it removes | Typical savings |
|---|---|---|
| Laconic | Decorative markdown structure | 15–50% on structured docs |
| Prompt caching | Repeated prefix tokens | Up to 90% cost reduction |
| Batch API | Nothing, just cheaper pricing | 50% cost reduction |

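These savings multiply rather than add, because each one applies to the cost left over by the others. In the best case, where nearly all input tokens hit the cache, Laconic + prompt caching + batch API can reduce effective cost by 95%+ on structure-heavy workloads.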
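
As a rough check on that figure, the sketch below multiplies the remaining-cost fractions from the table. It assumes the best case for caching (every input token billed at the 90% discount) and that your provider lets the caching and batch discounts stack; neither is guaranteed. The function name `stacked_cost_fraction` is illustrative.

```python
def stacked_cost_fraction(savings: list[float]) -> float:
    """Fraction of the original cost left after applying each saving in turn."""
    remaining = 1.0
    for s in savings:
        remaining *= 1.0 - s
    return remaining


# Savings order: Laconic, prompt caching, batch API
high = stacked_cost_fraction([0.50, 0.90, 0.50])  # Laconic at 50%
low = stacked_cost_fraction([0.15, 0.90, 0.50])   # Laconic at 15%

print(f"Laconic at 50%: about {round((1 - high) * 100, 2)}% total reduction")
print(f"Laconic at 15%: about {round((1 - low) * 100, 2)}% total reduction")
```

Under these assumptions the combined reduction lands at roughly 97.5% and 95.75% for the two ends of the Laconic range. With a lower cache hit rate the total drops accordingly.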