Token Budgeting
When you have a fixed context window (e.g., 128K tokens), every token matters. Laconic helps you fit more documents into the same budget without lossy truncation.
The Math
Say you’re building a prompt with retrieved context:
- System prompt: 2,000 tokens
- User query: 500 tokens
- Available for context: 125,500 tokens
- Retrieved docs: 150,000 tokens — doesn’t fit
Option A: Truncate. Lose information.
Option B: Compress with Laconic. If your docs are structure-heavy (tables, HTML, badges), you can typically recover 15–50% of their tokens.
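You can check the arithmetic behind this example directly. The sketch below is minimal and takes the window size and token counts from the list above. `available_context` and `required_savings` are hypothetical helper names for illustration only. They are not part of Laconic.

```python
def available_context(window: int, system: int, query: int) -> int:
    """Tokens left for retrieved context after the fixed prompt parts."""
    return window - system - query


def required_savings(doc_tokens: int, available: int) -> float:
    """Fraction of doc tokens compression must remove for the docs to fit."""
    if doc_tokens <= available:
        return 0.0  # already fits; no compression needed
    return 1 - available / doc_tokens


available = available_context(128_000, 2_000, 500)
print(available)  # 125500
print(f"{required_savings(150_000, available):.1%}")  # 16.3%
```

At the low end of the 15–50% range, 150,000 tokens shrink to 127,500 and still overflow by 2,000 tokens. Any saving above roughly 16.3% lets every document fit.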
Budget-Aware Pipeline
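```bash
#!/bin/bash
# Tokens available for context: 128,000 - 2,000 (system) - 500 (query)
BUDGET=125500
USED=0

for doc in retrieved_docs/*.md; do
  # Compress once; -j returns JSON with the compressed text and its token count
  if ! stats=$(laconic compress -j "$doc" 2>/dev/null); then
    echo "Compression failed for $doc, skipping." >&2
    continue
  fi
  tokens=$(echo "$stats" | jq -r '.compressed_tokens')

  NEXT=$((USED + tokens))
  if [ "$NEXT" -gt "$BUDGET" ]; then
    echo "Budget full at $USED tokens. Skipping remaining docs." >&2
    break
  fi

  # Output the compressed text followed by a document separator
  echo "$stats" | jq -r '.text'
  echo "---"
  USED=$NEXT
done
```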
Python Example
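```python
import json
import subprocess


def compress_and_budget(docs: list[str], budget: int) -> str:
    """Compress docs in order and keep as many as fit within `budget` tokens."""
    context_parts = []
    used = 0
    for doc in docs:
        # Pipe the document through stdin ("-") and request JSON output (-j)
        result = subprocess.run(
            ["laconic", "compress", "-j", "-"],
            input=doc,
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if laconic fails
        )
        data = json.loads(result.stdout)
        tokens = data["compressed_tokens"]
        if used + tokens > budget:
            break  # stop at the first doc that would overflow the budget
        context_parts.append(data["text"])
        used += tokens
    return "\n---\n".join(context_parts)
```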
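Both scripts stop at the first document that overflows the budget. This preserves retrieval order, but it can leave room unused when a later, smaller document would still fit.

The sketch below separates the selection step from the `laconic` call so you can test it on its own. It assumes you already have `(text, compressed_tokens)` pairs, for example from the JSON output above. `select_within_budget` and its `skip_oversized` parameter are hypothetical names for illustration.

```python
from typing import Iterable


def select_within_budget(
    docs: Iterable[tuple[str, int]],
    budget: int,
    skip_oversized: bool = False,
) -> tuple[list[str], int]:
    """Pick compressed docs, in order, whose token counts fit within budget.

    docs: (compressed_text, compressed_tokens) pairs, highest-ranked first.
    skip_oversized: if False, stop at the first doc that would overflow
        (same behavior as the scripts above); if True, skip it and keep
        trying later, smaller docs.
    Returns the selected texts and the total tokens used.
    """
    selected: list[str] = []
    used = 0
    for text, tokens in docs:
        if used + tokens > budget:
            if skip_oversized:
                continue  # leave this doc out, but keep filling the budget
            break  # preserve strict rank order: stop at first overflow
        selected.append(text)
        used += tokens
    return selected, used


docs = [("a", 60), ("b", 50), ("c", 30)]
print(select_within_budget(docs, 100))                       # (['a'], 60)
print(select_within_budget(docs, 100, skip_oversized=True))  # (['a', 'c'], 90)
```

Joining the selected texts with `"\n---\n"` reproduces the output format of `compress_and_budget`. Skipping changes which documents the model sees. Keep the default if retrieval rank matters more than filling the window.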
Fast Mode for Large Batches
If you’re processing hundreds of docs and just need the compressed text (not token counts), use fast mode to skip the BPE tokenizer entirely:
```bash
# Compress 500 docs in under a second
mkdir -p compressed  # the output directory must exist before redirecting into it
for doc in corpus/*.md; do
  laconic compress -f "$doc" > "compressed/$(basename "$doc")"
done
```
You can then count tokens separately on only the documents you keep, using Laconic or your LLM provider’s tokenizer.
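The sketch below is a minimal version of that second pass. It assumes the compressed files sit in the `compressed/` directory written by the loop above. The tokenizer is passed in as a function, so you can plug in whichever one matches your model. `count_tokens_in_dir` is a hypothetical helper. `whitespace_tokens` is only a stand-in for testing and does not produce a real BPE count.

```python
from pathlib import Path
from typing import Callable


def count_tokens_in_dir(
    directory: str,
    count_tokens: Callable[[str], int],
    pattern: str = "*.md",
) -> dict[str, int]:
    """Return {filename: token_count} for each file in directory matching pattern."""
    counts: dict[str, int] = {}
    for path in sorted(Path(directory).glob(pattern)):
        counts[path.name] = count_tokens(path.read_text(encoding="utf-8"))
    return counts


def whitespace_tokens(text: str) -> int:
    """Stand-in tokenizer for illustration only: counts whitespace-separated words."""
    return len(text.split())
```

For OpenAI-style models, one option is the `tiktoken` package. Create an encoder with `enc = tiktoken.get_encoding("cl100k_base")`, then call `count_tokens_in_dir("compressed", lambda t: len(enc.encode(t)))`. Check your provider's documentation to confirm which encoding matches your model.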
Stacking with Other Optimizations
Laconic compresses document structure, so you can stack it with techniques that work at other layers:
| Technique | What it targets | Typical savings |
|---|---|---|
| Laconic | Decorative markdown structure | 15–50% fewer tokens on structured docs |
| Prompt caching | Repeated prefix tokens (billed at a discount, not removed) | Up to 90% cost reduction on cached tokens |
| Batch API | Nothing removed; discounted pricing for asynchronous jobs | 50% cost reduction |
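These savings compound multiplicatively, because each one applies to a different part of the bill:

- Laconic reduces the number of tokens you send.
- Caching discounts the repeated prefix.
- Batching discounts the whole request.

With best-case numbers, Laconic + prompt caching + batch API can reduce effective cost by 95%+ on structure-heavy workloads. Actual savings depend on how much of your prompt is cacheable and whether your provider lets caching and batch discounts combine.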
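To make the compounding concrete, multiply the fraction of cost each technique leaves behind. The sketch below is minimal and uses the best-case figures from the table: 90% off cached tokens and a 50% batch discount. It also assumes that the whole prompt is cached and that your provider lets the discounts combine. `remaining_cost_fraction` is a hypothetical helper.

```python
import math


def remaining_cost_fraction(*savings: float) -> float:
    """Fraction of the original cost left after applying independent savings.

    Each saving is a fraction in [0, 1], e.g. 0.5 for a 50% reduction.
    Independent reductions multiply: the result is prod(1 - s).
    """
    return math.prod(1 - s for s in savings)


# Laconic at the low and high end of 15-50%, plus caching (0.90) and batch (0.50)
for laconic in (0.15, 0.50):
    left = remaining_cost_fraction(laconic, 0.90, 0.50)
    print(f"Laconic {laconic:.0%}: total reduction {1 - left:.2%}")
# Laconic 15%: total reduction 95.75%
# Laconic 50%: total reduction 97.50%
```

These figures are upper bounds. Caching discounts apply only to the cached prefix, so a prompt that is mostly fresh retrieved context will see much less than 90% off.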