The Mathematics of Tokens
Understanding why TOON saves money across LLM context windows.
What is a Token?
Contrary to popular belief, LLMs do not "read" words. They process tokens—numerical representations of common character chunks. Modern models like GPT-4 use Byte Pair Encoding (BPE).
In JSON, structural characters like {, ", and : are often their own tokens.
When these are repeated thousands of times in a data array, they occupy "dead space" that could have been used for actual information.
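You can see this dead space without a tokenizer at all, just by counting structural characters in a compact JSON payload (the records below are hypothetical, and raw character counts are only a rough proxy for BPE tokens):

```python
import json

# Hypothetical user records, purely for illustration.
users = [{"id": i, "name": f"user{i}", "active": True} for i in range(100)]

payload = json.dumps(users, separators=(",", ":"))

# Characters spent purely on structure: braces, brackets, quotes,
# colons, and commas. Under BPE, these are often tokens of their own.
structural = sum(payload.count(c) for c in '{}[]":,')

print(f"total chars:      {len(payload)}")
print(f"structural chars: {structural}")
print(f"overhead:         {structural / len(payload):.0%}")
```

On payloads like this, well over a third of the characters carry no data at all.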
The Efficiency Formula
We calculate the Efficiency Ratio (ER) from the redundancy of keys and structural punctuation, as the token cost of the JSON encoding divided by the token cost of the TOON encoding:

ER = T_JSON / T_TOON ≈ N × (K + V + P) / (K + N × V)

Where N is the number of items, K is the tokens spent on keys per item, V is the tokens spent on values per item, and P is the structural punctuation tokens per item. In TOON, the keys are only counted once, whereas in JSON, keys are multiplied by N.
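The ratio is easy to sanity-check in code. The per-item counts below (K, V, P) are illustrative assumptions, not measurements from any real tokenizer:

```python
def efficiency_ratio(n: int, k: int, v: int, p: int) -> float:
    """Approximate token ratio of JSON vs TOON for n items.

    k: key tokens per item, v: value tokens per item,
    p: structural-punctuation tokens per item (charged to
    JSON only, a simplifying assumption).
    """
    json_tokens = n * (k + v + p)
    toon_tokens = k + n * v  # keys are paid for once, in the header
    return json_tokens / toon_tokens

# Illustrative numbers: 50 items, 5 one-token keys,
# 5 one-token values, ~12 punctuation tokens per item.
print(efficiency_ratio(50, 5, 5, 12))
```

Note that the ratio improves as N grows: the one-time key header in TOON is amortized over more and more rows.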
Real-World Benchmark
Consider a typical API response with 50 user objects, each with 5 fields. In JSON, those 5 keys, plus their quotes, colons, and braces, are repeated 50 times; in TOON, they appear exactly once in the header.
[Figure] Visual comparison: context window occupied by TOON (green) vs JSON (dull red)
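This benchmark can be approximated by comparing serialized sizes directly. The TOON serializer below is a minimal sketch of a header-plus-rows layout for flat, uniform objects, not the official library, and character counts again stand in for tokens:

```python
import json

# 50 hypothetical user objects, 5 fields each.
users = [
    {"id": i, "name": f"user{i}", "email": f"user{i}@example.com",
     "age": 20 + i % 40, "active": i % 2 == 0}
    for i in range(50)
]

json_text = json.dumps(users, separators=(",", ":"))

# Minimal TOON-style sketch: declare the keys once in a header,
# then emit one CSV-like row per object. str() is applied naively
# (e.g. Python's "True" vs JSON's "true"), which is fine for a
# rough size comparison.
keys = list(users[0])
header = f"users[{len(users)}]{{{','.join(keys)}}}:"
rows = [",".join(str(u[k]) for k in keys) for u in users]
toon_text = "\n".join([header, *rows])

print(f"JSON chars: {len(json_text)}")
print(f"TOON chars: {len(toon_text)}")
print(f"savings:    {1 - len(toon_text) / len(json_text):.0%}")
```

The repeated keys and punctuation account for roughly half of the JSON payload here, which is exactly the "dead space" the tabular header eliminates.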
Impact on RAG Performance
In Retrieval-Augmented Generation, your context window is your most precious resource. By using TOON, you aren't just saving money—you are increasing the Intelligence Density of your prompt.
- Higher Recall: Fit more document chunks in the same window.
- Lower Latency: Models process fewer tokens, resulting in faster TTFT (Time To First Token).
- Better Coherence: The model can "see" more related data at once without hitting context limits.
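To make the recall point concrete, here is a back-of-the-envelope calculation; the window size, chunk size, and savings rate are illustrative assumptions, not benchmarks:

```python
def extra_chunks(window: int, chunk_tokens: int, savings: float) -> int:
    """How many more chunks fit in the window if each chunk
    shrinks by the given fractional token savings."""
    before = window // chunk_tokens
    after = window // int(chunk_tokens * (1 - savings))
    return after - before

# Assumed: a 128k-token window, 500-token chunks, and a 40%
# token saving from a more compact encoding.
print(extra_chunks(128_000, 500, 0.40))  # prints 170
```

Under these assumptions, the same window goes from 256 chunks to 426, i.e. 170 additional chunks of retrieved context per prompt.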