The Mathematics of Tokens
Understanding why TOON saves money across LLM context windows.
What is a Token?
Contrary to popular belief, LLMs do not "read" words. They process tokens—numerical representations of common character chunks. Modern models like GPT-4 use Byte Pair Encoding (BPE).
In JSON, structural characters like {, ", and : are often their own tokens.
When these are repeated thousands of times in a data array, they occupy "dead space" that could have been used for actual information.
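You can see this dead space without a tokenizer at all, just by counting structural characters in a compact JSON payload (the records below are hypothetical, and raw character counts are only a rough proxy for BPE tokens):

```python
import json

# Hypothetical user records, purely for illustration.
users = [{"id": i, "name": f"user{i}", "active": True} for i in range(100)]

payload = json.dumps(users, separators=(",", ":"))

# Characters spent purely on structure: braces, brackets, quotes,
# colons, and commas. Under BPE, these are often tokens of their own.
structural = sum(payload.count(c) for c in '{}[]":,')

print(f"total chars:      {len(payload)}")
print(f"structural chars: {structural}")
print(f"overhead:         {structural / len(payload):.0%}")
```

On payloads like this, well over a third of the characters carry no data at all.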
The Efficiency Formula
We calculate the Efficiency Ratio (ER) from the redundancy of keys and structural punctuation, as the token cost of the JSON encoding divided by the token cost of the TOON encoding:

ER = T_JSON / T_TOON ≈ N × (K + V + P) / (K + N × V)

Where N is the number of items, K is the tokens spent on keys per item, V is the tokens spent on values per item, and P is the structural punctuation tokens per item. In TOON, the keys are only counted once, whereas in JSON, keys are multiplied by N.
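The ratio is easy to sanity-check in code. The per-item counts below (K, V, P) are illustrative assumptions, not measurements from any real tokenizer:

```python
def efficiency_ratio(n: int, k: int, v: int, p: int) -> float:
    """Approximate token ratio of JSON vs TOON for n items.

    k: key tokens per item, v: value tokens per item,
    p: structural-punctuation tokens per item (charged to
    JSON only, a simplifying assumption).
    """
    json_tokens = n * (k + v + p)
    toon_tokens = k + n * v  # keys are paid for once, in the header
    return json_tokens / toon_tokens

# Illustrative numbers: 50 items, 5 one-token keys,
# 5 one-token values, ~12 punctuation tokens per item.
print(efficiency_ratio(50, 5, 5, 12))
```

Note that the ratio improves as N grows: the one-time key header in TOON is amortized over more and more rows.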
Real-World Benchmark
Consider a typical API response with 50 user objects, each with 5 fields. In JSON, those 5 keys, plus their quotes, colons, and braces, are repeated 50 times; in TOON, they appear exactly once in the header.
[Figure] Visual comparison: context window occupied by TOON (green) vs JSON (dull red)
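This benchmark can be approximated by comparing serialized sizes directly. The TOON serializer below is a minimal sketch of a header-plus-rows layout for flat, uniform objects, not the official library, and character counts again stand in for tokens:

```python
import json

# 50 hypothetical user objects, 5 fields each.
users = [
    {"id": i, "name": f"user{i}", "email": f"user{i}@example.com",
     "age": 20 + i % 40, "active": i % 2 == 0}
    for i in range(50)
]

json_text = json.dumps(users, separators=(",", ":"))

# Minimal TOON-style sketch: declare the keys once in a header,
# then emit one CSV-like row per object. str() is applied naively
# (e.g. Python's "True" vs JSON's "true"), which is fine for a
# rough size comparison.
keys = list(users[0])
header = f"users[{len(users)}]{{{','.join(keys)}}}:"
rows = [",".join(str(u[k]) for k in keys) for u in users]
toon_text = "\n".join([header, *rows])

print(f"JSON chars: {len(json_text)}")
print(f"TOON chars: {len(toon_text)}")
print(f"savings:    {1 - len(toon_text) / len(json_text):.0%}")
```

The repeated keys and punctuation account for roughly half of the JSON payload here, which is exactly the "dead space" the tabular header eliminates.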
Impact on RAG Performance
In Retrieval-Augmented Generation, your context window is your most precious resource. By using TOON, you aren't just saving money—you are increasing the Intelligence Density of your prompt.
- Higher Recall: Fit more document chunks in the same window.
- Lower Latency: Models process fewer tokens, resulting in faster TTFT (Time To First Token).
- Better Coherence: The model can "see" more related data at once without hitting context limits.
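To make the recall point concrete, here is a back-of-the-envelope calculation; the window size, chunk size, and savings rate are illustrative assumptions, not benchmarks:

```python
def extra_chunks(window: int, chunk_tokens: int, savings: float) -> int:
    """How many more chunks fit in the window if each chunk
    shrinks by the given fractional token savings."""
    before = window // chunk_tokens
    after = window // int(chunk_tokens * (1 - savings))
    return after - before

# Assumed: a 128k-token window, 500-token chunks, and a 40%
# token saving from a more compact encoding.
print(extra_chunks(128_000, 500, 0.40))  # prints 170
```

Under these assumptions, the same window goes from 256 chunks to 426, i.e. 170 additional chunks of retrieved context per prompt.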