Understanding why TOON saves money in LLM context windows.
Contrary to popular belief, LLMs do not "read" words. They process tokens: numeric IDs assigned to common chunks of characters. Modern models like GPT-4 use Byte Pair Encoding (BPE) tokenizers.
In JSON, structural characters like {, ", and : are often their own tokens.
When these are repeated thousands of times in a data array, they occupy "dead space" that could have been used for actual information.
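To see this in practice, here is a minimal sketch using OpenAI's tiktoken library (one BPE tokenizer among many; any will show the same pattern) that prints the token breakdown of a tiny JSON object:

```python
# A minimal sketch: inspect how a small JSON object is split into tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4's encoding

snippet = '{"id": 1, "name": "Alice"}'
tokens = enc.encode(snippet)

print(len(tokens))                        # total tokens for one small object
print([enc.decode([t]) for t in tokens])  # punctuation chunks like '{"' and '":'
                                          # show up as tokens of their own
```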
We calculate the Efficiency Ratio (ER) based on the redundancy of keys and structural punctuation. Let $K$ be the tokens spent on one item's keys, $V$ the tokens spent on its values, and $S$ the structural punctuation tokens per item:

$$\mathrm{ER} = \frac{T_{\text{JSON}}}{T_{\text{TOON}}} \approx \frac{N\,(K + V + S_{\text{JSON}})}{K + N\,(V + S_{\text{TOON}})}$$

where $N$ is the number of items. In TOON, the keys are counted only once (the lone $K$ in the denominator), whereas in JSON they are multiplied by $N$.
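The ratio is easy to compute directly. In this sketch the per-item token costs are hypothetical placeholders chosen for illustration, not measurements:

```python
def efficiency_ratio(n: int, key_tokens: int, value_tokens: int,
                     json_punct: int, toon_punct: int) -> float:
    """Efficiency Ratio: total JSON tokens over total TOON tokens.

    Keys and punctuation recur in every JSON item; in TOON the keys
    (and most punctuation) are paid for once in the header.
    """
    json_total = n * (key_tokens + value_tokens + json_punct)
    toon_total = key_tokens + n * (value_tokens + toon_punct)
    return json_total / toon_total

# Hypothetical per-item costs: 10 key tokens, 15 value tokens,
# 20 punctuation tokens in JSON vs 5 delimiter tokens in TOON.
print(efficiency_ratio(50, 10, 15, 20, 5))  # ~2.23: JSON spends ~2.2x the tokens
```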
Consider a typical API response with 50 user objects. Each object has 5 fields.
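Here is a hedged sketch of that comparison. The TOON string is hand-built to approximate the format's tabular array syntax (keys declared once in a header row, then one comma-separated line per item); an official TOON serializer may differ in details, and the user data is synthetic:

```python
# Compare token counts for 50 five-field user objects:
# JSON (keys repeated per item) vs a TOON-style tabular block.
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

users = [
    {"id": i, "name": f"user{i}", "email": f"user{i}@example.com",
     "age": 20 + i % 40, "active": i % 2 == 0}
    for i in range(50)
]

json_text = json.dumps(users)  # keys repeated in all 50 objects

# TOON-style: declare the keys once, then CSV-like rows (an approximation).
header = "users[50]{id,name,email,age,active}:"
rows = [
    f"  {u['id']},{u['name']},{u['email']},{u['age']},{str(u['active']).lower()}"
    for u in users
]
toon_text = "\n".join([header, *rows])

json_tokens = len(enc.encode(json_text))
toon_tokens = len(enc.encode(toon_text))
print(f"JSON: {json_tokens} tokens, TOON: {toon_tokens} tokens")
print(f"Efficiency Ratio ~ {json_tokens / toon_tokens:.2f}")
```

Exact counts depend on the tokenizer and the data, but the JSON total grows with every repeated key while TOON's header cost stays fixed.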
[Figure: visual comparison of the context window occupied by TOON (green) vs JSON (dull red)]
In Retrieval-Augmented Generation (RAG), your context window is your most precious resource. By using TOON, you aren't just saving money; you are increasing the Intelligence Density of your prompt, fitting more task-relevant information into the same window.