VERSION 2.0.1

ATON V2 Documentation

Complete API reference for production-grade data serialization

Installation

Python

pip install aton-format

JavaScript

npm install aton-format
# or
yarn add aton-format

From Source

git clone https://github.com/dagoSte/aton-format.git
cd aton-format
pip install -e .

Quick Start

Python Example

from aton_format import ATONEncoder, ATONDecoder, CompressionMode

# Initialize encoder with compression
encoder = ATONEncoder(
    compression=CompressionMode.BALANCED,
    optimize=True
)

# Your data
data = {
    "employees": [
        {"id": 1, "name": "Alice", "salary": 95000, "active": True},
        {"id": 2, "name": "Bob", "salary": 92000, "active": True},
        {"id": 3, "name": "Carol", "salary": 110000, "active": False}
    ]
}

# Encode to ATON
aton_text = encoder.encode(data)
print(aton_text)

# Decode back to original
decoder = ATONDecoder()
original = decoder.decode(aton_text)

assert data == original  # Perfect round-trip!

JavaScript Example

const ATON = require('aton-format');

// Initialize encoder
const encoder = new ATON.Encoder({
    compression: ATON.CompressionMode.BALANCED,
    optimize: true
});

// Your data
const data = {
    employees: [
        {id: 1, name: "Alice", salary: 95000, active: true},
        {id: 2, name: "Bob", salary: 92000, active: true},
        {id: 3, name: "Carol", salary: 110000, active: false}
    ]
};

// Encode to ATON
const atonText = encoder.encode(data);
console.log(atonText);

// Decode back
const decoder = new ATON.Decoder();
const original = decoder.decode(atonText);

console.assert(JSON.stringify(data) === JSON.stringify(original));

What's New in V2

[*] Compression Modes
Four intelligent modes: Fast, Balanced, Ultra, and Adaptive
[Q] Query Language
SQL-like queries with a full AST parser and evaluator
[S] Streaming Support
Memory-efficient processing of large datasets
[!] Error Handling
Comprehensive exception hierarchy and validation

ATONEncoder

The main encoder class for converting data to ATON format.

ATONEncoder(optimize=True, compression='balanced', queryable=False, validate=True)
optimize     bool                    - Enable optimization (schemas, defaults)
compression  str | CompressionMode   - Compression strategy: 'fast', 'balanced', 'ultra', or 'adaptive'
queryable    bool                    - Generate query metadata
validate     bool                    - Validate input data

Methods

encode(data, compress=True) -> str

Encode data to ATON format.

data      dict - Data to encode
compress  bool - Apply compression

encode_with_query(data, query) -> str

Encode data after applying a query filter.

data   dict - Data to encode
query  str  - Query string (SQL-like)

Example

encoder = ATONEncoder(
    compression=CompressionMode.ADAPTIVE,
    optimize=True,
    queryable=True
)

result = encoder.encode_with_query(
    data,
    "employees WHERE salary > 100000 ORDER BY salary DESC"
)

ATONDecoder

Decoder class for converting ATON format back to original data.

ATONDecoder(validate=True)
validate bool - Validate decoded data structure

Methods

decode(aton_text) -> dict

Decode ATON-formatted text back to the original data.

aton_text  str - ATON-formatted string

Example

decoder = ATONDecoder(validate=True)
data = decoder.decode(aton_text)

# Perfect round-trip guaranteed
assert original_data == data

ATONStreamEncoder

Memory-efficient encoder for processing large datasets in chunks.

ATONStreamEncoder(chunk_size=100, compression='balanced')
chunk_size   int                    - Number of records per chunk
compression  str | CompressionMode  - Compression mode to use

Methods

stream_encode(data, table_name=None) -> Iterator[dict]

Stream encode data in chunks with schema caching

data        dict - Data with a single table
table_name  str  - Optional table name

Example

stream_encoder = ATONStreamEncoder(
    chunk_size=1000,
    compression=CompressionMode.ULTRA
)

# Process 100K records in chunks
data = {"products": large_product_list}

for chunk_info in stream_encoder.stream_encode(data):
    print(f"Chunk {chunk_info['chunk_id']}/{chunk_info['total_chunks']}")
    print(f"Records: {chunk_info['metadata']['records_in_chunk']}")
    
    # Process chunk
    process_chunk(chunk_info['data'])
    
    # Memory stays constant!

ATONQueryEngine

SQL-like query engine with a full AST parser and evaluator.

query(data, query_string) -> list

Execute query on data

data          dict - Data dictionary
query_string  str  - SQL-like query

Example

from aton_format import ATONQueryEngine

engine = ATONQueryEngine()

# Complex query
results = engine.query(
    data,
    """
    products WHERE 
        (price BETWEEN 100 AND 500) 
        AND category IN ('Electronics', 'Computers')
        AND name LIKE '%Premium%'
    ORDER BY rating DESC
    LIMIT 20
    """
)

# Results are filtered, sorted, and paginated
for product in results:
    print(product['name'], product['price'])

Compression Modes

ATON V2 provides four compression strategies, each optimized for different use cases.

Mode       Speed   Compression   Use Case
FAST       *****   **            Real-time applications, low latency requirements
BALANCED   ****    ***           General purpose, recommended default
ULTRA      ***     *****         Batch processing, storage optimization
ADAPTIVE   ****    ****          Mixed workloads, automatic optimization
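
Per the constructor signature above, compression accepts either the CompressionMode enum or its lowercase string name, so these two encoders are configured identically:

from aton_format import ATONEncoder, CompressionMode

# Equivalent configurations: enum member vs. lowercase string name
enum_encoder = ATONEncoder(compression=CompressionMode.ULTRA)
string_encoder = ATONEncoder(compression='ultra')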

Fast Mode

Dictionary compression only. Optimized for speed.

encoder = ATONEncoder(compression=CompressionMode.FAST)
# Fastest encoding, moderate compression

Balanced Mode

Dictionary + selective algorithms. Best for most use cases.

encoder = ATONEncoder(compression=CompressionMode.BALANCED)
# Optimal balance, recommended default

Ultra Mode

All compression algorithms. Maximum token reduction.

encoder = ATONEncoder(compression=CompressionMode.ULTRA)
# Maximum compression for storage/batch

Adaptive Mode

AI-driven mode selection based on data patterns.

encoder = ATONEncoder(compression=CompressionMode.ADAPTIVE)
# Automatically chooses best strategy

Query Syntax

ATON V2 supports a SQL-like query language for filtering and transforming data before encoding.

Basic Syntax

[SELECT fields FROM] table_name [WHERE conditions] [ORDER BY field ASC|DESC] [LIMIT n] [OFFSET m]
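
A single query can combine every optional clause. The sketch below runs one through the ATONQueryEngine documented above, assuming data holds an employees table:

from aton_format import ATONQueryEngine

engine = ATONQueryEngine()

# Every optional clause of the grammar in one query
results = engine.query(
    data,
    """
    SELECT name, salary FROM employees
    WHERE active = true AND salary > 80000
    ORDER BY salary DESC
    LIMIT 10 OFFSET 20
    """
)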

WHERE Clauses

# Simple condition
employees WHERE salary > 100000

# Multiple conditions
employees WHERE salary > 100000 AND active = true

# Complex logic
employees WHERE (role = 'Engineer' OR role = 'Manager') 
         AND salary > 80000

Operators

Comparison Operators

Operator   Description             Example
=          Equal to                status = 'active'
!=         Not equal to            department != 'HR'
<          Less than               age < 30
>          Greater than            salary > 50000
<=         Less than or equal      quantity <= 100
>=         Greater than or equal   rating >= 4.0

Special Operators

Operator   Description         Example
IN         Value in list       status IN ('active', 'pending')
NOT IN     Value not in list   role NOT IN ('admin', 'root')
LIKE       Pattern matching    name LIKE '%Smith%'
BETWEEN    Range check         price BETWEEN 100 AND 500

Logical Operators

Operator   Description             Example
AND        Both conditions true    a = 1 AND b = 2
OR         Either condition true   a = 1 OR b = 2
NOT        Negates condition       NOT (status = 'deleted')

Query Examples

Simple Filtering

employees WHERE active = true

Multiple Conditions

products WHERE price > 100 AND stock > 0 AND featured = true

Pattern Matching

customers WHERE email LIKE '%@gmail.com' AND name LIKE 'A%'

Range Queries

orders WHERE total BETWEEN 1000 AND 5000 AND created_date > '2024-01-01'

Complex Logic

employees WHERE 
    (department = 'Engineering' AND salary > 90000)
    OR (department = 'Sales' AND commission > 50000)
    OR (role = 'Executive')
ORDER BY salary DESC
LIMIT 50

Field Selection

SELECT name, email, department 
FROM employees 
WHERE active = true
ORDER BY name ASC
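
Run through the query engine, each result would then carry only the projected fields; a sketch (the result shape is assumed from standard SQL projection semantics, not documented here):

from aton_format import ATONQueryEngine

engine = ATONQueryEngine()
results = engine.query(
    data,
    "SELECT name, email, department FROM employees "
    "WHERE active = true ORDER BY name ASC"
)

# Assumed result shape: dicts restricted to the selected fields, e.g.
# {'name': 'Alice', 'email': 'alice@example.com', 'department': 'Engineering'}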

Pagination

products WHERE category = 'Electronics'
ORDER BY rating DESC
LIMIT 20 OFFSET 40  # Page 3 (records 41-60)
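
The offset follows the usual pagination arithmetic, offset = (page - 1) * page_size. A small helper makes this explicit (page_query is a hypothetical convenience, not part of the library):

def page_query(table: str, page: int, page_size: int = 20) -> str:
    """Build a paginated query string: offset = (page - 1) * page_size."""
    offset = (page - 1) * page_size
    return f"{table} ORDER BY rating DESC LIMIT {page_size} OFFSET {offset}"

# page_query('products', page=3) -> 'products ORDER BY rating DESC LIMIT 20 OFFSET 40'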

Streaming

For large datasets, use streaming to process data in chunks without loading everything into memory.

Basic Streaming

from aton_format import ATONStreamEncoder, CompressionMode

# Initialize stream encoder
stream_encoder = ATONStreamEncoder(
    chunk_size=1000,
    compression=CompressionMode.ULTRA
)

# Large dataset
data = {
    "transactions": [
        # ... 1 million records
    ]
}

# Stream encode
for chunk in stream_encoder.stream_encode(data):
    # Chunk structure
    print(f"Chunk {chunk['chunk_id']} of {chunk['total_chunks']}")
    print(f"First chunk: {chunk['is_first']}")
    print(f"Last chunk: {chunk['is_last']}")
    print(f"Records: {chunk['metadata']['records_in_chunk']}")
    
    # Send chunk to LLM or store
    send_to_llm(chunk['data'])

Schema Caching

The first chunk includes schema and defaults. Subsequent chunks only contain data rows.

[TIP] Performance Tip

Schema is inferred once and cached. Subsequent chunks use minimal formatting, maximizing compression.
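
On the consuming side, the chunks can be reassembled before decoding. A minimal sketch, assuming the chunk 'data' fragments join line-wise back into one valid ATON document (the joining delimiter is an assumption, not a documented guarantee):

from aton_format import ATONDecoder

# Reassemble the stream: the first chunk carries schema and defaults,
# later chunks carry only data rows (see Schema Caching above)
parts = []
for chunk in stream_encoder.stream_encode(data):
    parts.append(chunk['data'])

# Assumption: fragments concatenate into a complete ATON document
decoder = ATONDecoder()
restored = decoder.decode('\n'.join(parts))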

Error Handling

ATON V2 provides comprehensive error handling with a custom exception hierarchy.

Exception Hierarchy

ATONError (base)
|-- ATONEncodingError
|-- ATONDecodingError
|-- ATONQueryError
`-- ATONCompressionError

Error Handling Example

from aton_format import (
    ATONEncoder, 
    ATONEncodingError, 
    ATONDecodingError,
    ATONQueryError
)

try:
    encoder = ATONEncoder(validate=True)
    result = encoder.encode(data)
    
except ATONEncodingError as e:
    print(f"Encoding failed: {e}")
    # Handle encoding error
    
except ATONQueryError as e:
    print(f"Query failed: {e}")
    # Handle query error
    
except Exception as e:
    print(f"Unexpected error: {e}")
    # Handle unexpected errors

Validation

# Enable validation (default)
encoder = ATONEncoder(validate=True)

# Validation checks:
# - Data structure
# - Type consistency
# - Required fields
# - Value ranges
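
When validation fails, the natural failure mode is an ATONEncodingError from the hierarchy above; a minimal sketch, assuming type-inconsistent rows are rejected at encode time:

from aton_format import ATONEncoder, ATONEncodingError

encoder = ATONEncoder(validate=True)

# 'id' switches from int to str between rows, violating type consistency
bad_data = {"employees": [{"id": 1}, {"id": "two"}]}

try:
    encoder.encode(bad_data)
except ATONEncodingError as e:
    print(f"Validation rejected the input: {e}")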

Performance

Token Reduction Benchmarks

Dataset                    JSON Tokens   ATON Tokens   Reduction
Employee Records (1K)           12,450         5,280      57.6%
Product Catalog (10K)          145,200        64,800      55.4%
Transaction Log (100K)       1,856,000       815,000      56.1%
Agent States (Real-time)        42,000        18,500      56.0%

Compression Speed

Mode       Records/sec   Latency
FAST           ~50,000      <1ms
BALANCED       ~35,000      <2ms
ULTRA          ~20,000      <5ms
ADAPTIVE       ~30,000      <3ms

[INFO] Benchmark Environment

Tests performed on Apple M1 Pro, 16GB RAM, Python 3.11. Results may vary based on hardware and data characteristics.

Best Practices

1. Choose the Right Compression Mode

# Real-time applications
encoder = ATONEncoder(compression=CompressionMode.FAST)

# General purpose (recommended)
encoder = ATONEncoder(compression=CompressionMode.BALANCED)

# Batch processing / storage
encoder = ATONEncoder(compression=CompressionMode.ULTRA)

# Mixed workloads
encoder = ATONEncoder(compression=CompressionMode.ADAPTIVE)

2. Use Streaming for Large Datasets

# DON'T: Load everything in memory
data = load_million_records()
encoded = encoder.encode(data)  # Memory spike!

# DO: Use streaming
stream_encoder = ATONStreamEncoder(chunk_size=1000)
for chunk in stream_encoder.stream_encode(data):
    process_chunk(chunk)  # Constant memory!

3. Leverage Query Language

# DON'T: Send all data then filter
all_data = encoder.encode(large_dataset)
# LLM has to process everything...

# DO: Filter before encoding
filtered = encoder.encode_with_query(
    large_dataset,
    "employees WHERE salary > 100000 LIMIT 100"
)
# Only 100 relevant records sent!

4. Enable Optimization

# Always enable optimization for repeated structures
encoder = ATONEncoder(
    optimize=True,  # Enable schemas and defaults
    compression=CompressionMode.BALANCED
)

5. Handle Errors Gracefully

from aton_format import ATONError

try:
    result = encoder.encode(data)
except ATONError as e:
    # ATON-specific errors
    logger.error(f"ATON error: {e}")
    fallback_to_json(data)
except Exception as e:
    # Unexpected errors
    logger.critical(f"Critical error: {e}")
    raise

6. Validate in Production

# Always validate in production
encoder = ATONEncoder(validate=True)
decoder = ATONDecoder(validate=True)

# Round-trip test
encoded = encoder.encode(data)
decoded = decoder.decode(encoded)
assert data == decoded

7. Benchmark Your Data

import time

# Test different modes
modes = [CompressionMode.FAST, CompressionMode.BALANCED, CompressionMode.ULTRA]

for mode in modes:
    encoder = ATONEncoder(compression=mode)
    
    start = time.time()
    result = encoder.encode(data)
    duration = time.time() - start
    
    print(f"{mode.value}: {duration:.3f}s, {len(result)} chars")

[!] Important

Always test ATON with your specific data patterns. Token reduction and performance vary based on data characteristics.

Additional Resources

For more information, view the project on GitHub:

https://github.com/dagoSte/aton-format