Installation
Python
pip install aton-format
JavaScript
npm install aton-format # or yarn add aton-format
From Source
git clone https://github.com/dagoSte/aton-format.git
cd aton-format
pip install -e .
Quick Start
Python Example
from aton_format import ATONEncoder, ATONDecoder, CompressionMode
# Initialize encoder with compression
encoder = ATONEncoder(
compression=CompressionMode.BALANCED,
optimize=True
)
# Your data
data = {
"employees": [
{"id": 1, "name": "Alice", "salary": 95000, "active": True},
{"id": 2, "name": "Bob", "salary": 92000, "active": True},
{"id": 3, "name": "Carol", "salary": 110000, "active": False}
]
}
# Encode to ATON
aton_text = encoder.encode(data)
print(aton_text)
# Decode back to original
decoder = ATONDecoder()
original = decoder.decode(aton_text)
assert data == original # Perfect round-trip!
JavaScript Example
const ATON = require('aton-format');
// Initialize encoder
const encoder = new ATON.Encoder({
compression: ATON.CompressionMode.BALANCED,
optimize: true
});
// Your data
const data = {
employees: [
{id: 1, name: "Alice", salary: 95000, active: true},
{id: 2, name: "Bob", salary: 92000, active: true},
{id: 3, name: "Carol", salary: 110000, active: false}
]
};
// Encode to ATON
const atonText = encoder.encode(data);
console.log(atonText);
// Decode back
const decoder = new ATON.Decoder();
const original = decoder.decode(atonText);
console.assert(JSON.stringify(data) === JSON.stringify(original));
What's New in V2
ATONEncoder
The main encoder class for converting data to ATON format.
Methods
- encode(data): Encode data to ATON format
- encode_with_query(data, query): Encode data after applying a query filter
Example
encoder = ATONEncoder(
compression=CompressionMode.ADAPTIVE,
optimize=True,
queryable=True
)
result = encoder.encode_with_query(
data,
"employees WHERE salary > 100000 ORDER BY salary DESC"
)
ATONDecoder
Decoder class for converting ATON format back to original data.
Methods
- decode(aton_text): Decode ATON format back to the original data
decoder = ATONDecoder(validate=True)
data = decoder.decode(aton_text)
# Perfect round-trip guaranteed
assert original_data == data
ATONStreamEncoder
Memory-efficient encoder for processing large datasets in chunks.
Methods
- stream_encode(data): Stream-encode data in chunks with schema caching
Example
stream_encoder = ATONStreamEncoder(
chunk_size=1000,
compression=CompressionMode.ULTRA
)
# Process 100K records in chunks
data = {"products": large_product_list}
for chunk_info in stream_encoder.stream_encode(data):
print(f"Chunk {chunk_info['chunk_id']}/{chunk_info['total_chunks']}")
print(f"Records: {chunk_info['metadata']['records_in_chunk']}")
# Process chunk
process_chunk(chunk_info['data'])
# Memory stays constant!
ATONQueryEngine
SQL-like query engine with full AST parser and evaluation.
Methods
- query(data, query): Execute a query on data
Example
from aton_format import ATONQueryEngine
engine = ATONQueryEngine()
# Complex query
results = engine.query(
data,
"""
products WHERE
(price BETWEEN 100 AND 500)
AND category IN ('Electronics', 'Computers')
AND name LIKE '%Premium%'
ORDER BY rating DESC
LIMIT 20
"""
)
# Results are filtered, sorted, and paginated
for product in results:
print(product['name'], product['price'])
Compression Modes
ATON V2 provides four compression strategies, each optimized for different use cases.
| Mode | Speed | Compression | Use Case |
|---|---|---|---|
| FAST | ***** | ** | Real-time applications, low latency requirements |
| BALANCED | **** | *** | General purpose, recommended default |
| ULTRA | *** | ***** | Batch processing, storage optimization |
| ADAPTIVE | **** | **** | Mixed workloads, automatic optimization |
Fast Mode
Dictionary compression only. Optimized for speed.
encoder = ATONEncoder(compression=CompressionMode.FAST) # Fastest encoding, moderate compression
Balanced Mode
Dictionary + selective algorithms. Best for most use cases.
encoder = ATONEncoder(compression=CompressionMode.BALANCED) # Optimal balance, recommended default
Ultra Mode
All compression algorithms. Maximum token reduction.
encoder = ATONEncoder(compression=CompressionMode.ULTRA) # Maximum compression for storage/batch
Adaptive Mode
AI-driven mode selection based on data patterns.
encoder = ATONEncoder(compression=CompressionMode.ADAPTIVE) # Automatically chooses best strategy
Query Syntax
ATON V2 supports a SQL-like query language for filtering and transforming data before encoding.
Basic Syntax
[SELECT fields FROM] table_name [WHERE conditions] [ORDER BY field ASC|DESC] [LIMIT n] [OFFSET m]
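All clauses combined (table and field names here are illustrative):
SELECT name, salary FROM employees
WHERE active = true AND salary > 80000
ORDER BY salary DESC
LIMIT 10 OFFSET 20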
WHERE Clauses
# Simple condition
employees WHERE salary > 100000
# Multiple conditions
employees WHERE salary > 100000 AND active = true
# Complex logic
employees WHERE (role = 'Engineer' OR role = 'Manager')
AND salary > 80000
Operators
Comparison Operators
| Operator | Description | Example |
|---|---|---|
| = | Equal to | status = 'active' |
| != | Not equal to | department != 'HR' |
| < | Less than | age < 30 |
| > | Greater than | salary > 50000 |
| <= | Less than or equal | quantity <= 100 |
| >= | Greater than or equal | rating >= 4.0 |
Special Operators
| Operator | Description | Example |
|---|---|---|
| IN | Value in list | status IN ('active', 'pending') |
| NOT IN | Value not in list | role NOT IN ('admin', 'root') |
| LIKE | Pattern matching | name LIKE '%Smith%' |
| BETWEEN | Range check | price BETWEEN 100 AND 500 |
Logical Operators
| Operator | Description | Example |
|---|---|---|
| AND | Both conditions true | a = 1 AND b = 2 |
| OR | Either condition true | a = 1 OR b = 2 |
| NOT | Negates condition | NOT (status = 'deleted') |
Query Examples
Simple Filtering
employees WHERE active = true
Multiple Conditions
products WHERE price > 100 AND stock > 0 AND featured = true
Pattern Matching
customers WHERE email LIKE '%@gmail.com' AND name LIKE 'A%'
Range Queries
orders WHERE total BETWEEN 1000 AND 5000 AND created_date > '2024-01-01'
Complex Logic
employees WHERE
(department = 'Engineering' AND salary > 90000)
OR (department = 'Sales' AND commission > 50000)
OR (role = 'Executive')
ORDER BY salary DESC
LIMIT 50
Field Selection
SELECT name, email, department FROM employees WHERE active = true ORDER BY name ASC
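The same projection works through the Python query engine; a minimal sketch, assuming engine.query accepts the SELECT form shown above and returns plain dicts:
from aton_format import ATONQueryEngine

engine = ATONQueryEngine()
contacts = engine.query(
    data,
    "SELECT name, email, department FROM employees WHERE active = true ORDER BY name ASC"
)
for row in contacts:
    print(row)  # only the selected fields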
Pagination
products WHERE category = 'Electronics' ORDER BY rating DESC LIMIT 20 OFFSET 40 # Page 3 (records 41-60)
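When paginating from application code, it helps to derive OFFSET from the page number; a minimal sketch (page_query is a hypothetical helper, not part of the library):
from aton_format import ATONQueryEngine

def page_query(table, page, page_size=20):
    # Page 1 starts at offset 0
    offset = (page - 1) * page_size
    return (f"{table} WHERE category = 'Electronics' "
            f"ORDER BY rating DESC LIMIT {page_size} OFFSET {offset}")

engine = ATONQueryEngine()
page_3 = engine.query(data, page_query("products", page=3))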
Streaming
For large datasets, use streaming to process data in chunks without loading everything into memory.
Basic Streaming
from aton_format import ATONStreamEncoder, CompressionMode
# Initialize stream encoder
stream_encoder = ATONStreamEncoder(
chunk_size=1000,
compression=CompressionMode.ULTRA
)
# Large dataset
data = {
"transactions": [
# ... 1 million records
]
}
# Stream encode
for chunk in stream_encoder.stream_encode(data):
# Chunk structure
print(f"Chunk {chunk['chunk_id']} of {chunk['total_chunks']}")
print(f"First chunk: {chunk['is_first']}")
print(f"Last chunk: {chunk['is_last']}")
print(f"Records: {chunk['metadata']['records_in_chunk']}")
# Send chunk to LLM or store
send_to_llm(chunk['data'])
Schema Caching
The schema is inferred once and cached: the first chunk carries the schema and default values, while subsequent chunks contain only data rows with minimal formatting, maximizing compression.
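Continuing the streaming example above, only the first chunk needs schema handling; a sketch using the chunk fields shown earlier (store_schema_header and append_rows are hypothetical sinks):
for chunk in stream_encoder.stream_encode(data):
    if chunk['is_first']:
        # First chunk: schema, defaults, and data rows
        store_schema_header(chunk['data'])
    else:
        # Later chunks: data rows only
        append_rows(chunk['data'])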
Error Handling
ATON V2 provides comprehensive error handling with a custom exception hierarchy.
Exception Hierarchy
ATONError (base)
|-- ATONEncodingError
|-- ATONDecodingError
|-- ATONQueryError
`-- ATONCompressionError
Error Handling Example
from aton_format import (
ATONEncoder,
ATONEncodingError,
ATONDecodingError,
ATONQueryError
)
try:
encoder = ATONEncoder(validate=True)
result = encoder.encode(data)
except ATONEncodingError as e:
print(f"Encoding failed: {e}")
# Handle encoding error
except ATONQueryError as e:
print(f"Query failed: {e}")
# Handle query error
except Exception as e:
print(f"Unexpected error: {e}")
# Handle unexpected errors
Validation
# Enable validation (default)
encoder = ATONEncoder(validate=True)

# Validation checks:
# - Data structure
# - Type consistency
# - Required fields
# - Value ranges
Performance
Token Reduction Benchmarks
| Dataset | JSON Tokens | ATON Tokens | Reduction |
|---|---|---|---|
| Employee Records (1K) | 12,450 | 5,280 | 57.6% |
| Product Catalog (10K) | 145,200 | 64,800 | 55.4% |
| Transaction Log (100K) | 1,856,000 | 815,000 | 56.1% |
| Agent States (Real-time) | 42,000 | 18,500 | 56.0% |
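Reduction depends heavily on data shape, so it is worth measuring on your own payloads; a rough sketch using character counts as a stand-in for tokens (run your tokenizer of choice for exact figures):
import json
from aton_format import ATONEncoder, CompressionMode

encoder = ATONEncoder(compression=CompressionMode.BALANCED)
json_text = json.dumps(data)
aton_text = encoder.encode(data)

# Character counts approximate token counts; exact ratios need a tokenizer
reduction = 1 - len(aton_text) / len(json_text)
print(f"JSON: {len(json_text)} chars, ATON: {len(aton_text)} chars ({reduction:.1%} smaller)")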
Compression Speed
| Mode | Records/sec | Latency |
|---|---|---|
| FAST | ~50,000 | <1ms |
| BALANCED | ~35,000 | <2ms |
| ULTRA | ~20,000 | <5ms |
| ADAPTIVE | ~30,000 | <3ms |
Tests performed on Apple M1 Pro, 16GB RAM, Python 3.11. Results may vary based on hardware and data characteristics.
Best Practices
1. Choose the Right Compression Mode
# Real-time applications
encoder = ATONEncoder(compression=CompressionMode.FAST)

# General purpose (recommended)
encoder = ATONEncoder(compression=CompressionMode.BALANCED)

# Batch processing / storage
encoder = ATONEncoder(compression=CompressionMode.ULTRA)

# Mixed workloads
encoder = ATONEncoder(compression=CompressionMode.ADAPTIVE)
2. Use Streaming for Large Datasets
# DON'T: Load everything in memory
data = load_million_records()
encoded = encoder.encode(data) # Memory spike!
# DO: Use streaming
stream_encoder = ATONStreamEncoder(chunk_size=1000)
for chunk in stream_encoder.stream_encode(data):
process_chunk(chunk) # Constant memory!
3. Leverage Query Language
# DON'T: Send all data then filter
all_data = encoder.encode(large_dataset)
# LLM has to process everything...
# DO: Filter before encoding
filtered = encoder.encode_with_query(
large_dataset,
"employees WHERE salary > 100000 LIMIT 100"
)
# Only 100 relevant records sent!
4. Enable Optimization
# Always enable optimization for repeated structures
encoder = ATONEncoder(
optimize=True, # Enable schemas and defaults
compression=CompressionMode.BALANCED
)
5. Handle Errors Gracefully
from aton_format import ATONError
try:
result = encoder.encode(data)
except ATONError as e:
# ATON-specific errors
logger.error(f"ATON error: {e}")
fallback_to_json(data)
except Exception as e:
# Unexpected errors
logger.critical(f"Critical error: {e}")
raise
6. Validate in Production
# Always validate in production
encoder = ATONEncoder(validate=True)
decoder = ATONDecoder(validate=True)

# Round-trip test
encoded = encoder.encode(data)
decoded = decoder.decode(encoded)
assert data == decoded
7. Benchmark Your Data
import time
# Test different modes
modes = [CompressionMode.FAST, CompressionMode.BALANCED, CompressionMode.ULTRA]
for mode in modes:
encoder = ATONEncoder(compression=mode)
start = time.time()
result = encoder.encode(data)
duration = time.time() - start
print(f"{mode.value}: {duration:.3f}s, {len(result)} chars")
Always test ATON with your specific data patterns. Token reduction and performance vary based on data characteristics.
Additional Resources
For more information, check out:
- Technical Whitepaper - Deep dive into algorithms and benchmarks
- GitHub Repository - Source code and examples
- Report Issues - Bug reports and feature requests
- PyPI Package - Python installation