Data Compression
Advanced LZMA compression for efficient storage and fast queries.
Overview
AVA uses LZMA-based compression to reduce storage requirements while keeping queries fast. Data is compressed in independent blocks, so a query decompresses only the blocks it needs.
How It Works
Block-Based Compression
Data is divided into blocks (1 MB by default), and each block is compressed independently. This allows:
- Selective decompression - only needed blocks are decompressed
- Parallel processing - multiple blocks can be processed concurrently
- Efficient updates - only modified blocks need recompression
- Fast queries - blocks that a filter excludes can be skipped without decompression
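The block scheme above can be sketched in a few lines. This is a minimal illustration, not AVA's implementation: it uses Python's standard lzma module, and the names compress_blocks and read_range are hypothetical. It shows why independent blocks allow a read to decompress only the blocks it touches.

```python
import lzma

BLOCK_SIZE = 1024 * 1024  # 1 MB, matching the documented default


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks and compress each one independently."""
    return [
        lzma.compress(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def read_range(blocks: list[bytes], start: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Return data[start:start + length], decompressing only the blocks it touches."""
    if length <= 0:
        return b""
    first = start // block_size
    last = (start + length - 1) // block_size
    # Only blocks first..last are decompressed; all others stay compressed.
    chunk = b"".join(lzma.decompress(b) for b in blocks[first:last + 1])
    skip = start - first * block_size
    return chunk[skip:skip + length]
```

Because no block depends on another, the same layout also lets blocks be compressed in parallel and lets an update recompress a single block.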
Automatic Background Compression
AVA features a background compression thread that automatically compresses idle tables:
- Background Thread: Runs continuously, checking for idle tables every 60 seconds
- Idle Detection: Tables idle for more than 5 minutes (300 seconds) are automatically compressed
- Access Tracking: Last access time is tracked for all tables to determine idleness
- Smart Detection: Only data operations (SELECT, INSERT, UPDATE, DELETE) count as access
- Metadata Exclusion: Metadata operations (.schema, .compression_tables) don't trigger access updates
- Transparent Decompression: Compressed data is automatically decompressed when accessed
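The idle check described above amounts to comparing each table's last access time against the 300-second threshold. The sketch below is an assumption about how that selection could look; find_idle_tables and its arguments are hypothetical names, and only the 60-second and 300-second values come from this page.

```python
import time

CHECK_INTERVAL_SECONDS = 60    # how often the background thread wakes
IDLE_THRESHOLD_SECONDS = 300   # 5 minutes without data access


def find_idle_tables(last_access, compressed, now=None,
                     threshold=IDLE_THRESHOLD_SECONDS):
    """Return names of uncompressed tables idle for more than `threshold` seconds.

    last_access maps table name -> seconds since epoch of the last data
    operation; compressed is the set of tables that are already compressed.
    """
    now = time.time() if now is None else now
    return sorted(
        name for name, accessed_at in last_access.items()
        if name not in compressed and now - accessed_at > threshold
    )
```

For example, with now=1000, a table last accessed at 0 is selected, while one accessed at 800 (idle for 200 seconds) is left alone.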
Monitoring & Management
Manual Compression
Use the .compress command to manually compress a table:
.compress table_name
Compression Status
View overall compression statistics:
.compression_status
Table Details with Last Access Time
View detailed per-table compression information including last access time:
.compression_tables
This command displays:
- Table name and compression status
- Uncompressed and compressed sizes
- Compression ratio achieved
- Last access timestamp (YYYY-MM-DD HH:MM:SS format)
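To make the listed fields concrete, here is a small sketch of how one such row could be derived. The row layout, the function name format_table_row, the definition of ratio as uncompressed size divided by compressed size, and the use of UTC are all assumptions for illustration; the actual output of .compression_tables may differ.

```python
import time


def format_table_row(name, raw_bytes, compressed_bytes, last_access):
    """Format one illustrative status row from sizes and a time_t-style timestamp."""
    # Assumed definition: ratio = uncompressed size / compressed size.
    ratio = raw_bytes / compressed_bytes if compressed_bytes else 0.0
    # Timestamp rendered as YYYY-MM-DD HH:MM:SS (UTC assumed here).
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(last_access))
    return f"{name}  {raw_bytes}  {compressed_bytes}  {ratio:.2f}x  {stamp}"
```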
Implementation Details
Background Thread Architecture
The automatic compression system uses a dedicated background thread in the PDBEngine:
- Thread Lifecycle: Started automatically when PDBEngine initializes, stopped cleanly on shutdown
- Wake Interval: Thread wakes every 60 seconds to check for idle tables
- Idle Threshold: Tables must be idle for 300 seconds (5 minutes) before compression
- Thread Safety: All operations are protected by mutex locks so that concurrent access is safe
- Error Handling: Exceptions in the background thread are caught and silently ignored so that a compression failure cannot crash the engine
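The lifecycle above (start on init, wake periodically, stop cleanly, swallow errors) follows a common worker-thread pattern. The sketch below shows that pattern in Python for brevity; the class BackgroundCompressor and its callback are hypothetical stand-ins, not the PDBEngine code.

```python
import threading


class BackgroundCompressor:
    """Hypothetical stand-in for the engine's background compression thread."""

    def __init__(self, compress_idle_tables, interval=60.0):
        self._task = compress_idle_tables   # callable run on every wake-up
        self._interval = interval
        self._lock = threading.Lock()       # guards state shared with queries
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        """Signal the loop to exit and wait for the thread to finish."""
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout (time to work) and True once
        # stop() has been called, so shutdown is not delayed by the sleep.
        while not self._stop.wait(self._interval):
            try:
                with self._lock:
                    self._task()
            except Exception:
                pass  # swallowed, as described above, so the thread survives
```

Waiting on an event rather than sleeping lets stop() interrupt the 60-second pause immediately. Silently discarding exceptions keeps the thread alive but also hides failures, so logging them somewhere is usually worth considering.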
Access Time Tracking Design
Each table maintains a last access timestamp that is updated only by data operations:
- Data Operations Update: SELECT, INSERT, UPDATE, and DELETE operations update the timestamp
- Metadata Operations Excluded: .schema, .compression_tables, and .compression_status do not update the timestamp
- Timestamp Format: Stored as time_t (seconds since epoch), displayed in human-readable format
- Implementation: A conditional parameter in the getTable() method controls whether the access time is updated
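The conditional-parameter design can be illustrated as follows. Only the idea of a flag on getTable() comes from this page; the class TableRegistry, the parameter name update_access, and the injectable clock are hypothetical, and the sketch is in Python rather than the engine's own language.

```python
import time


class TableRegistry:
    """Hypothetical sketch of access tracking; not AVA's actual classes."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the behavior is easy to test
        self._tables = {}
        self._last_access = {}   # table name -> seconds since epoch

    def add_table(self, name, table):
        self._tables[name] = table
        self._last_access[name] = int(self._clock())

    def get_table(self, name, update_access=True):
        """Look up a table; metadata callers pass update_access=False."""
        table = self._tables[name]
        if update_access:
            self._last_access[name] = int(self._clock())
        return table

    def last_access(self, name):
        return self._last_access[name]
```

A data operation such as SELECT would call get_table(name), while a metadata command such as .schema would call get_table(name, update_access=False), so inspecting a table never resets its idle timer.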
Configuration
# Enable compression for a table
ALTER TABLE large_table SET COMPRESSION = 'LZMA';
# Set compression level (1-9, default 6)
ALTER TABLE large_table SET COMPRESSION_LEVEL = 9;
# Set block size in bytes (default 1MB; 2097152 = 2MB)
ALTER TABLE large_table SET BLOCK_SIZE = 2097152;
# Disable compression
ALTER TABLE large_table SET COMPRESSION = 'NONE';
Compression Levels
- Level 1-3: Fast compression, lower ratio (good for frequently updated data)
- Level 4-6: Balanced (recommended for most use cases)
- Level 7-9: Maximum compression, slower (archival data)
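The trade-off between levels can be measured directly on sample data. The sketch below uses the preset argument of Python's lzma module; whether AVA's levels 1-9 map one-to-one onto LZMA presets is an assumption, and compare_presets is a hypothetical helper.

```python
import lzma
import time


def compare_presets(data: bytes, presets=(1, 6, 9)):
    """Compress `data` at each preset; return {preset: (size, seconds)}."""
    results = {}
    for preset in presets:
        started = time.perf_counter()
        packed = lzma.compress(data, preset=preset)
        elapsed = time.perf_counter() - started
        assert lzma.decompress(packed) == data  # round-trip sanity check
        results[preset] = (len(packed), elapsed)
    return results
```

Running this on a representative slice of a table gives a rough sense of how much extra time a higher level costs for the space it saves. Higher presets also use considerably more memory while compressing.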
Best Practices
✓ Use compression for:
- Large historical datasets
- Archival data
- Read-heavy workloads
- Text and string-heavy data
✗ Avoid compression for:
- Small tables (< 10MB)
- Frequently updated data
- Real-time streaming data
- Already compressed data
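The advice on text-heavy versus already compressed data is easy to check empirically. In this sketch, random bytes stand in for already compressed data (both look statistically random to a compressor); compression_ratio is a hypothetical helper built on Python's standard lzma module.

```python
import lzma
import os


def compression_ratio(data: bytes) -> float:
    """Return uncompressed size divided by LZMA-compressed size."""
    return len(data) / len(lzma.compress(data))


# Repetitive, text-like data compresses very well.
text_like = b"2024-01-01 INFO request served in 12ms\n" * 2000

# Random bytes stand in for already-compressed data and barely shrink at all.
random_like = os.urandom(len(text_like))
```

Comparing compression_ratio(text_like) with compression_ratio(random_like) shows the gap: the first is far above 1, while the second stays at roughly 1 or slightly below because of container overhead.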