Storage System
DSPy.rb provides a storage system for persisting optimization results and program states. This enables saving and reloading optimized predictors, tracking optimization history, and managing multiple versions of your programs.
Overview
The storage system supports:
- Program Storage: Persist optimized predictors and their configurations
- Optimization History: Track optimization runs and their results
- Program Discovery: Find programs by various criteria
- Checkpoint Management: Save and restore optimization checkpoints
- Import/Export: Share programs between environments
Basic Usage
Storing Optimization Results
# Run optimization
optimizer = DSPy::MIPROv2.new(signature: ClassifyText)
result = optimizer.optimize(examples: examples)
# Store the result using ProgramStorage directly
storage = DSPy::Storage::ProgramStorage.new(storage_path: "./dspy_storage")
saved_program = storage.save_program(
  result.optimized_program,
  result,
  metadata: {
    signature_class: 'ClassifyText',
    optimizer: 'MIPROv2',
    examples_count: examples.size
  }
)
puts "Stored program with ID: #{saved_program.program_id}"
Using StorageManager (Recommended)
# Configure storage manager
storage_manager = DSPy::Storage::StorageManager.new
# Save optimization result automatically
saved_program = storage_manager.save_optimization_result(
  result,
  tags: ['production', 'sentiment-analysis'],
  description: 'Optimized sentiment classifier v2'
)
# Or use the global instance
DSPy::Storage::StorageManager.save(result, metadata: { version: '2.0' })
Loading Programs
# Load by ID
storage = DSPy::Storage::ProgramStorage.new
saved_program = storage.load_program(program_id)
if saved_program
  predictor = saved_program.program
  puts "Loaded program from #{saved_program.saved_at}"
  puts "Best score: #{saved_program.optimization_result[:best_score_value]}"
end
# Or using StorageManager
loaded = DSPy::Storage::StorageManager.load(program_id)
Storage Organization
The storage system uses a file-based approach with JSON serialization:
dspy_storage/
├── programs/
│   ├── abc123def456.json    # Individual program files
│   ├── 789xyz012345.json
│   └── ...
└── history.json             # Program history and metadata
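The layout above can be illustrated with a minimal, standalone sketch: one JSON file per program under programs/, plus a history.json index. The helper names (sketch_save_program, sketch_load_program) and the record keys are hypothetical and chosen for illustration; this is not DSPy.rb's actual implementation.

```ruby
require "json"
require "fileutils"
require "securerandom"
require "time"
require "tmpdir"

# Hypothetical helper (not part of DSPy.rb): writes programs/<id>.json
# and appends a summary entry to history.json.
def sketch_save_program(root, program_state, best_score:)
  program_id = SecureRandom.hex(6) # 12 hex characters, like the IDs shown above
  saved_at = Time.now.utc.iso8601
  programs_dir = File.join(root, "programs")
  FileUtils.mkdir_p(programs_dir)

  record = {
    "program_id" => program_id,
    "saved_at" => saved_at,
    "best_score" => best_score,
    "program" => program_state
  }
  File.write(File.join(programs_dir, "#{program_id}.json"), JSON.pretty_generate(record))

  # Keep a lightweight index so listings do not need to open every program file.
  history_path = File.join(root, "history.json")
  history = File.exist?(history_path) ? JSON.parse(File.read(history_path)) : { "programs" => [] }
  history["programs"] << record.slice("program_id", "saved_at", "best_score")
  File.write(history_path, JSON.pretty_generate(history))

  program_id
end

# Hypothetical helper: returns the parsed record, or nil when the file is absent.
def sketch_load_program(root, program_id)
  path = File.join(root, "programs", "#{program_id}.json")
  File.exist?(path) ? JSON.parse(File.read(path)) : nil
end

Dir.mktmpdir do |root|
  id = sketch_save_program(root, { "instruction" => "Classify the text" }, best_score: 0.91)
  puts Dir.children(File.join(root, "programs")).inspect
  puts sketch_load_program(root, id)["best_score"]
end
```

Keeping a separate index file is one plausible reason for the history.json entry in the tree: listing and searching can then work from a single small file.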
Finding Programs
Search by Criteria
storage_manager = DSPy::Storage::StorageManager.new
# Find all programs for a specific optimizer
mipro_programs = storage_manager.find_programs(
  optimizer: 'MIPROv2',
  min_score: 0.85
)
# Find recent programs
recent_programs = storage_manager.find_programs(
  max_age_days: 7,
  signature_class: 'ClassifyText'
)
# Find by tags
production_programs = storage_manager.find_programs(
  tags: ['production']
)
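To make the criteria concrete, here is a small in-memory sketch of how such a filter could behave. The function sketch_find_programs and the entry keys are hypothetical. Treating multiple criteria as AND, and requiring every requested tag to be present, are assumptions for illustration rather than documented behavior of find_programs.

```ruby
require "time"

# Hypothetical filter over history entries (plain hashes).
# All given criteria must match; omitted criteria are ignored.
def sketch_find_programs(entries, optimizer: nil, signature_class: nil,
                         min_score: nil, max_age_days: nil, tags: nil, now: Time.now)
  entries.select do |entry|
    next false if optimizer && entry[:optimizer] != optimizer
    next false if signature_class && entry[:signature_class] != signature_class
    next false if min_score && (entry[:best_score] || 0.0) < min_score

    if max_age_days
      age_days = (now - Time.parse(entry[:saved_at])) / 86_400.0
      next false if age_days > max_age_days
    end

    # Assumption: every requested tag must be present on the entry.
    next false if tags && (tags - Array(entry[:tags])).any?

    true
  end
end

entries = [
  { program_id: "a", optimizer: "MIPROv2", signature_class: "ClassifyText",
    best_score: 0.90, saved_at: "2024-01-10T00:00:00Z", tags: ["production"] },
  { program_id: "b", optimizer: "MIPROv2", signature_class: "ClassifyText",
    best_score: 0.80, saved_at: "2024-01-01T00:00:00Z", tags: [] }
]

matches = sketch_find_programs(entries, optimizer: "MIPROv2", min_score: 0.85)
puts matches.map { |e| e[:program_id] }.inspect
```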
Get Best Program
# Get the best performing program for a signature
best_program = storage_manager.get_best_program('ClassifyText')
if best_program
  predictor = best_program.program
  score = best_program.optimization_result[:best_score_value]
  puts "Best classifier score: #{score}"
end
# Using global instance
best = DSPy::Storage::StorageManager.best('ClassifyText')
Checkpoints
Create and restore checkpoints during long-running optimizations:
# Create a checkpoint
checkpoint = storage_manager.create_checkpoint(
  current_result,
  'iteration_50',
  metadata: { iteration: 50, current_score: 0.87 }
)
# Restore from checkpoint
restored = storage_manager.restore_checkpoint('iteration_50')
if restored
  program = restored.program
  # Continue optimization from checkpoint...
end
Import/Export
Share programs between environments or back up your optimizations:
storage = DSPy::Storage::ProgramStorage.new
# Export multiple programs
program_ids = ['abc123', 'def456', 'ghi789']
storage.export_programs(program_ids, './export_backup.json')
# Import programs
imported_programs = storage.import_programs('./export_backup.json')
puts "Imported #{imported_programs.size} programs"
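As a rough illustration of what an export and import round trip involves, the sketch below bundles several per-program JSON files into one file and unpacks them into another storage directory. The helper names and the bundle shape (a "programs" array) are hypothetical. The real export format used by export_programs is not described here and may differ.

```ruby
require "json"
require "fileutils"
require "tmpdir"

# Hypothetical export: collect existing program files into one bundle.
# IDs without a matching file are skipped (an assumption).
def sketch_export_programs(root, program_ids, export_path)
  programs = program_ids.filter_map do |id|
    path = File.join(root, "programs", "#{id}.json")
    JSON.parse(File.read(path)) if File.exist?(path)
  end
  File.write(export_path, JSON.pretty_generate({ "programs" => programs }))
  programs.size
end

# Hypothetical import: write each bundled program back as its own file.
def sketch_import_programs(root, export_path)
  programs_dir = File.join(root, "programs")
  FileUtils.mkdir_p(programs_dir)

  JSON.parse(File.read(export_path)).fetch("programs").map do |program|
    id = program.fetch("program_id")
    File.write(File.join(programs_dir, "#{id}.json"), JSON.pretty_generate(program))
    id
  end
end

Dir.mktmpdir do |dir|
  source = File.join(dir, "source")
  target = File.join(dir, "target")
  FileUtils.mkdir_p(File.join(source, "programs"))
  File.write(File.join(source, "programs", "abc123.json"),
             JSON.generate({ "program_id" => "abc123", "best_score" => 0.9 }))

  export_path = File.join(dir, "export_backup.json")
  sketch_export_programs(source, ["abc123", "missing"], export_path)
  puts sketch_import_programs(target, export_path).inspect
end
```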
History and Analytics
Track optimization trends and performance over time:
# Get optimization history with trends
history = storage_manager.get_optimization_history
puts "Total programs: #{history[:summary][:total_programs]}"
puts "Average score: #{history[:summary][:avg_score]}"
# View optimizer statistics
history[:optimizer_stats].each do |optimizer, stats|
  puts "#{optimizer}: #{stats[:count]} programs, best score: #{stats[:best_score]}"
end
# Check improvement trends
trends = history[:trends]
puts "Performance improvement: #{trends[:improvement_percentage]}%"
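The summary and per-optimizer figures can be derived from the stored entries with a simple aggregation. The sketch below is a hypothetical, standalone version that reuses the key names shown above (total_programs, avg_score, count, best_score). It leaves out trends, since how improvement_percentage is computed is not specified here.

```ruby
# Hypothetical aggregation over history entries (plain hashes).
def sketch_history_summary(entries)
  scores = entries.map { |e| e[:best_score] }.compact

  summary = {
    total_programs: entries.size,
    avg_score: scores.empty? ? 0.0 : (scores.sum(0.0) / scores.size).round(3)
  }

  # Group by optimizer name and keep the count and top score per group.
  optimizer_stats = entries.group_by { |e| e[:optimizer] }.transform_values do |group|
    { count: group.size, best_score: group.map { |e| e[:best_score] }.compact.max }
  end

  { summary: summary, optimizer_stats: optimizer_stats }
end

history = sketch_history_summary([
  { optimizer: "MIPROv2", best_score: 0.8 },
  { optimizer: "MIPROv2", best_score: 0.9 },
  { optimizer: "Other", best_score: 0.7 }
])
puts history[:summary].inspect
puts history[:optimizer_stats].inspect
```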
Program Comparison
Compare two saved programs:
comparison = storage_manager.compare_programs(program_id_1, program_id_2)
puts "Score difference: #{comparison[:comparison][:score_difference]}"
puts "Better program: #{comparison[:comparison][:better_program]}"
puts "Age difference: #{comparison[:comparison][:age_difference_hours]} hours"
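For intuition about the comparison fields, here is a minimal sketch that computes them from two plain records. The function sketch_compare_programs is hypothetical, and the sign conventions (second minus first) and tie handling (first program wins) are assumptions for illustration, not documented behavior of compare_programs.

```ruby
require "time"

# Hypothetical comparison of two saved-program records.
def sketch_compare_programs(first, second)
  score_difference = (second[:best_score] - first[:best_score]).round(4)
  seconds_apart = Time.parse(second[:saved_at]) - Time.parse(first[:saved_at])

  {
    comparison: {
      score_difference: score_difference,
      # Assumption: on a tie, the first program is reported as better.
      better_program: score_difference.positive? ? second[:program_id] : first[:program_id],
      age_difference_hours: (seconds_apart / 3600.0).round(2)
    }
  }
end

older = { program_id: "abc123", best_score: 0.87, saved_at: "2024-01-01T00:00:00Z" }
newer = { program_id: "def456", best_score: 0.91, saved_at: "2024-01-01T06:30:00Z" }
puts sketch_compare_programs(older, newer)[:comparison].inspect
```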
Storage Management
Configuration
# Create custom storage configuration
config = DSPy::Storage::StorageManager::StorageConfig.new
config.storage_path = Rails.root.join('dspy_storage')
config.auto_save = true
config.save_intermediate_results = false
config.max_stored_programs = 100
# Initialize with custom config
storage_manager = DSPy::Storage::StorageManager.new(config: config)
Cleanup Old Programs
Manage storage space by removing old programs:
# Clean up programs beyond the configured maximum
deleted_count = storage_manager.cleanup_old_programs
puts "Deleted #{deleted_count} old programs"
# The cleanup keeps the best performing and most recent programs
# based on a weighted score (70% performance, 30% recency)
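The 70/30 weighting mentioned above can be sketched as a ranking step. Everything below is a hypothetical illustration: the function name, the assumption that scores lie in 0..1, and the way recency is normalized (newest is 1.0, oldest is 0.0) are choices made for this example, not the library's actual formula.

```ruby
# Hypothetical ranking: returns the IDs that would be deleted so that only
# max_stored programs remain. Each entry has :program_id, :best_score (0..1)
# and :saved_at (a Time).
def sketch_programs_to_delete(entries, max_stored:, now: Time.now)
  return [] if entries.size <= max_stored

  max_age = entries.map { |e| now - e[:saved_at] }.max

  ranked = entries.map do |entry|
    age = now - entry[:saved_at]
    # Assumed normalization: the oldest entry gets 0.0, a brand-new one 1.0.
    recency = max_age.zero? ? 1.0 : 1.0 - (age / max_age)
    [entry[:program_id], 0.7 * entry[:best_score] + 0.3 * recency]
  end

  # Keep the highest weighted scores; everything after max_stored is deleted.
  ranked.sort_by { |_, weighted| -weighted }.drop(max_stored).map(&:first)
end

now = Time.utc(2024, 1, 10)
entries = [
  { program_id: "a", best_score: 0.9, saved_at: Time.utc(2024, 1, 9) },
  { program_id: "b", best_score: 0.5, saved_at: Time.utc(2024, 1, 1) },
  { program_id: "c", best_score: 0.6, saved_at: Time.utc(2024, 1, 10) }
]
puts sketch_programs_to_delete(entries, max_stored: 2, now: now).inspect
```

Under this weighting, a recent program with a moderate score can outrank an older one with a similar score, which matches the stated goal of keeping both the best performing and the most recent programs.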
List All Programs
storage = DSPy::Storage::ProgramStorage.new
# Get all stored programs
programs = storage.list_programs
programs.each do |program|
  puts "ID: #{program[:program_id]}"
  puts "Score: #{program[:best_score]}"
  puts "Saved: #{program[:saved_at]}"
  puts "---"
end
# Get full history with metadata
history = storage.get_history
Events and Monitoring
The storage system emits instrumentation events for monitoring:
- dspy.storage.save_start: When save begins
- dspy.storage.save_complete: Successful save with file size
- dspy.storage.save_error: Save failures
- dspy.storage.load_start: When load begins
- dspy.storage.load_complete: Successful load with age
- dspy.storage.load_error: Load failures
- dspy.storage.delete: Program deletion
- dspy.storage.export: Export operations
- dspy.storage.import: Import operations
- dspy.storage.cleanup: Cleanup operations
Best Practices
1. Consistent Metadata
Always include descriptive metadata for easier program discovery:
metadata = {
  signature_class: signature.class.name,
  version: '1.0.0',
  environment: Rails.env,
  purpose: 'customer_sentiment_analysis',
  dataset: 'customer_reviews_2024',
  performance_metrics: {
    accuracy: 0.92,
    f1_score: 0.89
  }
}
2. Use Tags Effectively
Tags help organize and find programs:
tags = [
  Rails.env,          # 'production', 'staging', 'development'
  'validated',        # Passed validation tests
  'a_b_tested',       # Used in A/B tests
  'v2_architecture'   # Architecture version
]
3. Regular Cleanup
Schedule periodic cleanup to manage storage:
# In a rake task or background job
task :cleanup_dspy_storage do
  manager = DSPy::Storage::StorageManager.instance
  deleted = manager.cleanup_old_programs
  puts "Cleaned up #{deleted} old programs"
end
4. Checkpoint Long Optimizations
For optimizations that take hours or days:
# Save checkpoints every N iterations
if iteration % 10 == 0
  storage_manager.create_checkpoint(
    current_result,
    "auto_checkpoint_#{iteration}"
  )
end