Program Persistence And Serialization
DSPy.rb v0.20.0 introduces comprehensive program serialization and storage capabilities, allowing you to save, load, and share optimized DSPy programs with full state preservation.
Vicente Reig
Fractional Engineering Lead • 4 min read
DSPy.rb v0.20.0 introduces a powerful new capability: complete program persistence and serialization. Thanks to Stefan Froelich’s excellent work, you can now save optimized DSPy programs to disk, reload them later, and share them across environments with full state preservation.
The Problem: Optimization Investment
DSPy optimization can take significant time and computational resources:
- MIPROv2 optimization might run for hours on complex tasks
- Few-shot examples are carefully curated and valuable
- Custom instructions represent domain expertise
- Performance metrics provide valuable insights
Previously, these optimization results existed only in memory. If your process crashed or you needed to deploy to production, you’d lose all that valuable work.
Introducing Program Storage
DSPy.rb now provides a comprehensive storage system that preserves:
- Optimized program state - Instructions, examples, and configuration
- Performance metrics - Scores, optimization history, and metadata
- Version information - DSPy version, Ruby version, and timestamps
- Search history - Complete audit trail of optimization runs
Basic Program Serialization
Every DSPy module can now be restored from a serialized hash through the from_h method:
require 'dspy'
require 'json'

# Define a signature
class ProductReview < DSPy::Signature
  description "Analyze product reviews for sentiment and key insights"

  input do
    const :review_text, String
    const :product_category, String
  end

  output do
    const :sentiment, String
    const :key_points, T::Array[String]
    const :recommendation, String
    const :confidence, Float
  end
end

# Create and optimize a program
original_program = DSPy::Predict.new(ProductReview)
  .with_instruction("Focus on specific product features and user experience")
  .with_examples([
    DSPy::FewShotExample.new(
      input: {
        review_text: "This laptop is incredibly fast and the battery lasts all day!",
        product_category: "Electronics"
      },
      output: {
        sentiment: "positive",
        key_points: ["fast performance", "excellent battery life"],
        recommendation: "recommended",
        confidence: 0.95
      }
    )
  ])

# Serialize the program
program_data = {
  class_name: original_program.class.name,
  state: {
    signature_class: ProductReview.name,
    instruction: original_program.prompt.instruction,
    # Convert examples to plain hashes so JSON can encode them
    few_shot_examples: original_program.few_shot_examples.map(&:to_h)
  }
}

# Save to JSON file
File.write('product_review_program.json', JSON.pretty_generate(program_data))

# Later, deserialize the program
loaded_data = JSON.parse(File.read('product_review_program.json'), symbolize_names: true)
restored_program = DSPy::Predict.from_h(loaded_data)

# The restored program has identical behavior
puts restored_program.prompt.instruction
# => "Focus on specific product features and user experience"
puts restored_program.few_shot_examples.size
# => 1
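The round trip above can be tried without a language model or the dspy gem. The following is a minimal sketch, assuming a hypothetical TinyProgram class that stands in for a DSPy module; its to_h and from_h are illustrative only and are not DSPy.rb's implementation. It shows that parsing with symbolize_names: true gives back the same symbol-keyed structure that was written.

```ruby
require 'json'

# Hypothetical stand-in for a DSPy module; NOT part of DSPy.rb.
class TinyProgram
  attr_reader :instruction, :few_shot_examples

  def initialize(instruction:, few_shot_examples: [])
    @instruction = instruction
    @few_shot_examples = few_shot_examples
  end

  # Plain-hash representation that JSON can encode without loss.
  def to_h
    {
      class_name: self.class.name,
      state: { instruction: instruction, few_shot_examples: few_shot_examples }
    }
  end

  # Rebuild an instance from the hash produced by #to_h.
  def self.from_h(data)
    state = data.fetch(:state)
    new(
      instruction: state.fetch(:instruction),
      few_shot_examples: state.fetch(:few_shot_examples, [])
    )
  end
end

# Serialize to JSON text and rebuild, mirroring the save/load steps above.
def round_trip(program)
  json = JSON.pretty_generate(program.to_h)
  TinyProgram.from_h(JSON.parse(json, symbolize_names: true))
end

original = TinyProgram.new(
  instruction: "Focus on specific product features and user experience",
  few_shot_examples: [
    { input: { review_text: "Great battery" }, output: { sentiment: "positive" } }
  ]
)
restored = round_trip(original)
puts restored.instruction == original.instruction
puts restored.few_shot_examples == original.few_shot_examples
```

Without symbolize_names: true the restored keys would be strings, so lookups such as fetch(:state) would fail.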
Advanced Storage with ProgramStorage
For production use, DSPy.rb provides a comprehensive storage system:
require 'dspy/storage/program_storage'

# Initialize storage (creates directory structure)
storage = DSPy::Storage::ProgramStorage.new(
  storage_path: "./my_optimized_programs"
)

# Create and optimize a program
program = DSPy::ChainOfThought.new(ProductReview)
  .with_instruction("Analyze reviews with step-by-step reasoning")

# Simulate optimization results (normally from MIPROv2, etc.)
optimization_result = {
  best_score_value: 0.92,
  best_score_name: 'f1_score',
  scores: { f1_score: 0.92, precision: 0.89, recall: 0.95 },
  history: { total_trials: 25, best_trial: 18 },
  metadata: { optimizer: 'MIPROv2', duration_seconds: 1800 }
}

# Save the optimized program
saved_program = storage.save_program(
  program,
  optimization_result,
  metadata: {
    task: "product_review_analysis",
    dataset: "electronics_reviews_2024",
    author: "data_team"
  }
)

puts "Saved program: #{saved_program.program_id}"
puts "Best score: #{saved_program.optimization_result[:best_score_value]}"
Loading and Using Saved Programs
Load programs by ID for immediate use:
# Load a previously saved program
loaded_program = storage.load_program(saved_program.program_id)

if loaded_program
  # Access the restored program
  restored = loaded_program.program

  # Use immediately
  result = restored.forward(
    review_text: "The delivery was fast but the product quality is poor",
    product_category: "Electronics"
  )

  puts "Sentiment: #{result.sentiment}"
  puts "Key points: #{result.key_points}"
  puts "Reasoning: #{result.reasoning}" # Available with ChainOfThought

  # Check optimization metrics
  puts "This program achieved #{loaded_program.optimization_result[:best_score_value]} F1 score"
  puts "Optimized with #{loaded_program.optimization_result[:metadata][:optimizer]}"
else
  puts "Program not found"
end
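Because a missing ID yields nil rather than an exception in the example above, callers may want a single place that decides what to run instead. The sketch below is one way to do that, assuming only that the storage object responds to load_program and that the saved record responds to program, as in the example. FakeStorage, FakeSaved, and load_or_fallback are hypothetical names used so the snippet runs on its own.

```ruby
# Hypothetical stand-ins so the sketch runs without the dspy gem.
FakeSaved = Struct.new(:program)

class FakeStorage
  def initialize(records)
    @records = records
  end

  # Mirrors the behavior shown above: nil when the id is unknown.
  def load_program(program_id)
    @records[program_id]
  end
end

# Return the stored program, or the block's value when nothing is stored.
def load_or_fallback(storage, program_id)
  saved = storage.load_program(program_id)
  return saved.program if saved

  yield
end

storage = FakeStorage.new("abc123" => FakeSaved.new(:optimized_program))
puts load_or_fallback(storage, "abc123") { :baseline_program }
puts load_or_fallback(storage, "unknown") { :baseline_program }
```

In a real application the block would typically build an unoptimized baseline program so requests can still be served.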
Program Management and History
Track and manage all your optimized programs:
# List all saved programs
programs = storage.list_programs

programs.each do |program_info|
  puts "ID: #{program_info[:program_id]}"
  puts "Score: #{program_info[:best_score]} (#{program_info[:score_name]})"
  puts "Signature: #{program_info[:signature_class]}"
  puts "Saved: #{program_info[:saved_at]}"
  puts "---"
end

# Get comprehensive history with statistics
history = storage.get_history

puts "Total programs: #{history[:summary][:total_programs]}"
puts "Average score: #{history[:summary][:avg_score].round(3)}"
puts "Latest save: #{history[:summary][:latest_save]}"

# Programs sorted by performance
best_programs = history[:programs]
  .sort_by { |p| -p[:best_score] }
  .first(5)

puts "Top 5 performing programs:"
best_programs.each do |program|
  puts "#{program[:signature_class]}: #{program[:best_score]}"
end
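The ranking above negates each score, which raises if an entry has no score. The following sketch shows a nil-tolerant variant over plain hashes shaped like the entries printed above. SAMPLE_PROGRAMS, top_programs, and average_score are hypothetical names, the sample values are made up, and whether real entries can lack a score is an assumption.

```ruby
# Sample entries shaped like the hashes printed above; values are made up.
SAMPLE_PROGRAMS = [
  { program_id: "abc123", best_score: 0.92, signature_class: "ProductReview" },
  { program_id: "def456", best_score: nil,  signature_class: "ProductReview" },
  { program_id: "ghi789", best_score: 0.87, signature_class: "ProductReview" }
].freeze

# Return the n highest-scoring entries, ignoring entries without a score.
def top_programs(programs, n = 5)
  programs
    .reject { |p| p[:best_score].nil? }
    .max_by(n) { |p| p[:best_score] }
end

# Mean score over scored entries, or nil when nothing has a score.
def average_score(programs)
  scores = programs.map { |p| p[:best_score] }.compact
  return nil if scores.empty?

  scores.sum.to_f / scores.size
end

top_programs(SAMPLE_PROGRAMS, 2).each do |program|
  puts "#{program[:program_id]}: #{program[:best_score]}"
end
puts "Average: #{average_score(SAMPLE_PROGRAMS)&.round(3)}"
```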
Import/Export for Collaboration
Share optimized programs across environments:
# Export programs for sharing
program_ids = ['abc123', 'def456', 'ghi789']
export_path = './shared_programs.json'

storage.export_programs(program_ids, export_path)
puts "Exported #{program_ids.size} programs to #{export_path}"

# On another system or environment
new_storage = DSPy::Storage::ProgramStorage.new(
  storage_path: "./production_programs"
)

# Import the shared programs
imported_programs = new_storage.import_programs('./shared_programs.json')

puts "Imported #{imported_programs.size} programs"
imported_programs.each do |saved_program|
  puts "Available: #{saved_program.program_id} (score: #{saved_program.optimization_result[:best_score_value]})"
end
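The hard-coded ID list above could instead be derived from the program listing. The sketch below selects IDs by task and minimum score from plain hashes. It assumes listing entries carry :program_id, :best_score, and a :metadata hash with :task, as the other examples in this article suggest. CATALOG and export_ids_for are hypothetical names and the values are made up.

```ruby
# Entries shaped like the listing output used in this article; values are made up.
CATALOG = [
  { program_id: "abc123", best_score: 0.92, metadata: { task: "product_review_analysis" } },
  { program_id: "def456", best_score: 0.71, metadata: { task: "product_review_analysis" } },
  { program_id: "ghi789", best_score: 0.95, metadata: { task: "ticket_routing" } }
].freeze

# Pick the ids worth exporting: matching task and a score at or above the threshold.
def export_ids_for(programs, task:, min_score: 0.0)
  programs
    .select { |p| p.dig(:metadata, :task) == task }
    .select { |p| p[:best_score].to_f >= min_score }
    .map { |p| p[:program_id] }
end

ids = export_ids_for(CATALOG, task: "product_review_analysis", min_score: 0.8)
puts ids.inspect
```

The resulting array can be passed wherever the literal program_ids list is used above.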
Integration with Optimization Workflows
Seamlessly integrate storage with your optimization workflows:
require 'time' # Time#iso8601 comes from the stdlib 'time' extension

class ProductAnalysisOptimizer
  def initialize(storage_path: "./optimized_programs")
    @storage = DSPy::Storage::ProgramStorage.new(storage_path: storage_path)
  end

  def optimize_for_task(signature_class, training_data, task_name)
    puts "Starting optimization for #{task_name}..."

    # Create base program
    program = DSPy::ChainOfThought.new(signature_class)

    # Run optimization (using MIPROv2, etc.)
    optimizer = DSPy::Optimization::MIPROv2.new(
      metric: DSPy::Evaluation::Metric::F1Score.new,
      n_trials: 20
    )
    result = optimizer.optimize(program, training_data)

    # Save the optimized program
    saved_program = @storage.save_program(
      result.program,
      {
        best_score_value: result.best_score,
        best_score_name: 'f1_score',
        scores: result.scores,
        history: result.optimization_history,
        metadata: {
          optimizer: 'MIPROv2',
          task: task_name,
          training_samples: training_data.size
        }
      },
      metadata: {
        task: task_name,
        optimization_date: Time.now.iso8601,
        environment: Rails.env # assumes a Rails application
      }
    )

    puts "Optimization complete! Saved as #{saved_program.program_id}"
    puts "Best F1 score: #{result.best_score}"
    saved_program
  end

  def load_best_for_task(task_name)
    programs = @storage.list_programs
      .select { |p| p[:metadata][:task] == task_name }
      .sort_by { |p| -p[:best_score] }

    if programs.any?
      best_program = @storage.load_program(programs.first[:program_id])
      puts "Loaded best #{task_name} program (score: #{best_program.optimization_result[:best_score_value]})"
      best_program.program
    else
      puts "No optimized programs found for task: #{task_name}"
      nil
    end
  end
end

# Usage
optimizer = ProductAnalysisOptimizer.new

# Optimize for a specific task
saved_program = optimizer.optimize_for_task(
  ProductReview,
  training_data,
  "product_sentiment_analysis"
)

# Later, load the best program for production
production_program = optimizer.load_best_for_task("product_sentiment_analysis")

if production_program
  # Use in production
  result = production_program.forward(
    review_text: customer_review,
    product_category: product.category
  )
end
Version Compatibility and Migration
Storage includes version tracking for compatibility:
# Check version compatibility
programs = storage.list_programs

programs.each do |program_info|
  metadata = program_info[:metadata]

  if metadata[:dspy_version] != DSPy::VERSION
    puts "Warning: Program #{program_info[:program_id]} was saved with DSPy v#{metadata[:dspy_version]}"
    puts "Current version: v#{DSPy::VERSION}"
  end

  if metadata[:ruby_version] != RUBY_VERSION
    puts "Note: Program saved with Ruby #{metadata[:ruby_version]}, running #{RUBY_VERSION}"
  end
end

# Load with version checking
def safe_load_program(storage, program_id)
  saved_program = storage.load_program(program_id)
  return nil unless saved_program

  saved_version = saved_program.metadata[:dspy_version]
  current_version = DSPy::VERSION

  if saved_version != current_version
    puts "Version mismatch detected:"
    puts "Saved with: DSPy v#{saved_version}"
    puts "Current: DSPy v#{current_version}"
    puts "Program may need reoptimization for best performance"
  end

  saved_program
end
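Plain string inequality flags every patch release as a mismatch. A looser check might compare only the major and minor segments using Ruby's Gem::Version, as sketched below. The policy of treating patch-level differences as compatible is an assumption made for illustration, not something DSPy.rb guarantees, and compatible_version? is a hypothetical helper.

```ruby
require 'rubygems' # Gem::Version; already loaded in a default Ruby setup

# True when both versions share the same major and minor segments.
# Assumption: patch-level differences do not affect saved programs.
def compatible_version?(saved, current)
  return false if saved.nil? || current.nil?

  saved_major, saved_minor = Gem::Version.new(saved).segments
  current_major, current_minor = Gem::Version.new(current).segments

  # A missing minor segment (e.g. "1") is treated as 0.
  saved_major == current_major && saved_minor.to_i == current_minor.to_i
end

puts compatible_version?("0.20.0", "0.20.1")
puts compatible_version?("0.19.2", "0.20.0")
```

Gem::Version.new raises ArgumentError for malformed strings, so metadata read from untrusted files may need a rescue around the call.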
File Organization and Structure
The storage system creates a clean, organized structure:
my_optimized_programs/
├── programs/
│   ├── abc123def456.json   # Individual program files
│   ├── def456ghi789.json
│   └── ghi789jkl012.json
└── history.json            # Program index and statistics
Each program file contains:
{
  "program_id": "abc123def456",
  "saved_at": "2024-08-26T10:30:00Z",
  "program_data": {
    "class_name": "DSPy::Predict",
    "state": {
      "signature_class": "ProductReview",
      "instruction": "Focus on specific product features...",
      "few_shot_examples": [...]
    }
  },
  "optimization_result": {
    "best_score_value": 0.92,
    "best_score_name": "f1_score",
    "scores": {...},
    "history": {...}
  },
  "metadata": {
    "dspy_version": "0.20.0",
    "ruby_version": "3.3.0",
    "task": "product_sentiment_analysis"
  }
}
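Since program files are plain JSON, they can be inspected with the standard library alone. The sketch below parses a document with the top-level keys shown above and reports a short summary. The key list is taken from this article's example, the sample values are illustrative, and REQUIRED_KEYS, missing_keys, and summarize_program_file are hypothetical names rather than part of DSPy.rb.

```ruby
require 'json'

# Top-level keys taken from the example file layout in this article.
REQUIRED_KEYS = %i[program_id saved_at program_data optimization_result metadata].freeze

# Return the required top-level keys that are absent from a parsed record.
def missing_keys(record)
  REQUIRED_KEYS.reject { |key| record.key?(key) }
end

# Parse JSON text and summarize it, raising if the layout looks wrong.
def summarize_program_file(json_text)
  record = JSON.parse(json_text, symbolize_names: true)
  raise ArgumentError, "expected a JSON object" unless record.is_a?(Hash)

  missing = missing_keys(record)
  raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?

  {
    program_id: record[:program_id],
    class_name: record.dig(:program_data, :class_name),
    best_score: record.dig(:optimization_result, :best_score_value),
    dspy_version: record.dig(:metadata, :dspy_version)
  }
end

# Illustrative document; the elided arrays and objects are left empty here.
SAMPLE_PROGRAM_JSON = <<~JSON
  {
    "program_id": "abc123def456",
    "saved_at": "2024-08-26T10:30:00Z",
    "program_data": { "class_name": "DSPy::Predict", "state": { "few_shot_examples": [] } },
    "optimization_result": { "best_score_value": 0.92, "best_score_name": "f1_score" },
    "metadata": { "dspy_version": "0.20.0", "ruby_version": "3.3.0" }
  }
JSON

puts summarize_program_file(SAMPLE_PROGRAM_JSON)
```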
Best Practices
- Organize by Task: Use descriptive metadata to group related programs
- Version Control: Include storage directories in your version control
- Regular Cleanup: Periodically remove outdated programs
- Backup Important Programs: Export critical programs to separate files
- Environment Separation: Use different storage paths for dev/test/prod
# Environment-based storage paths
storage_path = case Rails.env
               when 'development'
                 './storage/development'
               when 'test'
                 './storage/test'
               when 'production'
                 ENV['DSPY_STORAGE_PATH'] || './storage/production'
               end

storage = DSPy::Storage::ProgramStorage.new(storage_path: storage_path)
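The case expression above returns nil for any environment it does not list, such as staging. The sketch below wraps the same mapping in a method that fails loudly instead and works outside Rails. storage_path_for is a hypothetical helper, and passing the environment hash as a parameter is a choice made here so the lookup can be tested.

```ruby
# Resolve a storage directory for the given environment name.
# env_name would typically be Rails.env or a similar setting (assumption).
def storage_path_for(env_name, env = ENV)
  case env_name.to_s
  when 'development' then './storage/development'
  when 'test'        then './storage/test'
  when 'production'  then env.fetch('DSPY_STORAGE_PATH', './storage/production')
  else
    raise ArgumentError, "unknown environment: #{env_name.inspect}"
  end
end

puts storage_path_for('development')
puts storage_path_for('production', { 'DSPY_STORAGE_PATH' => '/var/lib/dspy' })
```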
Error Handling and Observability
The storage system includes comprehensive logging:
# Storage operations are automatically logged
DSPy.configure do |config|
  config.logger = Dry.Logger(:dspy, formatter: :json) do |logger|
    logger.add_backend(stream: 'log/dspy_storage.log')
  end
end

# Log events include:
# - storage.save_start / storage.save_complete
# - storage.load_start / storage.load_complete
# - storage.save_error / storage.load_error
# - storage.export / storage.import
# - storage.delete

# Monitor your storage operations
saved_program = storage.save_program(program, optimization_result)
# Logs: {"message": "storage.save_complete", "storage.program_id": "abc123",
#        "storage.best_score": 0.92, "storage.file_size": 2048}
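With the JSON formatter, each log line can be parsed and filtered by event name. The sketch below pulls out storage.save_error and storage.load_error entries. It assumes one JSON object per line with the event name under "message", as in the sample log line above. The fields on the error entry in SAMPLE_LOG are made up, and parse_log_lines and storage_errors are hypothetical helpers.

```ruby
require 'json'

# Error event names listed in this article.
ERROR_EVENTS = %w[storage.save_error storage.load_error].freeze

# Parse JSON log lines, skipping blanks and anything that is not a JSON object.
def parse_log_lines(lines)
  lines.filter_map do |line|
    next if line.strip.empty?

    parsed = JSON.parse(line)
    parsed if parsed.is_a?(Hash)
  rescue JSON::ParserError
    nil # ignore lines that are not valid JSON
  end
end

# Keep only entries whose event name marks a storage failure.
def storage_errors(lines)
  parse_log_lines(lines).select { |entry| ERROR_EVENTS.include?(entry["message"]) }
end

# Illustrative log content; field names on the error line are assumptions.
SAMPLE_LOG = <<~LOG
  {"message": "storage.save_complete", "storage.program_id": "abc123", "storage.best_score": 0.92}
  {"message": "storage.load_error", "storage.program_id": "missing42"}
  not json at all
  {"message": "storage.load_complete", "storage.program_id": "abc123"}
LOG

storage_errors(SAMPLE_LOG.lines).each do |entry|
  puts "#{entry['message']}: #{entry['storage.program_id']}"
end
```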
Migration Guide
If you’re currently managing optimized programs manually:
# Before: Manual serialization
program_state = {
  instruction: program.prompt.instruction,
  examples: program.few_shot_examples.map(&:to_h)
}
File.write('program.json', JSON.generate(program_state))

# After: Comprehensive storage
storage = DSPy::Storage::ProgramStorage.new
saved_program = storage.save_program(program, optimization_result)
# Automatic metadata, versioning, history tracking, and error handling
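Files written by the manual approach can be reshaped into the class_name/state layout used in the serialization example earlier in this article. The sketch below does that conversion on plain hashes. migrate_legacy_state is a hypothetical helper, and the class and signature names must be supplied by the caller because the legacy format never recorded them.

```ruby
require 'json'

# Convert the legacy { instruction:, examples: } hash into the
# { class_name:, state: } layout shown in the serialization example.
def migrate_legacy_state(legacy, class_name:, signature_class:)
  {
    class_name: class_name,
    state: {
      signature_class: signature_class,
      instruction: legacy.fetch(:instruction),
      few_shot_examples: legacy.fetch(:examples, [])
    }
  }
end

# Stand-in for the contents of a legacy program.json file.
legacy_json = JSON.generate(
  instruction: "Focus on specific product features and user experience",
  examples: [{ input: { review_text: "Great battery" }, output: { sentiment: "positive" } }]
)

legacy = JSON.parse(legacy_json, symbolize_names: true)
migrated = migrate_legacy_state(
  legacy,
  class_name: "DSPy::Predict",
  signature_class: "ProductReview"
)
puts JSON.pretty_generate(migrated)
```

The migrated hash has the same shape as the data passed to from_h earlier, so it is a plausible starting point for rebuilding the program before saving it through the storage system.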
Conclusion
Program persistence in DSPy.rb v0.20.0 transforms how you work with optimized programs. Key benefits:
- Investment Protection: Never lose optimization work again
- Collaboration: Share optimized programs across teams
- Version Management: Track program evolution and performance
- Production Ready: Reliable storage with comprehensive error handling
- Audit Trail: Complete history of optimization experiments
Special thanks to Stefan Froelich for implementing this essential feature! Start saving your optimized DSPy programs today and build a library of high-performing AI components. 🚀