MIPROv2 Optimizer

MIPROv2 (Multi-prompt Instruction Proposal Optimizer, version 2) is the state-of-the-art prompt optimization algorithm in DSPy.rb. It combines bootstrap sampling, instruction generation, and Bayesian optimization to automatically improve your predictor's performance through a three-phase optimization process.

Overview

MIPROv2 works by:

  • Bootstrap Phase: Generating high-quality few-shot examples with reasoning traces across multiple independent bootstrap sets
  • Instruction Proposal Phase: Using a grounded proposer to generate multiple candidate instructions tailored to your task
  • Bayesian Optimization Phase: Intelligently exploring candidate configurations (instruction + few-shot combinations) using Gaussian Processes for optimal selection

The optimizer provides three optimization strategies: greedy (fastest), adaptive (balanced exploration/exploitation), and Bayesian (most sophisticated with GP-based candidate selection).
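
Each strategy can be selected explicitly through the configuration API covered below. For example, given a metric proc like the one defined under Basic Usage:

optimizer = DSPy::Teleprompt::MIPROv2.new(metric: metric)
optimizer.configure { |config| config.optimization_strategy = :adaptive }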

Basic Usage

Simple Optimization

# Define your signature
class ClassifyText < DSPy::Signature
  description "Classify the sentiment of the given text"
  
  input do
    const :text, String
  end
  
  output do
    const :sentiment, String
    const :confidence, Float
  end
end

# Create program to optimize
program = DSPy::Predict.new(ClassifyText)

# Create optimizer with custom metric
metric = proc do |example, prediction|
  # Return true/false for pass/fail evaluation
  prediction.sentiment.downcase == example.expected_sentiment.downcase
end

optimizer = DSPy::Teleprompt::MIPROv2.new(metric: metric)

# Run optimization
result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)

# Use the optimized predictor
best_program = result.optimized_program
final_score = result.best_score_value

puts "Optimization complete!"
puts "Best score: #{final_score}"
puts "Best instruction: #{best_program.prompt.instruction}"

AutoMode Configuration

MIPROv2 provides preset configurations for different optimization scenarios:

# Light optimization - fastest, good for prototyping
# 6 trials, 3 instruction candidates, greedy strategy
optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.light(metric: metric)

# Medium optimization - balanced performance and speed
# 12 trials, 5 instruction candidates, adaptive strategy
optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.medium(metric: metric)

# Heavy optimization - most thorough, best results
# 18 trials, 8 instruction candidates, Bayesian optimization
optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.heavy(metric: metric)

# Run optimization with any mode
result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)

Custom Configuration

# Class-level configuration (affects all instances)
DSPy::Teleprompt::MIPROv2.configure do |config|
  config.optimization_strategy = :bayesian      # :greedy, :adaptive, or :bayesian
  config.num_trials = 15                        # Total optimization trials
  config.num_instruction_candidates = 8         # Instruction variants to generate
  config.bootstrap_sets = 6                     # Bootstrap example sets to create
  config.max_bootstrapped_examples = 4          # Max examples per bootstrap set
  config.max_labeled_examples = 16              # Max labeled examples to use
  config.init_temperature = 1.2                 # Initial exploration temperature
  config.final_temperature = 0.05               # Final exploitation temperature
  config.early_stopping_patience = 4            # Trials without improvement before stopping
  config.use_bayesian_optimization = true       # Enable Gaussian Process optimization
  config.track_diversity = true                 # Track candidate diversity metrics
end

optimizer = DSPy::Teleprompt::MIPROv2.new(metric: metric)
result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)

# Or instance-level configuration (overrides class defaults)
optimizer = DSPy::Teleprompt::MIPROv2.new(metric: metric)
optimizer.configure do |config|
  config.optimization_strategy = :adaptive
  config.num_trials = 20
  config.bootstrap_sets = 8
end
result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)

Configuration Options

Configuration Parameters

MIPROv2 uses dry-configurable for settings management. You can configure at both class and instance levels:

# Class-level configuration (affects all new instances)
DSPy::Teleprompt::MIPROv2.configure do |config|
  # Core optimization settings
  config.num_trials = 12                          # Total optimization trials to run
  config.num_instruction_candidates = 5           # Number of instruction variants to generate
  config.bootstrap_sets = 5                       # Number of bootstrap example sets
  config.max_bootstrapped_examples = 4            # Max examples per bootstrap set
  config.max_labeled_examples = 16                # Max labeled examples from trainset

  # Optimization strategy (:greedy, :adaptive, or :bayesian)
  config.optimization_strategy = :adaptive
  config.use_bayesian_optimization = true         # Enable Gaussian Process optimization

  # Temperature scheduling for exploration/exploitation balance
  config.init_temperature = 1.0                   # Initial exploration temperature
  config.final_temperature = 0.1                  # Final exploitation temperature

  # Early stopping
  config.early_stopping_patience = 3              # Stop after N trials without improvement

  # Additional tracking
  config.track_diversity = true                    # Track candidate diversity metrics
end

# Instance-level configuration (overrides class defaults)
optimizer = DSPy::Teleprompt::MIPROv2.new(metric: your_metric)
optimizer.configure do |config|
  config.num_trials = 20
  config.optimization_strategy = :bayesian
end
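
The init_temperature and final_temperature settings control how quickly the optimizer shifts from exploration toward exploitation. The actual schedule is internal to MIPROv2; as a rough mental model, assuming simple linear cooling over the trial budget:

# Illustrative only -- MIPROv2's real schedule is internal to the optimizer
def temperature_at(trial, num_trials, init_temp, final_temp)
  progress = trial.to_f / (num_trials - 1)
  init_temp + (final_temp - init_temp) * progress
end

temperature_at(0, 12, 1.0, 0.1)   # => 1.0 (maximum exploration)
temperature_at(11, 12, 1.0, 0.1)  # => 0.1 (maximum exploitation)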

AutoMode Configurations

# Light mode values:
# - num_trials: 6
# - num_instruction_candidates: 3
# - max_bootstrapped_examples: 2
# - max_labeled_examples: 8
# - bootstrap_sets: 3
# - optimization_strategy: :greedy
# - early_stopping_patience: 2

# Medium mode values (balanced default):
# - num_trials: 12
# - num_instruction_candidates: 5
# - max_bootstrapped_examples: 4
# - max_labeled_examples: 16
# - bootstrap_sets: 5
# - optimization_strategy: :adaptive
# - early_stopping_patience: 3

# Heavy mode values (best results):
# - num_trials: 18
# - num_instruction_candidates: 8
# - max_bootstrapped_examples: 6
# - max_labeled_examples: 24
# - bootstrap_sets: 8
# - optimization_strategy: :bayesian  # Uses Gaussian Processes
# - early_stopping_patience: 5
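
A practical pattern is to start with light mode and escalate only when the quick result falls short of your quality bar (the 0.8 threshold below is an arbitrary example):

optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.light(metric: metric)
result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)

# Escalate to heavy mode if the quick pass is not good enough
if result.best_score_value < 0.8
  optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.heavy(metric: metric)
  result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)
end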

Optimization Phases

Phase 1: Bootstrap Few-Shot Examples

Generate diverse, high-quality few-shot examples across multiple independent bootstrap sets:

# MIPROv2 automatically creates multiple candidate sets of few-shot examples
# Each set contains examples with reasoning traces generated using CoT
# Bootstrap creates several independent sets for diversity

# You can observe bootstrap progress through events:
DSPy.events.subscribe('phase_start') do |event_name, attributes|
  if attributes[:phase] == 1 && attributes[:name] == 'bootstrap'
    puts "Starting bootstrap phase..."
  end
end

DSPy.events.subscribe('phase_complete') do |event_name, attributes|
  if attributes[:phase] == 1
    puts "Bootstrap complete. Success rate: #{attributes[:success_rate]}"
    puts "Created #{attributes[:candidate_sets]} bootstrap sets"
  end
end

Phase 2: Instruction Proposal

Generate multiple high-quality instruction candidates using the grounded proposer:

# The grounded proposer analyzes your task and generates contextual instructions:
# - "Analyze the sentiment of the given text step by step, providing detailed reasoning"
# - "Classify the emotional tone by examining key indicators in the text"
# - "Determine sentiment by evaluating positive and negative language patterns"

# Monitor instruction generation:
DSPy.events.subscribe('phase_complete') do |event_name, attributes|
  if attributes[:phase] == 2
    puts "Generated #{attributes[:num_candidates]} instruction candidates"
    puts "Best instruction preview: #{attributes[:best_instruction_preview]}"
  end
end

Phase 3: Bayesian Optimization

Intelligently explore candidate configurations using advanced optimization strategies:

# Creates candidate configurations combining instructions + few-shot examples:
# - Baseline (no modifications)
# - Instruction-only candidates
# - Few-shot-only candidates  
# - Combined candidates (instruction + few-shot examples)

# Uses optimization strategies:
# - Greedy: Exploit best known configurations
# - Adaptive: Balance exploration/exploitation with temperature scheduling
# - Bayesian: Use Gaussian Processes for intelligent candidate selection

# Monitor optimization progress:
DSPy.events.subscribe('trial_start') do |event_name, attributes|
  puts "Trial #{attributes[:trial_number]}: Testing #{attributes[:candidate_id]}"
  puts "Instruction: #{attributes[:instruction_preview]}"
end

DSPy.events.subscribe('trial_complete') do |event_name, attributes|
  if attributes[:is_best]
    puts "New best score: #{attributes[:score]} (Trial #{attributes[:trial_number]})"
  end
end

Working with Results

MIPROv2Result Object

result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)

# Access optimization results
puts "Best score: #{result.best_score_value}"
puts "Score name: #{result.best_score_name}"
puts "Total trials: #{result.history[:total_trials]}"
puts "Early stopped: #{result.history[:early_stopped]}"

# Get the optimized program
optimized_program = result.optimized_program

# Access MIPROv2-specific results
puts "Evaluated candidates: #{result.evaluated_candidates.size}"
puts "Bootstrap success rate: #{result.bootstrap_statistics[:success_rate]}"
puts "Proposal themes: #{result.proposal_statistics[:common_themes]}"

# Access optimization trace
if result.optimization_trace[:score_history]
  puts "Score progression: #{result.optimization_trace[:score_history]}"
end

# Access detailed evaluation results for best candidate
if result.best_evaluation_result
  eval_result = result.best_evaluation_result
  puts "Total examples evaluated: #{eval_result.total_examples}"
  puts "Pass rate: #{eval_result.pass_rate}"
  puts "Individual results: #{eval_result.results.size}"
end

Best Configuration Access

# Inspect combined candidates (instruction + few-shot examples)
combined_candidates = result.evaluated_candidates.select { |c| c.type == DSPy::Teleprompt::CandidateType::Combined }
best_candidate = combined_candidates.first

if best_candidate
  puts "Best instruction: #{best_candidate.instruction}"
  puts "Number of few-shot examples: #{best_candidate.few_shot_examples.size}"
  puts "Candidate type: #{best_candidate.type.serialize}"
  puts "Configuration ID: #{best_candidate.config_id}"
  puts "Metadata: #{best_candidate.metadata}"

  # Inspect few-shot examples
  best_candidate.few_shot_examples.each_with_index do |example, i|
    puts "Example #{i+1}:"
    puts "  Input: #{example.input_values}"
    puts "  Output: #{example.expected_values}"
  end
end

Integration with Storage and Registry

Saving Optimization Results

# Save to storage system
storage = DSPy::Storage::StorageManager.new
saved_program = storage.save_optimization_result(
  result,
  metadata: {
    signature: 'text_classifier',
    optimization_method: 'MIPROv2',
    mode: 'medium'
  }
)

puts "Saved with ID: #{saved_program.program_id}"

Integration with Registry

# Auto-register with registry
registry_manager = DSPy::Registry::RegistryManager.new
registry_manager.integration_config.auto_register_optimizations = true

# This will automatically register the result
version = registry_manager.register_optimization_result(
  result,
  signature_name: 'text_classifier'
)

puts "Registered as version: #{version.version}"

Advanced Usage

Custom Evaluation Logic

# Define custom metric with detailed evaluation
custom_metric = proc do |example, prediction|
  # Return hash with detailed metrics (recommended)
  {
    passed: prediction.sentiment.downcase == example.expected_sentiment.downcase,
    confidence_score: prediction.confidence || 0.0,
    answer_quality: prediction.sentiment ? 1.0 : 0.0,
    reasoning_present: !prediction.reasoning.nil?
  }
end

optimizer = DSPy::Teleprompt::MIPROv2.new(metric: custom_metric)
result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)

# Access detailed metrics in results
if result.best_evaluation_result
  result.best_evaluation_result.results.each do |eval_result|
    metrics = eval_result.metrics
    puts "Confidence: #{metrics[:confidence_score]}"
    puts "Has reasoning: #{metrics[:reasoning_present]}"
  end
end

Validation Split

# Use separate validation set for unbiased evaluation
# MIPROv2 automatically uses valset if provided, otherwise splits trainset
result = optimizer.compile(
  program,
  trainset: training_examples,
  valset: validation_examples  # Optional: uses 1/3 of trainset if not provided
)

# Force using part of training set for validation
result = optimizer.compile(
  program,
  trainset: training_examples
  # valset: nil - will automatically use trainset.take(trainset.size / 3)
)
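
If you prefer a randomized split over the deterministic trainset.take split, you can partition the data yourself before calling compile; a plain-Ruby sketch:

# Shuffle with a fixed seed so the split is reproducible
shuffled = training_examples.shuffle(random: Random.new(42))
val_size = shuffled.size / 3
manual_valset = shuffled.take(val_size)
manual_trainset = shuffled.drop(val_size)

result = optimizer.compile(program, trainset: manual_trainset, valset: manual_valset)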

Monitoring Progress

# Subscribe to optimization events for detailed progress tracking
DSPy.events.subscribe('miprov2_compile') do |event_name, attributes|
  puts "Starting MIPROv2 optimization with #{attributes[:num_trials]} trials"
  puts "Strategy: #{attributes[:optimization_strategy]}"
  puts "Mode: #{attributes[:mode]}"
end

DSPy.events.subscribe('phase_start') do |event_name, attributes|
  phase_names = { 1 => 'Bootstrap', 2 => 'Instruction Proposal', 3 => 'Optimization' }
  puts "Phase #{attributes[:phase]}: #{phase_names[attributes[:phase]]} starting..."
end

DSPy.events.subscribe('phase_complete') do |event_name, attributes|
  case attributes[:phase]
  when 1
    puts "Bootstrap complete: #{attributes[:success_rate]} success rate"
  when 2  
    puts "Generated #{attributes[:num_candidates]} instruction candidates"
  when 3
    puts "Optimization complete: Best score #{attributes[:best_score]}"
  end
end

DSPy.events.subscribe('trial_complete') do |event_name, attributes|
  status = attributes[:is_best] ? " (NEW BEST!)" : ""
  puts "Trial #{attributes[:trial_number]}: #{attributes[:score]}#{status}"
end

optimizer = DSPy::Teleprompt::MIPROv2.new(metric: custom_metric)
result = optimizer.compile(program, trainset: training_examples)

Best Practices

1. Choose Appropriate Mode

# For quick experimentation (6 trials, greedy strategy)
optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.light(metric: your_metric)

# For production optimization (18 trials, Bayesian optimization)
optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.heavy(metric: your_metric)

# For balanced optimization (12 trials, adaptive strategy)
optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.medium(metric: your_metric)

# All modes support the same compile interface
result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)

2. Provide Quality Examples

# Use diverse, high-quality training examples
training_examples = [
  DSPy::Example.new(
    signature_class: ClassifyText,
    input: { text: "I love this product! It's amazing." },
    expected: { sentiment: "positive", confidence: 0.9 }
  ),
  DSPy::Example.new(
    signature_class: ClassifyText,
    input: { text: "This is the worst experience I've ever had." },
    expected: { sentiment: "negative", confidence: 0.95 }
  ),
  DSPy::Example.new(
    signature_class: ClassifyText,
    input: { text: "The product is okay, nothing special." },
    expected: { sentiment: "neutral", confidence: 0.7 }
  )
  # ... more diverse examples
]
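
Before optimizing, it is worth sanity-checking label balance in the training set. A small sketch using the expected_values accessor shown earlier in this guide (the :sentiment key assumes the ClassifyText signature above):

counts = training_examples.group_by { |ex| ex.expected_values[:sentiment] }
                          .transform_values(&:size)
puts counts  # e.g. {"positive"=>1, "negative"=>1, "neutral"=>1}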

3. Robust Evaluation

# Robust metric that handles errors gracefully
robust_metric = proc do |example, prediction|
  begin
    # Handle missing predictions; use next (not return) so we exit the proc,
    # not the enclosing method
    next({ passed: false, error: "no_prediction" }) unless prediction

    # Handle missing sentiment
    next({ passed: false, error: "missing_sentiment" }) unless prediction.sentiment
    
    # Successful evaluation
    passed = prediction.sentiment.downcase == example.expected_sentiment.downcase
    confidence = prediction.respond_to?(:confidence) ? prediction.confidence : 0.0
    
    {
      passed: passed,
      confidence_score: confidence,
      sentiment_match: passed,
      prediction_length: prediction.sentiment.length
    }
  rescue => e
    # Handle any unexpected errors
    DSPy.logger.warn("Evaluation error: #{e.message}")
    { passed: false, error: e.message }
  end
end

optimizer = DSPy::Teleprompt::MIPROv2.new(metric: robust_metric)
result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)

4. Save Your Results

# Always save successful optimizations
if result.best_score_value > 0.8  # Your quality threshold
  storage_manager = DSPy::Storage::StorageManager.new
  storage_manager.save_optimization_result(
    result,
    tags: ['production', 'validated'],
    metadata: {
      dataset: 'customer_reviews_v2',
      optimization_date: Date.today,  # plain-Ruby Date; Date.current requires ActiveSupport
      minimum_score: 0.8,
      optimizer: 'MIPROv2',
      strategy: result.history[:optimization_strategy],
      total_trials: result.history[:total_trials]
    }
  )
end

Advanced Features

Bayesian Optimization with Gaussian Processes

MIPROv2 includes Bayesian optimization backed by Gaussian Processes for intelligent candidate selection:

# Enable Bayesian optimization (default in heavy mode)
optimizer = DSPy::Teleprompt::MIPROv2.new(metric: your_metric)
optimizer.configure do |config|
  config.optimization_strategy = :bayesian
  config.use_bayesian_optimization = true
end

result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)

# Bayesian optimization provides:
# - Intelligent exploration vs exploitation balance
# - Upper Confidence Bound (UCB) acquisition function
# - Gaussian Process modeling of candidate performance
# - Adaptive exploration parameter based on trial progress
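
As a rough illustration of the UCB acquisition rule (not the library's internal implementation), each candidate is ranked by its predicted mean score plus an uncertainty bonus, so poorly explored candidates can outrank well-known ones:

# Illustrative UCB scoring -- the GP machinery inside MIPROv2 is internal
# mean: predicted score, std: predictive uncertainty, kappa: exploration weight
def ucb_score(mean, std, kappa)
  mean + kappa * std
end

candidates = [
  { id: "a", mean: 0.82, std: 0.01 },  # well-explored, strong
  { id: "b", mean: 0.75, std: 0.15 }   # uncertain, possibly better
]
candidates.max_by { |c| ucb_score(c[:mean], c[:std], 2.0) }[:id]  # => "b"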

Optimization Strategies Comparison

# Greedy Strategy - Fastest
# - Prioritizes unexplored candidates first
# - Then selects highest scoring candidates
# - Best for: Quick experiments, limited compute budget
config.optimization_strategy = :greedy

# Adaptive Strategy - Balanced  
# - Temperature-based exploration/exploitation balance
# - Probabilistic candidate selection with softmax
# - Progressive cooling from exploration to exploitation
# - Best for: General-purpose optimization
config.optimization_strategy = :adaptive

# Bayesian Strategy - Most Sophisticated
# - Gaussian Process modeling of candidate performance
# - Upper Confidence Bound acquisition function
# - Intelligent uncertainty-aware selection
# - Best for: High-stakes optimization, maximum performance
config.optimization_strategy = :bayesian
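
To make the adaptive strategy concrete, here is an illustrative sketch of temperature-weighted softmax selection (the optimizer's internal sampling may differ in detail):

# High temperature flattens the distribution (exploration);
# low temperature concentrates it on the best score (exploitation)
def softmax_probabilities(scores, temperature)
  exps = scores.map { |s| Math.exp(s / temperature) }
  total = exps.sum
  exps.map { |e| e / total }
end

scores = [0.6, 0.7, 0.8]
softmax_probabilities(scores, 1.0)   # ~[0.30, 0.33, 0.37] -- near uniform
softmax_probabilities(scores, 0.05)  # ~[0.02, 0.12, 0.87] -- sharply peaked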

Candidate Configuration Types

MIPROv2 generates and evaluates four types of candidate configurations:

# Access evaluated candidates to understand what was tested
result.evaluated_candidates.each do |candidate|
  case candidate.type
  when DSPy::Teleprompt::CandidateType::Baseline
    puts "Baseline: No modifications to original program"
  when DSPy::Teleprompt::CandidateType::InstructionOnly
    puts "Instruction-only: #{candidate.instruction[0,50]}..."
  when DSPy::Teleprompt::CandidateType::FewShotOnly
    puts "Few-shot-only: #{candidate.few_shot_examples.size} examples"
  when DSPy::Teleprompt::CandidateType::Combined
    puts "Combined: Instruction + #{candidate.few_shot_examples.size} examples"
    puts "  Instruction: #{candidate.instruction[0,50]}..."
  end
  
  puts "  Config ID: #{candidate.config_id}"
  puts "  Metadata: #{candidate.metadata}"
end

Working with EvaluatedCandidate Data

The EvaluatedCandidate is an immutable Data class that represents a tested configuration:

# EvaluatedCandidate contains:
# - instruction: String - the instruction used
# - few_shot_examples: Array - the few-shot examples used
# - type: CandidateType - the type of candidate (baseline, instruction_only, etc.)
# - metadata: Hash - additional metadata about the candidate
# - config_id: String - unique identifier for this configuration

result.evaluated_candidates.each do |candidate|
  puts "Config ID: #{candidate.config_id}"
  puts "Type: #{candidate.type.serialize}"
  puts "Instruction: #{candidate.instruction}"
  puts "Examples count: #{candidate.few_shot_examples.size}"
  puts "Metadata: #{candidate.metadata}"
  puts "---"
end
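
For offline analysis, the evaluated candidates can be exported with the standard library's JSON module, using only the fields documented above:

require 'json'

rows = result.evaluated_candidates.map do |candidate|
  {
    config_id: candidate.config_id,
    type: candidate.type.serialize,
    instruction: candidate.instruction,
    num_examples: candidate.few_shot_examples.size
  }
end

File.write('miprov2_candidates.json', JSON.pretty_generate(rows))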