GEPA (Genetic-Pareto)
GEPA (Genetic-Pareto) is a prompt optimizer that evolves a population of DSPy program variants with genetic operators (mutation and crossover) and selects candidates along a Pareto frontier, so it can balance trade-offs between multiple optimization objectives rather than chase a single score. A reflection LM reviews the natural-language feedback produced by your metric to guide each round of evolution.
Quick Start
require 'dspy'
# Configure your LM
DSPy.configure do |config|
  config.lm = DSPy::LM.new("openai/gpt-4o-mini", api_key: ENV['OPENAI_API_KEY'])
end
# Create a signature
class QASignature < DSPy::Signature
  description "Answer questions accurately"

  input do
    const :question, String
  end

  output do
    const :answer, String
  end
end
# Create a program to optimize
program = DSPy::Predict.new(QASignature)
# Create training examples
trainset = [
  DSPy::Example.new(
    signature_class: QASignature,
    input: { question: "What is 2+2?" },
    expected: { answer: "4" }
  ),
  DSPy::Example.new(
    signature_class: QASignature,
    input: { question: "What is the capital of France?" },
    expected: { answer: "Paris" }
  )
]
# Define a metric with feedback
class ExactMatchMetric
  include DSPy::Teleprompt::GEPAFeedbackMetric

  def call(example, prediction, trace = nil)
    expected = example.expected_values[:answer].downcase.strip
    actual = prediction.answer.downcase.strip

    if expected == actual
      DSPy::Teleprompt::ScoreWithFeedback.new(
        score: 1.0,
        prediction: prediction,
        feedback: "Correct answer"
      )
    else
      DSPy::Teleprompt::ScoreWithFeedback.new(
        score: 0.0,
        prediction: prediction,
        feedback: "Expected '#{expected}', got '#{actual}'. Check accuracy."
      )
    end
  end
end
# Configure reflection LM (required)
config = DSPy::Teleprompt::GEPA::GEPAConfig.new
config.reflection_lm = DSPy::LM.new("openai/gpt-4o-mini", api_key: ENV['OPENAI_API_KEY'])
# Optimize with GEPA (Genetic-Pareto)
gepa = DSPy::Teleprompt::GEPA.new(metric: ExactMatchMetric.new, config: config)
optimized_program = gepa.compile(program, trainset: trainset)
# Use the optimized program
result = optimized_program.call(question: "What is 3+3?")
puts result.answer
Configuration
Basic Configuration
config = DSPy::Teleprompt::GEPA::GEPAConfig.new
config.population_size = 8 # Number of program variants (default: 8)
config.num_generations = 10 # Optimization rounds (default: 10)
config.mutation_rate = 0.7 # Probability of mutation (default: 0.7)
config.crossover_rate = 0.6 # Probability of crossover (default: 0.6)
config.use_pareto_selection = true # Use Pareto frontier selection (default: true)
config.simple_mode = false # Use simple optimization without full genetic algorithm (default: false)
config.reflection_lm = DSPy::LM.new("openai/gpt-4o", api_key: ENV['OPENAI_API_KEY']) # LM for reflection (required)
gepa = DSPy::Teleprompt::GEPA.new(metric: metric, config: config)
Light Configuration (Fast)
config = DSPy::Teleprompt::GEPA::GEPAConfig.new
config.population_size = 4
config.num_generations = 2
config.mutation_rate = 0.8
config.reflection_lm = DSPy::LM.new("openai/gpt-4o-mini", api_key: ENV['OPENAI_API_KEY'])
gepa = DSPy::Teleprompt::GEPA.new(metric: metric, config: config)
Heavy Configuration (Thorough)
config = DSPy::Teleprompt::GEPA::GEPAConfig.new
config.population_size = 12
config.num_generations = 15
config.mutation_rate = 0.6
config.crossover_rate = 0.8
config.reflection_lm = DSPy::LM.new("openai/gpt-4o", api_key: ENV['OPENAI_API_KEY'])
gepa = DSPy::Teleprompt::GEPA.new(metric: metric, config: config)
All Configuration Options
config = DSPy::Teleprompt::GEPA::GEPAConfig.new
# Core genetic algorithm settings
config.population_size = 8 # Number of program variants in each generation
config.num_generations = 10 # Number of evolution iterations
config.mutation_rate = 0.7 # Probability of mutation (0.0-1.0)
config.crossover_rate = 0.6 # Probability of crossover (0.0-1.0)
# Algorithm behavior
config.use_pareto_selection = true # Use Pareto frontier for multi-objective optimization
config.simple_mode = false # If true, uses simplified optimization without full GA
# LM for reflection (required - no default)
config.reflection_lm = DSPy::LM.new("openai/gpt-4o-mini", api_key: ENV['OPENAI_API_KEY'])
# Advanced: Mutation types (default: all types)
config.mutation_types = [
  DSPy::Teleprompt::GEPA::MutationType::Rewrite,   # Complete rewording
  DSPy::Teleprompt::GEPA::MutationType::Expand,    # Add detail/context
  DSPy::Teleprompt::GEPA::MutationType::Simplify,  # Remove complexity
  DSPy::Teleprompt::GEPA::MutationType::Combine,   # Merge with another
  DSPy::Teleprompt::GEPA::MutationType::Rephrase   # Minor rewording
]
# Advanced: Crossover types (default: all types)
config.crossover_types = [
  DSPy::Teleprompt::GEPA::CrossoverType::Uniform,    # Random selection
  DSPy::Teleprompt::GEPA::CrossoverType::Blend,      # Weighted combination
  DSPy::Teleprompt::GEPA::CrossoverType::Structured  # Structure-aware
]
Writing Metrics
GEPA can work with two types of metrics:
Option 1: Simple Proc Metric (Easiest)
For basic use cases, you can use a simple proc that returns a float score:
# Simple accuracy metric
metric = proc do |example, prediction|
  expected = example.expected_values[:answer]
  actual = prediction.answer
  expected == actual ? 1.0 : 0.0
end
# Use with GEPA
gepa = DSPy::Teleprompt::GEPA.new(metric: metric, config: config)
Note: When using a simple proc, GEPA will still perform optimization but without detailed feedback for reflection.
Option 2: Feedback Metric with GEPAFeedbackMetric (Advanced)
class FeedbackMetric
  include DSPy::Teleprompt::GEPAFeedbackMetric

  def call(example, prediction, trace = nil)
    expected = example.expected_values[:answer]
    actual = prediction.answer

    score = calculate_score(expected, actual)
    feedback = generate_feedback(expected, actual, score)

    DSPy::Teleprompt::ScoreWithFeedback.new(
      score: score,
      prediction: prediction,
      feedback: feedback
    )
  end

  private

  def calculate_score(expected, actual)
    # Your scoring logic
    expected.downcase == actual.downcase ? 1.0 : 0.0
  end

  def generate_feedback(expected, actual, score)
    if score > 0.8
      "Good answer"
    elsif score > 0.5
      "Partially correct, but could be more precise"
    else
      "Expected '#{expected}', got '#{actual}'. Review the question carefully."
    end
  end
end
Comparing Optimizers
# Test baseline
baseline_score = evaluate_program(program, testset, simple_metric)
# Optimize with MIPROv2
mipro = DSPy::Teleprompt::MIPROv2.new(metric: simple_metric)
mipro_optimized = mipro.compile(program, trainset: trainset)
mipro_score = evaluate_program(mipro_optimized, testset, simple_metric)
# Optimize with GEPA (Genetic-Pareto); the config must include a reflection_lm (see Configuration)
gepa = DSPy::Teleprompt::GEPA.new(metric: feedback_metric, config: config)
gepa_optimized = gepa.compile(program, trainset: trainset)
gepa_score = evaluate_program(gepa_optimized, testset, simple_metric)
puts "Baseline: #{baseline_score}"
puts "MIPROv2: #{mipro_score}"
puts "GEPA (Genetic-Pareto): #{gepa_score}"
When to Use GEPA (Genetic-Pareto)
Use GEPA (Genetic-Pareto) when:
- You want detailed feedback on why examples fail
- Your task benefits from iterative prompt refinement
- You have complex evaluation criteria
- You need to understand optimization decisions
Use MIPROv2 when:
- You have simple success/failure metrics
- You want faster optimization
- Your prompts are already close to optimal
- You prefer less configuration
Common Patterns
Math Problems
class MathSignature < DSPy::Signature
  description "Solve math problems step by step"

  input do
    const :problem, String
  end

  output do
    const :answer, String
    const :reasoning, String
  end
end
class MathMetric
  include DSPy::Teleprompt::GEPAFeedbackMetric

  def call(example, prediction, trace = nil)
    expected_num = extract_number(example.expected_values[:answer])
    actual_num = extract_number(prediction.answer)

    if expected_num && actual_num && expected_num == actual_num
      DSPy::Teleprompt::ScoreWithFeedback.new(
        score: 1.0,
        prediction: prediction,
        feedback: "Correct calculation"
      )
    else
      DSPy::Teleprompt::ScoreWithFeedback.new(
        score: 0.0,
        prediction: prediction,
        feedback: "Check your arithmetic. Show step-by-step work."
      )
    end
  end

  private

  def extract_number(text)
    text.scan(/\d+/).first&.to_i
  end
end
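Putting the pattern together, optimizing a math program with this metric follows the Quick Start, swapping in the math signature and metric. In this sketch, math_trainset stands in for your own list of DSPy::Example objects built as in the Quick Start:

math_program = DSPy::Predict.new(MathSignature)

config = DSPy::Teleprompt::GEPA::GEPAConfig.new
config.reflection_lm = DSPy::LM.new("openai/gpt-4o-mini", api_key: ENV['OPENAI_API_KEY'])

gepa = DSPy::Teleprompt::GEPA.new(metric: MathMetric.new, config: config)
optimized_math = gepa.compile(math_program, trainset: math_trainset)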
Text Classification
class ClassificationMetric
  include DSPy::Teleprompt::GEPAFeedbackMetric

  def call(example, prediction, trace = nil)
    expected = example.expected_values[:category].downcase
    actual = prediction.category.downcase

    score = expected == actual ? 1.0 : 0.0
    feedback = if score > 0
      "Correct classification"
    else
      "Misclassified as '#{actual}', should be '#{expected}'. Consider the key indicators in the text."
    end

    DSPy::Teleprompt::ScoreWithFeedback.new(
      score: score,
      prediction: prediction,
      feedback: feedback
    )
  end
end
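This metric expects a signature whose output includes a category field. A minimal sketch of such a signature (the class name and text input are illustrative):

class ClassificationSignature < DSPy::Signature
  description "Classify the text into a category"

  input do
    const :text, String
  end

  output do
    const :category, String
  end
end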
Troubleshooting
GEPA (Genetic-Pareto) Takes Too Long
- Reduce population_size and num_generations
- Use a faster model: config.reflection_lm = DSPy::LM.new("openai/gpt-4o-mini", api_key: ENV['OPENAI_API_KEY'])
- Enable simple mode: config.simple_mode = true
- Use fewer training examples
Poor Optimization Results
- Check your metric implementation
- Ensure training examples are representative
- Add more diverse training examples
- Increase num_generations
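A quick way to check your metric implementation is to run it by hand on one training example before launching a full GEPA run. A sketch reusing the Quick Start objects; it assumes ScoreWithFeedback exposes score and feedback readers:

prediction = program.call(question: "What is 2+2?")
result = ExactMatchMetric.new.call(trainset.first, prediction)
puts result.score
puts result.feedback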
Feedback Not Helpful
- Make feedback more specific in your metric
- Include examples of good vs bad answers
- Reference the input context in feedback
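For example, feedback that quotes the actual input usually gives the reflection LM more to work with than a generic message. A sketch, assuming a question input and an input_values accessor on DSPy::Example:

question = example.input_values[:question]
feedback = "For the question '#{question}', expected '#{expected}' but got '#{actual}'. " \
           "Answer with just the fact the question asks for."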
API Errors
- Check your API key configuration
- Verify LM model names
- Handle rate limiting in your metric
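If your metric itself calls an LM (for example, an LLM-as-judge scorer), a small retry helper keeps transient rate-limit errors from aborting an entire GEPA run. This is plain Ruby rather than a DSPy.rb API, and the error class you rescue depends on your provider:

def with_retries(max_attempts: 3)
  attempts = 0
  begin
    yield
  rescue StandardError
    attempts += 1
    raise if attempts >= max_attempts
    sleep(2**attempts)  # exponential backoff: 2s, then 4s
    retry
  end
end

# Usage inside a metric (judge_lm_score is a hypothetical helper):
# score = with_retries { judge_lm_score(example, prediction) }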
Examples
See the examples/ directory for complete working examples:
- minimal_gepa_test.rb - Basic GEPA (Genetic-Pareto) usage
- simple_gepa_benchmark.rb - GEPA (Genetic-Pareto) vs MIPROv2 comparison
- gepa_benchmark.rb - Comprehensive benchmarking
These examples show real usage patterns and can be run with your API keys.
Related Topics
Other Optimization Methods
- MIPROv2 - Alternative optimization algorithm for comparison with GEPA
- Simple Optimizer - Quick random search optimization for getting started
- Prompt Optimization - General principles of programmatic prompt improvement
Evaluation & Metrics
- Evaluation - Systematically test and measure your optimized programs
- Custom Metrics - Build domain-specific evaluation metrics for GEPA optimization
- Benchmarking - Compare optimized vs unoptimized approaches
Advanced Applications
- RAG Optimization - Use GEPA to optimize retrieval-augmented generation systems
- Complex Types - Optimize programs working with structured data
- Multi-stage Pipelines - Optimize complex processing workflows
Framework Comparison
- DSPy.rb vs LangChain - Compare optimization capabilities across Ruby frameworks