JSON Native or Enhanced Prompting? Choosing the Right DSPy.rb Strategy
Why Enhanced Prompting beats JSON-native APIs on cost and compatibility, plus when to break the rule
Vicente Reig
Fractional Engineering Lead
Getting reliable, structured data from Large Language Models is crucial for production applications. DSPy.rb solves this with five different JSON extraction strategies, each optimized for specific AI providers. After benchmarking across 13 AI models, here’s your complete guide to choosing the right approach.
This benchmark used DSPy.rb's baseline Enhanced Prompting without any optimization beyond writing modular, typed Signatures. For production workloads, consider prompt optimization to improve performance further.
Five Extraction Strategies
- Enhanced Prompting: Universal compatibility (works with all 13 models tested). This is DSPy-style JSON Schema prompting.
- OpenAI Structured Output: Native API enforcement for GPT models, including handling of the nuances in OpenAI's JSON Schema implementation.
- Anthropic Tool Use: Function calling for all Claude models.
- Anthropic Extraction: Text completion with guided parsing for Claude.
- Gemini Structured Output: Native structured generation for Gemini 1.5 Pro.
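Each strategy follows from how the language model is configured. Here's a minimal sketch; the model IDs are illustrative, and the strategy mapping is an assumption based on the structured_outputs flag shown in the Implementation section below:

```ruby
# Sketch: provider configs and the strategy each is assumed to select.
# Uses the DSPy::LM interface shown later in this post.
openai_lm = DSPy::LM.new('openai/gpt-4o-mini',
                         api_key: ENV['OPENAI_API_KEY'],
                         structured_outputs: true)          # OpenAI Structured Output

claude_lm = DSPy::LM.new('anthropic/claude-3-5-haiku-20241022',
                         api_key: ENV['ANTHROPIC_API_KEY']) # Anthropic Tool Use / Extraction

gemini_lm = DSPy::LM.new('gemini/gemini-1.5-flash',
                         api_key: ENV['GEMINI_API_KEY'])    # Enhanced Prompting (no native flag)
```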
Performance Benchmark Results
Even though reliability wasn't the goal of this benchmark, all strategies achieved a 100% success rate in generating JSON and handling potentially invalid responses.
Strategy | Response Time (Best Model) | Success Rate | Token Efficiency | Cost (Best Model) |
---|---|---|---|---|
Gemini Structured | 3.42s | 100% | 800 tokens | $0.0019 |
Anthropic Tool Use | 6.23s | 100% | 800-1500 tokens | $0.001408 |
Anthropic Extraction | 6.41s | 100% | 800-1500 tokens | $0.001408 |
Enhanced Prompting | 7.52s | 100% | 800-1500 tokens | $0.000114 |
OpenAI Structured | 9.39s | 100% | 1200-1500 tokens | $0.000342 |
Average Response Times by Strategy
Strategy | Avg Response Time (seconds) |
---|---|
Gemini Structured | 3.49s |
Anthropic Extraction | 5.58s |
Anthropic Tool Use | 6.09s |
Enhanced Prompting | 10.16s |
OpenAI Structured | 13.56s |
Based on benchmark data across 28 test runs. Gemini Structured Output leads with 3.49s average, while OpenAI Structured Output takes the longest at 13.56s.
Token Consumption Analysis
Most Token Efficient (800 tokens):
- Claude 3.5 Haiku, Gemini models, o1-mini
Standard Usage (1200 tokens):
- GPT-4o series, Claude Sonnet 4
Highest Usage (1500 tokens):
- GPT-5 series, Claude Opus 4.1
Cost per Token Leaders:
- Gemini 1.5 Flash: $0.0000001425 per token
- GPT-5-nano: $0.00000011 per token
- GPT-4o-mini: $0.000000285 per token
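These per-token rates multiply directly with the token counts above. A quick sanity check, using figures from this post:

```ruby
# Cost per extraction = per-token rate * tokens consumed
rate   = 0.0000001425  # Gemini 1.5 Flash, USD per token (from the list above)
tokens = 800           # most efficient usage tier
puts rate * tokens     # => 0.000114, the Enhanced Prompting cost reported below
```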
Token Efficiency Distribution Across Models
Token Usage | Number of Models |
---|---|
800 tokens | 4 models |
1000 tokens | 2 models |
1200 tokens | 3 models |
1500 tokens | 4 models |
Most models cluster around maximum efficiency (800 tokens) or maximum context (1500 tokens). Claude 3.5 Haiku and Gemini models lead in efficiency.
Token Insight: Strategy choice doesn’t significantly impact token usage—it’s primarily model-dependent. Focus on model selection for token efficiency.
Cost Efficiency by Strategy (Best Model per Strategy)
Strategy | Cost per Extraction |
---|---|
Enhanced Prompting | $0.000114 |
OpenAI Structured | $0.000165 |
Anthropic Tool Use | $0.001408 |
Anthropic Extraction | $0.001408 |
Gemini Structured | $0.0019 |
Enhanced Prompting with Gemini Flash delivers the lowest cost at $0.000114 per extraction—17x cheaper than Gemini Structured, the most expensive strategy. View benchmark source.
Speed Variability by Strategy (Min/Max/Average)
Strategy | Min Time | Average Time | Max Time |
---|---|---|---|
Gemini Structured | 3.49s | 3.49s | 3.49s |
Anthropic Extraction | 2.68s | 5.58s | 10.26s |
Anthropic Tool Use | 3.69s | 6.09s | 10.81s |
Enhanced Prompting | 1.84s | 10.16s | 33.31s |
OpenAI Structured | 2.27s | 13.56s | 23.26s |
Enhanced Prompting shows the highest speed variability (1.84s to 33.31s) due to model diversity, while Gemini Structured offers consistent performance. Provider-specific strategies show more predictable ranges.
Quick Decision Matrix
Use Case | Recommended Strategy | Model | Cost | Speed |
---|---|---|---|---|
Startup/MVP | Enhanced Prompting | Gemini Flash | $0.000114 | 7.52s |
High Volume | Gemini Structured | Gemini Pro | $0.0019 | 3.42s |
Enterprise Multi-Provider | Enhanced Prompting | Multiple | Varies | 7.52s |
Maximum Reliability | Provider-Specific | Any Compatible | Varies | 6.23-9.39s |
Cost-Sensitive | Enhanced Prompting | Gemini Flash | $0.000114 | 7.52s |
Key Findings
- Speed Champion: Gemini Structured Output (3.42s) but limited to one model
- Universal Choice: Enhanced Prompting works across all providers with 100% success
- Cost Winner: Gemini Flash + Enhanced Prompting at $0.000114 per extraction
- Reliability: All provider-specific strategies achieve 100% success rates
- Token Efficiency: Choose Claude Haiku or Gemini for lowest token consumption
Implementation
DSPy.rb uses Signatures to define structured inputs and outputs. Here’s an example using T::Enum types:
```ruby
require 'dspy'

class SearchDepth < T::Enum
  enums do
    Shallow = new('shallow')
    Medium = new('medium')
    Deep = new('deep')
  end
end

class DeepResearch < DSPy::Signature
  input do
    const :query, String
    const :effort, SearchDepth, default: SearchDepth::Shallow
  end

  output { const :summary, String }
end

# Configure DSPy with your preferred model
DSPy.configure do |c|
  c.lm = DSPy::LM.new('gemini/gemini-1.5-flash',
                      api_key: ENV['GEMINI_API_KEY'],
                      structured_outputs: true) # Supports gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash-exp
end

predictor = DSPy::Predict.new(DeepResearch)
search_result = predictor.call(query: "How does Stripe's API design influence developer adoption?")
puts "Summary: #{search_result.summary}"
```
This example shows DSPy.rb's core components working together:
- Signatures: DeepResearch defines the typed input/output contract
- Configuration: Set up your language model
- Predictors: The DSPy::Predict class handles JSON extraction automatically
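Because the signature declares effort with a default, you can override it per call. A minimal sketch reusing the predictor above:

```ruby
# Override the default SearchDepth::Shallow effort declared in the signature
deep_result = predictor.call(
  query: "How does Stripe's API design influence developer adoption?",
  effort: SearchDepth::Deep
)
puts deep_result.summary
```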
Recommendations
Start with Enhanced Prompting + Gemini Flash for most applications:
- Universal compatibility across all providers
- Lowest cost at $0.000114 per extraction
- Easy provider switching without code changes
- Consider benchmarking your own workloads
Optimize later with provider-specific strategies for critical use cases requiring 100% reliability, or use prompt optimization to improve Enhanced Prompting performance. Set up evaluation metrics to measure improvement.
Economic Reality: Gemini Flash costs 144x less than Claude Opus while delivering production-quality results—you can perform 144 extractions for the cost of one premium extraction.
For enterprise deployments, implement production observability to monitor extraction quality across providers.
Future: BAML-Inspired Enhanced Prompting
We’re developing sorbet-baml, a next-generation approach to Enhanced Prompting that could reduce token usage by 50-70% while improving accuracy. This initiative (GitHub #70) transforms verbose JSON schemas into TypeScript-like syntax with inline comments:
Current JSON Schema: 150 tokens. BAML format: 45 tokens (70% reduction).
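For intuition, here's a hypothetical before/after; this is illustrative only, not sorbet-baml's final syntax:

```ruby
# Illustrative only -- both fragments below are hypothetical examples
# of the kind of transformation described above.
#
# Verbose JSON Schema (sent today):
#   {"type": "object",
#    "properties": {"summary": {"type": "string", "description": "Research summary"}},
#    "required": ["summary"]}
#
# BAML-style TypeScript-like rendering:
#   interface DeepResearch {
#     summary: string  // Research summary
#   }
```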
Expected benefits:
- Lower costs: Dramatically reduced token consumption for complex schemas
- Better accuracy: Up to 20% improvement for nested structures
- Universal compatibility: Works with all providers (OpenAI, Anthropic, Gemini, Ollama)
This enhancement will integrate seamlessly with DSPy.rb’s existing Enhanced Prompting strategy, providing automatic optimization without code changes.
Related Articles
- Type-Safe Prediction Objects - Deep dive into DSPy.rb’s type system
- Under the Hood: JSON Extraction - Technical details of extraction strategies
- JSON Parsing Reliability - Techniques for robust JSON handling
Benchmark: 27 tests across 5 strategies and 13 AI models. Total cost: $0.2302. September 14, 2025.