
JSON Native or Enhanced Prompting? Choosing the Right DSPy.rb Strategy

Why Enhanced Prompting beats JSON Native APIs in cost and compatibility, plus when to break the rule

Vicente Reig
Fractional Engineering Lead

Getting reliable, structured data from Large Language Models is crucial for production applications. DSPy.rb solves this with five different JSON extraction strategies, each optimized for specific AI providers. After benchmarking across 13 AI models, here’s your complete guide to choosing the right approach.

This benchmark used DSPy.rb's baseline Enhanced Prompting without any optimization beyond writing modular, typed Signatures. For production workloads, consider prompt optimization to improve performance further.

Five Extraction Strategies

  • Enhanced Prompting: Universal compatibility (works with all 13 models tested). This is DSPy-style JSON Schema prompting.
  • OpenAI Structured Output: Native API enforcement for GPT models, including nuances of their JSON Schema implementation.
  • Anthropic Tool Use: Function calling for all Claude models.
  • Anthropic Extraction: Text completion with guided parsing for Claude.
  • Gemini Structured Output: Native structured generation for Gemini 1.5 Pro.

Performance Benchmark Results

Even though reliability wasn't the goal of this benchmark, all strategies achieved a 100% success rate in generating JSON and handling potentially invalid responses.

Strategy              Response Time   Success Rate   Token Efficiency   Cost (Best Model)
Gemini Structured     3.42s           100%           800 tokens         $0.0019
Anthropic Tool Use    6.23s           100%           800-1500 tokens    $0.001408
Anthropic Extraction  6.41s           100%           800-1500 tokens    $0.001408
Enhanced Prompting    7.52s           100%           800-1500 tokens    $0.000114
OpenAI Structured     9.39s           100%           1200-1500 tokens   $0.000342

Average Response Times by Strategy

Strategy              Avg Response Time
Gemini Structured     3.49s
Anthropic Extraction  5.58s
Anthropic Tool Use    6.09s
Enhanced Prompting    10.16s
OpenAI Structured     13.56s

Based on benchmark data across 28 test runs. Gemini Structured Output leads with 3.49s average, while OpenAI Structured Output takes the longest at 13.56s.

Token Consumption Analysis

Most Token Efficient (800 tokens):

  • Claude 3.5 Haiku, Gemini models, o1-mini

Standard Usage (1200 tokens):

  • GPT-4o series, Claude Sonnet 4

Highest Usage (1500 tokens):

  • GPT-5 series, Claude Opus 4.1

Cost per Token Leaders:

  1. Gemini 1.5 Flash: $0.0000001425 per token
  2. GPT-5-nano: $0.00000011 per token
  3. GPT-4o-mini: $0.000000285 per token
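
The per-extraction costs quoted in this post can be cross-checked directly from these per-token prices and the token tiers above. A minimal sketch (the pairing of each model with a token tier is inferred from the tables in this post):

```ruby
# Cost per extraction = tokens used x price per token.
price_per_token = {
  'gemini-1.5-flash' => 0.0000001425,
  'gpt-5-nano'       => 0.00000011,
  'gpt-4o-mini'      => 0.000000285
}

# Gemini Flash sits in the most efficient 800-token tier.
gemini_cost = 800 * price_per_token['gemini-1.5-flash']
puts format('$%.6f', gemini_cost)  # => $0.000114, the headline figure

# GPT-5-nano at the 1500-token tier lines up with the
# $0.000165 OpenAI Structured figure later in the post.
nano_cost = 1500 * price_per_token['gpt-5-nano']
puts format('$%.6f', nano_cost)    # => $0.000165
```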

Token Efficiency Distribution Across Models

Token Usage   Number of Models
800 tokens    4
1000 tokens   2
1200 tokens   3
1500 tokens   4

Most models cluster around maximum efficiency (800 tokens) or maximum context (1500 tokens). Claude 3.5 Haiku and Gemini models lead in efficiency.

Token Insight: Strategy choice doesn’t significantly impact token usage—it’s primarily model-dependent. Focus on model selection for token efficiency.

Cost Efficiency by Strategy (Best Model per Strategy)

Strategy              Cost per Extraction
Enhanced Prompting    $0.000114
OpenAI Structured     $0.000165
Anthropic Tool Use    $0.001408
Anthropic Extraction  $0.001408
Gemini Structured     $0.0019

Enhanced Prompting with Gemini Flash delivers the lowest cost at $0.000114 per extraction, roughly 17x cheaper than the most expensive strategy (Gemini Structured at $0.0019). View benchmark source.

Speed Variability by Strategy (Min/Max/Average)

Strategy              Min Time   Average Time   Max Time
Gemini Structured     3.49s      3.49s          3.49s
Anthropic Extraction  2.68s      5.58s          10.26s
Anthropic Tool Use    3.69s      6.09s          10.81s
Enhanced Prompting    1.84s      10.16s         33.31s
OpenAI Structured     2.27s      13.56s         23.26s

Enhanced Prompting shows the highest speed variability (1.84s to 33.31s) due to model diversity, while Gemini Structured offers consistent performance. Provider-specific strategies show more predictable ranges.

Quick Decision Matrix

Use Case                   Recommended Strategy   Model           Cost       Speed
Startup/MVP                Enhanced Prompting     Gemini Flash    $0.000114  7.52s
High Volume                Gemini Structured      Gemini Pro      $0.0019    3.42s
Enterprise Multi-Provider  Enhanced Prompting     Multiple        Varies     7.52s
Maximum Reliability        Provider-Specific      Any Compatible  Varies     6.23-9.39s
Cost-Sensitive             Enhanced Prompting     Gemini Flash    $0.000114  7.52s

Key Findings

  • Speed Champion: Gemini Structured Output (3.42s), though limited to one model
  • Universal Choice: Enhanced Prompting works across all providers with 100% success
  • Cost Winner: Gemini Flash + Enhanced Prompting at $0.000114 per extraction
  • Reliability: All provider-specific strategies achieve 100% success rates
  • Token Efficiency: Choose Claude Haiku or Gemini for lowest token consumption

Implementation

DSPy.rb uses Signatures to define structured inputs and outputs. Here’s an example using T::Enum types:

# Define an enum of allowed values for a field
class SearchDepth < T::Enum
  enums do
    Shallow = new('shallow')
    Medium = new('medium')
    Deep = new('deep')
  end
end

# A Signature declares the typed inputs and outputs of an LLM call
class DeepResearch < DSPy::Signature
  input do
    const :query, String
    const :effort, SearchDepth, default: SearchDepth::Shallow
  end
  output { const :summary, String }
end

# Configure DSPy with your preferred model
# (structured_outputs is supported for gemini-1.5-pro,
# gemini-1.5-flash, and gemini-2.0-flash-exp)
DSPy.configure do |c|
  c.lm = DSPy::LM.new('gemini/gemini-1.5-flash',
                      api_key: ENV['GEMINI_API_KEY'],
                      structured_outputs: true)
end

predictor = DSPy::Predict.new(DeepResearch)
search_result = predictor.call(query: "How does Stripe's API design influence developer adoption?")
puts "Summary: #{search_result.summary}"

This example shows DSPy.rb’s core components working together:

  • Configuration: Set up your language model
  • Predictors: The DSPy::Predict class handles JSON extraction automatically

Recommendations

Start with Enhanced Prompting + Gemini Flash for most applications:

  • Universal compatibility across all providers
  • Lowest cost at $0.000114 per extraction
  • Easy provider switching without code changes
  • Consider benchmarking your own workloads
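
Provider switching really is a configuration-only change: the Signature and predictor code from the Implementation section stays identical. A sketch, assuming the `anthropic/...` model string follows the same `provider/model` pattern used for Gemini in this post:

```ruby
# Swap providers by changing only the configuration block.
DSPy.configure do |c|
  c.lm = DSPy::LM.new('anthropic/claude-3-5-haiku',
                      api_key: ENV['ANTHROPIC_API_KEY'])
end

# Unchanged from the Implementation section:
predictor = DSPy::Predict.new(DeepResearch)
result = predictor.call(query: "How does Stripe's API design influence developer adoption?")
```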

Optimize later with provider-specific strategies for critical use cases requiring 100% reliability, or use prompt optimization to improve Enhanced Prompting performance. Set up evaluation metrics to measure improvement.

Economic Reality: Gemini Flash costs 144x less than Claude Opus while delivering production-quality results—you can perform 144 extractions for the cost of one premium extraction.

For enterprise deployments, implement production observability to monitor extraction quality across providers.

Future: BAML-Inspired Enhanced Prompting

We’re developing sorbet-baml, a next-generation approach to Enhanced Prompting that could reduce token usage by 50-70% while improving accuracy. This initiative (GitHub #70) transforms verbose JSON schemas into TypeScript-like syntax with inline comments:

Current JSON Schema: 150 tokens
BAML Format: 45 tokens (70% reduction)

Expected benefits:

  • Lower costs: Dramatically reduced token consumption for complex schemas
  • Better accuracy: Up to 20% improvement for nested structures
  • Universal compatibility: Works with all providers (OpenAI, Anthropic, Gemini, Ollama)

This enhancement will integrate seamlessly with DSPy.rb’s existing Enhanced Prompting strategy, providing automatic optimization without code changes.


Benchmark: 27 tests across 5 strategies and 13 AI models. Total cost: $0.2302. September 14, 2025.