Enhanced Prompting vs Native Structured Outputs: A DSPy.rb Comparison
Head-to-head comparison of enhanced prompting vs native structured outputs across OpenAI, Anthropic, and Google models
Vicente Reig
Fractional Engineering Lead
Getting reliable, structured data from Large Language Models is crucial for production applications. DSPy.rb supports both enhanced prompting (universal) and native structured outputs (provider-specific). After benchmarking eight of the latest models head-to-head, here's your complete guide to choosing the right approach.
This test compares DSPy.rb's two primary strategies, Enhanced Prompting (universal) and Native Structured Outputs (provider-specific), using the latest models from OpenAI, Anthropic, and Google as of September 2025.
Two Strategies Compared
- Enhanced Prompting: Universal DSPy-style JSON Schema prompting with intelligent fallback handling. Works with any LLM provider.
- Native Structured Outputs: Provider-specific structured generation APIs:
  - OpenAI: JSON Schema with `strict: true` enforcement
  - Anthropic: Tool use with JSON schema validation
  - Google: Gemini native structured output mode
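In DSPy.rb, the strategy is selected per LM instance. Here is a minimal sketch of the two configurations, assuming the `structured_outputs:` flag accepts an explicit `false` to opt into enhanced prompting (only the `true` form appears verbatim in the Implementation section below):

```ruby
require 'dspy'

# Enhanced Prompting: the JSON Schema travels inside the prompt text.
# Works with any provider (assumption: false opts out of native mode).
enhanced_lm = DSPy::LM.new(
  'openai/gpt-4o',
  api_key: ENV['OPENAI_API_KEY'],
  structured_outputs: false
)

# Native Structured Outputs: the schema is sent through the provider's API
# (for OpenAI, enforced with strict: true).
structured_lm = DSPy::LM.new(
  'openai/gpt-4o',
  api_key: ENV['OPENAI_API_KEY'],
  structured_outputs: true
)
```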
Benchmark Results Overview
Both strategies achieved 100% success rate across all 8 models (16 tests total). Here are the head-to-head comparisons:
| Provider | Model | Enhanced Prompting | Native Structured | Winner |
|---|---|---|---|---|
| OpenAI | gpt-4o | 2302ms / $0.002833 | 1769ms / $0.001658 | 🏆 Structured (23% faster, 41% cheaper) |
| OpenAI | gpt-4o-mini | 2944ms / $0.000169 | 2111ms / $0.000097 | 🏆 Structured (28% faster, 43% cheaper) |
| OpenAI | gpt-5 | 16005ms / $0.011895 | 22921ms / $0.015065 | 🏆 Enhanced (43% faster, 21% cheaper) |
| OpenAI | gpt-5-mini | 8303ms / $0.001361 | 10694ms / $0.001881 | 🏆 Enhanced (29% faster, 28% cheaper) |
| Anthropic | claude-sonnet-4-5 | 3411ms / $0.004581 | 3401ms / $0.005886 | 🏆 Enhanced (similar speed, 22% cheaper) |
| Anthropic | claude-opus-4-1 | 4993ms / $0.022380 | 4796ms / $0.025335 | 🏆 Enhanced (4% slower, 12% cheaper) |
| Google | gemini-2.5-pro | 10478ms / $0.001623 | 6787ms / $0.001023 | 🏆 Structured (35% faster, 37% cheaper) |
| Google | gemini-2.5-flash | 15704ms / $0.000096 | 7943ms / $0.000050 | 🏆 Structured (49% faster, 48% cheaper) |
Response Time Comparison by Model
| Model / Strategy | Response Time (seconds) |
|---|---|
| gpt-4o (Structured) | 1.769s |
| gpt-4o-mini (Structured) | 2.111s |
| gpt-4o (Enhanced) | 2.302s |
| gpt-4o-mini (Enhanced) | 2.944s |
| claude-sonnet-4-5 (Structured) | 3.401s |
| claude-sonnet-4-5 (Enhanced) | 3.411s |
| claude-opus-4-1 (Structured) | 4.796s |
| claude-opus-4-1 (Enhanced) | 4.993s |
| gemini-2.5-pro (Structured) | 6.787s |
| gemini-2.5-flash (Structured) | 7.943s |
| gpt-5-mini (Enhanced) | 8.303s |
| gemini-2.5-pro (Enhanced) | 10.478s |
| gpt-5-mini (Structured) | 10.694s |
| gemini-2.5-flash (Enhanced) | 15.704s |
| gpt-5 (Enhanced) | 16.005s |
| gpt-5 (Structured) | 22.921s |
Based on benchmark data from September 2025. GPT-4o with structured outputs is the fastest at 1.769s, while GPT-5 with structured outputs is the slowest at 22.921s. GPT-4o models show dramatic improvements with structured outputs, while GPT-5 models perform better with enhanced prompting.
Token Consumption Analysis
Token usage varies dramatically by both model and strategy. Modern structured output implementations optimize token efficiency by sending schemas through API parameters rather than in prompts.
Token Usage by Model and Strategy
| Model | Enhanced Prompting (input→output) | Native Structured (input→output) | Difference |
|---|---|---|---|
| gpt-4o | 477→164 (641 total) | 255→102 (357 total) | -222 input, -62 output (-284 total, 44% reduction) |
| gpt-4o-mini | 477→163 (640 total) | 255→98 (353 total) | -222 input, -65 output (-287 total, 45% reduction) |
| gpt-5 | 476→1130 (1606 total) | 476→1447 (1923 total) | Same input, +317 output (+317 total, 20% increase) |
| gpt-5-mini | 476→621 (1097 total) | 476→881 (1357 total) | Same input, +260 output (+260 total, 24% increase) |
| claude-sonnet-4-5 | 597→186 (783 total) | 927→207 (1134 total) | +330 input, +21 output (+351 total, 45% increase) |
| claude-opus-4-1 | 597→179 (776 total) | 654→207 (861 total) | +57 input, +28 output (+85 total, 11% increase) |
| gemini-2.5-pro | 554→186 (740 total) | 158→165 (323 total) | -396 input, -21 output (-417 total, 56% reduction) |
| gemini-2.5-flash | 554→180 (734 total) | 158→127 (285 total) | -396 input, -53 output (-449 total, 61% reduction) |
Key Insights:
- OpenAI (GPT-4o): Structured outputs dramatically reduce token consumption (-44% to -45% total) by sending schemas via the API
- OpenAI (GPT-5): Higher output token generation (+20% to +24%) indicates extensive reasoning/thinking tokens
- Anthropic: Structured outputs still increase tokens (+11% to +45%) due to tool-use architecture
- Google: Structured outputs achieve massive token reduction (-56% to -61% total) through native API integration
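As a sanity check on how these token counts become the dollar figures in the cost tables, here is a small worked computation. The per-million-token rates below are assumptions chosen for illustration; they happen to reproduce the benchmark's gpt-4o costs:

```ruby
# Assumed illustrative gpt-4o pricing: $2.50 per 1M input tokens,
# $10.00 per 1M output tokens.
def extraction_cost(input_tokens, output_tokens)
  (input_tokens * 2.50 + output_tokens * 10.00) / 1_000_000
end

puts extraction_cost(477, 164) # enhanced:   ≈ $0.002833 (matches the table)
puts extraction_cost(255, 102) # structured: ≈ $0.001658 (matches the table)
```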
Token Efficiency by Model Family
| Model | Avg Tokens (Both Strategies) |
|---|---|
| gpt-4o-mini | 497 tokens |
| gpt-4o | 499 tokens |
| gemini-2.5-flash | 510 tokens |
| gemini-2.5-pro | 532 tokens |
| claude-opus-4-1 | 819 tokens |
| claude-sonnet-4-5 | 959 tokens |
| gpt-5-mini | 1227 tokens |
| gpt-5 | 1765 tokens |
GPT-4o and Google models are most token-efficient (497-532 tokens average). Claude models use moderate tokens (819-959 average). GPT-5 models generate significantly more tokens (1227-1765 average) due to extensive reasoning/thinking output.
Cost Comparison: All Models and Strategies
| Model / Strategy | Cost per Extraction |
|---|---|
| gemini-2.5-flash (Structured) | $0.000050 |
| gemini-2.5-flash (Enhanced) | $0.000096 |
| gpt-4o-mini (Structured) | $0.000097 |
| gpt-4o-mini (Enhanced) | $0.000169 |
| gemini-2.5-pro (Structured) | $0.001023 |
| gpt-5-mini (Enhanced) | $0.001361 |
| gemini-2.5-pro (Enhanced) | $0.001623 |
| gpt-4o (Structured) | $0.001658 |
| gpt-5-mini (Structured) | $0.001881 |
| gpt-4o (Enhanced) | $0.002833 |
| claude-sonnet-4-5 (Enhanced) | $0.004581 |
| claude-sonnet-4-5 (Structured) | $0.005886 |
| gpt-5 (Enhanced) | $0.011895 |
| gpt-5 (Structured) | $0.015065 |
| claude-opus-4-1 (Enhanced) | $0.022380 |
| claude-opus-4-1 (Structured) | $0.025335 |
Gemini 2.5 Flash with Structured Outputs delivers the lowest cost at $0.000050 per extraction, 507x cheaper than Claude Opus with Structured Outputs. View the benchmark source.
Performance by Provider (Average across models)
| Provider | Enhanced Prompting | Native Structured |
|---|---|---|
| Anthropic | 4.202s | 4.099s |
| Google | 13.091s | 7.365s |
| OpenAI | 7.389s | 9.374s |
Anthropic shows nearly identical performance between strategies (2.5% faster with structured). Google improves dramatically with structured outputs (44% faster on average). OpenAI shows mixed results: GPT-5's slower structured-output performance offsets GPT-4o's gains (21% slower on average with structured).
Quick Decision Matrix
| Use Case | Recommended Strategy | Model | Cost | Speed |
|---|---|---|---|---|
| Cost-Optimized | Native Structured | gemini-2.5-flash | $0.000050 | 7.943s |
| Speed-Optimized | Native Structured | gpt-4o | $0.001658 | 1.769s |
| OpenAI GPT-4o Users | Native Structured | gpt-4o / gpt-4o-mini | $0.000097-$0.001658 | 1.769-2.111s |
| OpenAI GPT-5 Users | Enhanced Prompting | gpt-5 / gpt-5-mini | $0.001361-$0.011895 | 8.303-16.005s |
| Anthropic Users | Enhanced Prompting | claude-sonnet-4-5 | $0.004581 | 3.411s |
| Google Users | Native Structured | gemini-2.5-pro / flash | $0.000050-$0.001023 | 6.787-7.943s |
| Multi-Provider | Enhanced Prompting | Varies | Varies | Varies |
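If you want this matrix directly in application code, here is a hedged sketch. The OpenAI model IDs match the Implementation section below; the Anthropic and Gemini ID strings are assumptions following the same `provider/model` convention:

```ruby
require 'dspy'

# Encodes the decision matrix above. The Anthropic/Gemini model ID strings
# are assumptions, not verified values.
LM_PRESETS = {
  cost_optimized:  ['gemini/gemini-2.5-flash',     { structured_outputs: true  }],
  speed_optimized: ['openai/gpt-4o',               { structured_outputs: true  }],
  anthropic:       ['anthropic/claude-sonnet-4-5', { structured_outputs: false }],
  multi_provider:  ['openai/gpt-4o-mini',          { structured_outputs: false }]
}.freeze

def lm_for(use_case, api_key:)
  model, options = LM_PRESETS.fetch(use_case)
  DSPy::LM.new(model, api_key: api_key, **options)
end

# Example: lm_for(:cost_optimized, api_key: ENV['GEMINI_API_KEY'])
```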
Key Findings
- GPT-4o Dominates: Structured outputs are 23-28% faster and 41-43% cheaper with superior token efficiency (-44% to -45%)
- GPT-5 Reasoning Overhead: Enhanced prompting is 29-43% faster; GPT-5 generates 1130-1447 output tokens (extensive reasoning)
- Google Wins Both Ways: Structured outputs 35-49% faster, 37-48% cheaper, and 56-61% fewer tokens
- Anthropic Prefers Enhanced: Enhanced prompting similar speed but 12-22% cheaper than structured outputs
- Cost Champion: Gemini 2.5 Flash with structured outputs at $0.000050 per extraction
- Speed Champion: GPT-4o with structured outputs at 1.769s
- Token Efficiency Revolution: Structured outputs are now more token-efficient than enhanced prompting for OpenAI (GPT-4o) and Google, a reversal from older implementations
- Universal Reliability: 100% success rate across all 16 tests (8 models × 2 strategies)
Implementation
DSPy.rb uses Signatures to define structured inputs and outputs. Here's an example using T::Enum types:
require 'dspy'

class ActionType < T::Enum
  enums do
    Create = new('create')
    Update = new('update')
    Delete = new('delete')
  end
end

class TodoAction < T::Struct
  const :action_type, ActionType
  const :task, String
  const :priority, String, default: 'medium'
end

class TodoListManagement < DSPy::Signature
  description "Parse user request into structured todo actions"

  input do
    const :user_request, String, description: "Natural language request about todos"
  end

  output do
    const :actions, T::Array[TodoAction], description: "Actions to execute"
    const :summary, String, description: "Brief summary of what will be done"
  end
end

# Configure DSPy with structured outputs for optimal performance
DSPy.configure do |c|
  c.lm = DSPy::LM.new(
    'openai/gpt-4o-mini', # Fast and cost-effective
    api_key: ENV['OPENAI_API_KEY'],
    structured_outputs: true # 28% faster and 43% cheaper than enhanced prompting in this benchmark
  )
end

predictor = DSPy::Predict.new(TodoListManagement)
result = predictor.call(
  user_request: "Add task to buy groceries and schedule team meeting for Friday"
)

puts "Summary: #{result.summary}"
result.actions.each do |action|
  puts "  #{action.action_type.serialize}: #{action.task} [#{action.priority}]"
end
This example shows DSPy.rb's core components working together:
- Signatures: `TodoListManagement` declares typed inputs and outputs
- Configuration: `DSPy.configure` sets up your language model
- Predictors: The `DSPy::Predict` class handles JSON extraction automatically
Recommendations
For OpenAI GPT-4o users: Enable `structured_outputs: true` for dramatic wins: 23-28% faster, 41-43% cheaper, and 44-45% fewer tokens. This is a clear win with no downsides.
For OpenAI GPT-5 users: Use enhanced prompting for 29-43% faster responses and 21-28% cost savings. GPT-5's extensive reasoning generates 1130-1447 output tokens, making structured outputs slower and more expensive. A sketch of that configuration follows.
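This sketch assumes `structured_outputs: false` selects the enhanced prompting strategy (the inverse of the flag shown in the Implementation section):

```ruby
require 'dspy'

# GPT-5: enhanced prompting avoids the structured-output slowdown
# observed in this benchmark (43% faster, 21% cheaper for gpt-5).
DSPy.configure do |c|
  c.lm = DSPy::LM.new(
    'openai/gpt-5',
    api_key: ENV['OPENAI_API_KEY'],
    structured_outputs: false # assumption: false opts into enhanced prompting
  )
end
```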
For Anthropic users: Use enhanced prompting for 12-22% cost savings. Performance is nearly identical between strategies, but enhanced prompting uses fewer tokens.
For Google Gemini users: Enable `structured_outputs: true` for exceptional results (a configuration sketch follows this list):
- Gemini 2.5 Flash: 49% faster, 48% cheaper, 61% fewer tokens
- Gemini 2.5 Pro: 35% faster, 37% cheaper, 56% fewer tokens
- Structured outputs achieve massive token efficiency through native API integration
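A minimal configuration sketch; the `gemini/...` model ID format and the `GEMINI_API_KEY` variable name are assumptions following the `provider/model` convention used above:

```ruby
require 'dspy'

DSPy.configure do |c|
  c.lm = DSPy::LM.new(
    'gemini/gemini-2.5-flash',      # assumed provider/model ID format
    api_key: ENV['GEMINI_API_KEY'], # assumed env var name
    structured_outputs: true        # Gemini native structured output mode
  )
end
```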
For multi-provider applications: Enhanced prompting remains the best default strategy for universal compatibility, though provider-specific optimization can yield significant improvements.
Budget-conscious applications: Use Gemini 2.5 Flash with structured outputs ($0.000050 per extraction), 507x cheaper than Claude Opus with structured outputs.
Speed-critical applications: Use GPT-4o with structured outputs (1.769s average), the fastest option tested.
For enterprise deployments, implement production observability to monitor extraction quality across providers.
Future: BAML-Inspired Enhanced Prompting
We're developing sorbet-baml, a next-generation approach to Enhanced Prompting that could reduce token usage by 50-70% while improving accuracy. This initiative (GitHub #70) transforms verbose JSON schemas into TypeScript-like syntax with inline comments:
Current JSON Schema: 150 tokens. BAML format: 45 tokens (70% reduction).
Expected benefits:
- Lower costs: Dramatically reduced token consumption for complex schemas
- Better accuracy: Up to 20% improvement for nested structures
- Universal compatibility: Works with all providers (OpenAI, Anthropic, Gemini, Ollama)
This enhancement will integrate seamlessly with DSPy.rbβs existing Enhanced Prompting strategy, providing automatic optimization without code changes.
Related Articles
- Type-Safe Prediction Objects - Deep dive into DSPy.rbβs type system
- Under the Hood: JSON Extraction - Technical details of extraction strategies
- JSON Parsing Reliability - Techniques for robust JSON handling
Benchmark data: 16 tests across 2 strategies and 8 latest AI models (September 2025). Total cost: $0.0959. View benchmark source code and raw data.