Enhanced Prompting vs Native Structured Outputs: A DSPy.rb Comparison
Head-to-head comparison of enhanced prompting vs native structured outputs across OpenAI, Anthropic, and Google models
Vicente Reig
Fractional Engineering Lead
Getting reliable, structured data from Large Language Models is crucial for production applications. DSPy.rb supports both enhanced prompting (universal) and native structured outputs (provider-specific). After benchmarking eight of the latest models head-to-head, here's your complete guide to choosing the right approach.
This test compares DSPy.rb's two primary strategies, Enhanced Prompting (universal) and Native Structured Outputs (provider-specific), using the latest models from OpenAI, Anthropic, and Google as of September 2025.
Two Strategies Compared
- Enhanced Prompting: Universal DSPy-style JSON Schema prompting with intelligent fallback handling. Works with any LLM provider.
- Native Structured Outputs: Provider-specific structured generation APIs:
  - OpenAI: JSON Schema with `strict: true` enforcement
  - Anthropic: Tool use with JSON schema validation
  - Google: Gemini native structured output mode
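In DSPy.rb, the strategy is selected per LM instance. Here is a minimal sketch of the two configurations, assuming the `structured_outputs:` flag accepts an explicit `false` to opt into enhanced prompting (only the `true` form appears verbatim in the Implementation section below):

```ruby
require 'dspy'

# Enhanced Prompting: the JSON Schema travels inside the prompt text.
# Works with any provider (assumption: false opts out of native mode).
enhanced_lm = DSPy::LM.new(
  'openai/gpt-4o',
  api_key: ENV['OPENAI_API_KEY'],
  structured_outputs: false
)

# Native Structured Outputs: the schema is sent through the provider's API
# (for OpenAI, enforced with strict: true).
structured_lm = DSPy::LM.new(
  'openai/gpt-4o',
  api_key: ENV['OPENAI_API_KEY'],
  structured_outputs: true
)
```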
Benchmark Results Overview
Both strategies achieved 100% success rate across all 8 models (16 tests total). Here are the head-to-head comparisons:
| Provider | Model | Enhanced Prompting | Native Structured | Winner |
|---|---|---|---|---|
| OpenAI | gpt-4o | 2302ms / $0.002833 | 1769ms / $0.001658 | 🏆 Structured (23% faster, 41% cheaper) |
| OpenAI | gpt-4o-mini | 2944ms / $0.000169 | 2111ms / $0.000097 | 🏆 Structured (28% faster, 43% cheaper) |
| OpenAI | gpt-5 | 16005ms / $0.011895 | 22921ms / $0.015065 | 🏆 Enhanced (43% faster, 21% cheaper) |
| OpenAI | gpt-5-mini | 8303ms / $0.001361 | 10694ms / $0.001881 | 🏆 Enhanced (29% faster, 28% cheaper) |
| Anthropic | claude-sonnet-4-5 | 3411ms / $0.004581 | 3401ms / $0.005886 | 🏆 Enhanced (similar speed, 22% cheaper) |
| Anthropic | claude-opus-4-1 | 4993ms / $0.022380 | 4796ms / $0.025335 | 🏆 Enhanced (4% slower, 12% cheaper) |
| Google | gemini-2.5-pro | 10478ms / $0.001623 | 6787ms / $0.001023 | 🏆 Structured (35% faster, 37% cheaper) |
| Google | gemini-2.5-flash | 15704ms / $0.000096 | 7943ms / $0.000050 | 🏆 Structured (49% faster, 48% cheaper) |
Response Time Comparison by Model
| Model / Strategy | Response Time (seconds) |
|---|---|
| gpt-4o (Structured) | 1.769s |
| gpt-4o-mini (Structured) | 2.111s |
| gpt-4o (Enhanced) | 2.302s |
| gpt-4o-mini (Enhanced) | 2.944s |
| claude-sonnet-4-5 (Structured) | 3.401s |
| claude-sonnet-4-5 (Enhanced) | 3.411s |
| claude-opus-4-1 (Structured) | 4.796s |
| claude-opus-4-1 (Enhanced) | 4.993s |
| gemini-2.5-pro (Structured) | 6.787s |
| gemini-2.5-flash (Structured) | 7.943s |
| gpt-5-mini (Enhanced) | 8.303s |
| gemini-2.5-pro (Enhanced) | 10.478s |
| gpt-5-mini (Structured) | 10.694s |
| gemini-2.5-flash (Enhanced) | 15.704s |
| gpt-5 (Enhanced) | 16.005s |
| gpt-5 (Structured) | 22.921s |
Based on benchmark data from September 2025. GPT-4o with structured outputs is the fastest at 1.769s, while GPT-5 with structured outputs is the slowest at 22.921s. GPT-4o models show dramatic improvements with structured outputs, while GPT-5 models perform better with enhanced prompting.
Token Consumption Analysis
Token usage varies dramatically by both model and strategy. Modern structured output implementations optimize token efficiency by sending schemas through API parameters rather than in prompts.
Token Usage by Model and Strategy
| Model | Enhanced Prompting (input→output) | Native Structured (input→output) | Difference |
|---|---|---|---|
| gpt-4o | 477→164 (641 total) | 255→102 (357 total) | -222 input, -62 output (-284 total, 44% reduction) |
| gpt-4o-mini | 477→163 (640 total) | 255→98 (353 total) | -222 input, -65 output (-287 total, 45% reduction) |
| gpt-5 | 476→1130 (1606 total) | 476→1447 (1923 total) | Same input, +317 output (+317 total, 20% increase) |
| gpt-5-mini | 476→621 (1097 total) | 476→881 (1357 total) | Same input, +260 output (+260 total, 24% increase) |
| claude-sonnet-4-5 | 597→186 (783 total) | 927→207 (1134 total) | +330 input, +21 output (+351 total, 45% increase) |
| claude-opus-4-1 | 597→179 (776 total) | 654→207 (861 total) | +57 input, +28 output (+85 total, 11% increase) |
| gemini-2.5-pro | 554→186 (740 total) | 158→165 (323 total) | -396 input, -21 output (-417 total, 56% reduction) |
| gemini-2.5-flash | 554→180 (734 total) | 158→127 (285 total) | -396 input, -53 output (-449 total, 61% reduction) |
Key Insights:
- OpenAI (GPT-4o): Structured outputs dramatically reduce token consumption (-44% to -45% total) by sending schemas via the API
- OpenAI (GPT-5): Higher output token generation (+20% to +24%) indicates extensive reasoning/thinking tokens
- Anthropic: Structured outputs still increase tokens (+11% to +45%) due to tool-use architecture
- Google: Structured outputs achieve massive token reduction (-56% to -61% total) through native API integration
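As a sanity check on how these token counts become the dollar figures in the cost tables, here is a small worked computation. The per-million-token rates below are assumptions chosen for illustration; they happen to reproduce the benchmark's gpt-4o costs:

```ruby
# Assumed illustrative gpt-4o pricing: $2.50 per 1M input tokens,
# $10.00 per 1M output tokens.
def extraction_cost(input_tokens, output_tokens)
  (input_tokens * 2.50 + output_tokens * 10.00) / 1_000_000
end

puts extraction_cost(477, 164) # enhanced:   ≈ $0.002833 (matches the table)
puts extraction_cost(255, 102) # structured: ≈ $0.001658 (matches the table)
```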
Token Efficiency by Model Family
| Model | Avg Tokens (Both Strategies) |
|---|---|
| gpt-4o-mini | 497 tokens |
| gpt-4o | 499 tokens |
| gemini-2.5-flash | 510 tokens |
| gemini-2.5-pro | 532 tokens |
| claude-opus-4-1 | 819 tokens |
| claude-sonnet-4-5 | 959 tokens |
| gpt-5-mini | 1227 tokens |
| gpt-5 | 1765 tokens |
GPT-4o and Google models are most token-efficient (497-532 tokens average). Claude models use moderate tokens (819-959 average). GPT-5 models generate significantly more tokens (1227-1765 average) due to extensive reasoning/thinking output.
Cost Comparison: All Models and Strategies
| Model / Strategy | Cost per Extraction |
|---|---|
| gemini-2.5-flash (Structured) | $0.000050 |
| gemini-2.5-flash (Enhanced) | $0.000096 |
| gpt-4o-mini (Structured) | $0.000097 |
| gpt-4o-mini (Enhanced) | $0.000169 |
| gemini-2.5-pro (Structured) | $0.001023 |
| gpt-5-mini (Enhanced) | $0.001361 |
| gemini-2.5-pro (Enhanced) | $0.001623 |
| gpt-4o (Structured) | $0.001658 |
| gpt-5-mini (Structured) | $0.001881 |
| gpt-4o (Enhanced) | $0.002833 |
| claude-sonnet-4-5 (Enhanced) | $0.004581 |
| claude-sonnet-4-5 (Structured) | $0.005886 |
| gpt-5 (Enhanced) | $0.011895 |
| gpt-5 (Structured) | $0.015065 |
| claude-opus-4-1 (Enhanced) | $0.022380 |
| claude-opus-4-1 (Structured) | $0.025335 |
Gemini 2.5 Flash with Structured Outputs delivers the lowest cost at $0.000050 per extraction, 507x cheaper than Claude Opus with Structured Outputs. View the benchmark source.
Performance by Provider (Average across models)
| Provider | Enhanced Prompting | Native Structured |
|---|---|---|
| Anthropic | 4.202s | 4.099s |
| Google | 13.091s | 7.365s |
| OpenAI | 7.389s | 9.374s |
Anthropic shows nearly identical performance between strategies (2.5% faster with structured). Google improves dramatically with structured outputs (44% faster on average). OpenAI shows mixed results: GPT-5's slower structured-output performance offsets GPT-4o's gains (21% slower on average with structured).
Quick Decision Matrix
| Use Case | Recommended Strategy | Model | Cost | Speed |
|---|---|---|---|---|
| Cost-Optimized | Native Structured | gemini-2.5-flash | $0.000050 | 7.943s |
| Speed-Optimized | Native Structured | gpt-4o | $0.001658 | 1.769s |
| OpenAI GPT-4o Users | Native Structured | gpt-4o / gpt-4o-mini | $0.000097-$0.001658 | 1.769-2.111s |
| OpenAI GPT-5 Users | Enhanced Prompting | gpt-5 / gpt-5-mini | $0.001361-$0.011895 | 8.303-16.005s |
| Anthropic Users | Enhanced Prompting | claude-sonnet-4-5 | $0.004581 | 3.411s |
| Google Users | Native Structured | gemini-2.5-pro / flash | $0.000050-$0.001023 | 6.787-7.943s |
| Multi-Provider | Enhanced Prompting | Varies | Varies | Varies |
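If you want this matrix directly in application code, here is a hedged sketch. The OpenAI model IDs match the Implementation section below; the Anthropic and Gemini ID strings are assumptions following the same `provider/model` convention:

```ruby
require 'dspy'

# Encodes the decision matrix above. The Anthropic/Gemini model ID strings
# are assumptions, not verified values.
LM_PRESETS = {
  cost_optimized:  ['gemini/gemini-2.5-flash',     { structured_outputs: true  }],
  speed_optimized: ['openai/gpt-4o',               { structured_outputs: true  }],
  anthropic:       ['anthropic/claude-sonnet-4-5', { structured_outputs: false }],
  multi_provider:  ['openai/gpt-4o-mini',          { structured_outputs: false }]
}.freeze

def lm_for(use_case, api_key:)
  model, options = LM_PRESETS.fetch(use_case)
  DSPy::LM.new(model, api_key: api_key, **options)
end

# Example: lm_for(:cost_optimized, api_key: ENV['GEMINI_API_KEY'])
```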
Key Findings
- GPT-4o Dominates: Structured outputs are 23-28% faster and 41-43% cheaper with superior token efficiency (-44% to -45%)
- GPT-5 Reasoning Overhead: Enhanced prompting is 29-43% faster; GPT-5 generates 1130-1447 output tokens (extensive reasoning)
- Google Wins Both Ways: Structured outputs 35-49% faster, 37-48% cheaper, and 56-61% fewer tokens
- Anthropic Prefers Enhanced: Enhanced prompting similar speed but 12-22% cheaper than structured outputs
- Cost Champion: Gemini 2.5 Flash with structured outputs at $0.000050 per extraction
- Speed Champion: GPT-4o with structured outputs at 1.769s
- Token Efficiency Revolution: Structured outputs are now more token-efficient than enhanced prompting for OpenAI (GPT-4o) and Google, a reversal from older implementations
- Universal Reliability: 100% success rate across all 16 tests (8 models × 2 strategies)
Implementation
DSPy.rb uses Signatures to define structured inputs and outputs. Here's an example using T::Enum types:
require 'dspy'

class ActionType < T::Enum
  enums do
    Create = new('create')
    Update = new('update')
    Delete = new('delete')
  end
end

class TodoAction < T::Struct
  const :action_type, ActionType
  const :task, String
  const :priority, String, default: 'medium'
end

class TodoListManagement < DSPy::Signature
  description "Parse user request into structured todo actions"

  input do
    const :user_request, String, description: "Natural language request about todos"
  end

  output do
    const :actions, T::Array[TodoAction], description: "Actions to execute"
    const :summary, String, description: "Brief summary of what will be done"
  end
end

# Configure DSPy with structured outputs for optimal performance
DSPy.configure do |c|
  c.lm = DSPy::LM.new(
    'openai/gpt-4o-mini', # Fast and cost-effective
    api_key: ENV['OPENAI_API_KEY'],
    structured_outputs: true # 28% faster and 43% cheaper than enhanced prompting in this benchmark
  )
end

predictor = DSPy::Predict.new(TodoListManagement)
result = predictor.call(
  user_request: "Add task to buy groceries and schedule team meeting for Friday"
)

puts "Summary: #{result.summary}"
result.actions.each do |action|
  puts "  #{action.action_type.serialize}: #{action.task} [#{action.priority}]"
end
This example shows DSPy.rb's core components working together:
- Signatures: `TodoListManagement` declares typed inputs and outputs
- Configuration: `DSPy.configure` sets up your language model
- Predictors: The `DSPy::Predict` class handles JSON extraction automatically
Recommendations
For OpenAI GPT-4o users: Enable `structured_outputs: true` for dramatic wins: 23-28% faster, 41-43% cheaper, and 44-45% fewer tokens. This is a clear win with no downsides.
For OpenAI GPT-5 users: Use enhanced prompting for 29-43% faster responses and 21-28% cost savings. GPT-5's extensive reasoning generates 1130-1447 output tokens, making structured outputs slower and more expensive. A sketch of that configuration follows.
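This sketch assumes `structured_outputs: false` selects the enhanced prompting strategy (the inverse of the flag shown in the Implementation section):

```ruby
require 'dspy'

# GPT-5: enhanced prompting avoids the structured-output slowdown
# observed in this benchmark (43% faster, 21% cheaper for gpt-5).
DSPy.configure do |c|
  c.lm = DSPy::LM.new(
    'openai/gpt-5',
    api_key: ENV['OPENAI_API_KEY'],
    structured_outputs: false # assumption: false opts into enhanced prompting
  )
end
```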
For Anthropic users: Use enhanced prompting for 12-22% cost savings. Performance is nearly identical between strategies, but enhanced prompting uses fewer tokens.
For Google Gemini users: Enable `structured_outputs: true` for exceptional results (a configuration sketch follows this list):
- Gemini 2.5 Flash: 49% faster, 48% cheaper, 61% fewer tokens
- Gemini 2.5 Pro: 35% faster, 37% cheaper, 56% fewer tokens
- Structured outputs achieve massive token efficiency through native API integration
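A minimal configuration sketch; the `gemini/...` model ID format and the `GEMINI_API_KEY` variable name are assumptions following the `provider/model` convention used above:

```ruby
require 'dspy'

DSPy.configure do |c|
  c.lm = DSPy::LM.new(
    'gemini/gemini-2.5-flash',      # assumed provider/model ID format
    api_key: ENV['GEMINI_API_KEY'], # assumed env var name
    structured_outputs: true        # Gemini native structured output mode
  )
end
```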
For multi-provider applications: Enhanced prompting remains the best default strategy for universal compatibility, though provider-specific optimization can yield significant improvements.
Budget-conscious applications: Use Gemini 2.5 Flash with structured outputs ($0.000050 per extraction), 507x cheaper than Claude Opus with structured outputs.
Speed-critical applications: Use GPT-4o with structured outputs (1.769s average), the fastest option tested.
For enterprise deployments, implement production observability to monitor extraction quality across providers.
Future: BAML-Inspired Enhanced Prompting
We're developing sorbet-baml, a next-generation approach to Enhanced Prompting that could reduce token usage by 50-70% while improving accuracy. This initiative (GitHub #70) transforms verbose JSON schemas into TypeScript-like syntax with inline comments:
Current JSON Schema: 150 tokens. BAML format: 45 tokens (70% reduction).
Expected benefits:
- Lower costs: Dramatically reduced token consumption for complex schemas
- Better accuracy: Up to 20% improvement for nested structures
- Universal compatibility: Works with all providers (OpenAI, Anthropic, Gemini, Ollama)
This enhancement will integrate seamlessly with DSPy.rbβs existing Enhanced Prompting strategy, providing automatic optimization without code changes.
Related Articles
- Type-Safe Prediction Objects - Deep dive into DSPy.rbβs type system
- Under the Hood: JSON Extraction - Technical details of extraction strategies
- JSON Parsing Reliability - Techniques for robust JSON handling
Benchmark data: 16 tests across 2 strategies and 8 latest AI models (September 2025). Total cost: $0.0959. View benchmark source code and raw data.