Enhanced Prompting vs Native Structured Outputs: A DSPy.rb Comparison

Head-to-head comparison of enhanced prompting vs native structured outputs across OpenAI, Anthropic, and Google models

Vicente Reig, Fractional Engineering Lead

Getting reliable, structured data from Large Language Models is crucial for production applications. DSPy.rb supports both enhanced prompting (universal) and native structured outputs (provider-specific). After benchmarking the 8 latest models head-to-head, here's a complete guide to choosing the right approach.

This test compares DSPy.rb's two primary strategies, Enhanced Prompting (universal) and Native Structured Outputs (provider-specific), using the latest models from OpenAI, Anthropic, and Google as of September 2025.

Two Strategies Compared

  • Enhanced Prompting: Universal DSPy-style JSON Schema prompting with intelligent fallback handling. Works with any LLM provider.
  • Native Structured Outputs: Provider-specific structured generation APIs:
    • OpenAI: JSON Schema with strict: true enforcement
    • Anthropic: Tool use with JSON schema validation
    • Google: Gemini native structured output mode
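
In DSPy.rb, both strategies run through the same signature and predictor code; switching between them is a single flag on the language model. Here's a minimal sketch, assuming (as in the configuration example later in this article) that enhanced prompting is used whenever structured_outputs: is omitted, and that the 'anthropic/...' model id follows the same provider-prefixed convention as the OpenAI examples:

require 'dspy'

# Enhanced prompting: DSPy.rb embeds the JSON Schema in the prompt itself.
# Works with any provider.
enhanced_lm = DSPy::LM.new(
  'anthropic/claude-sonnet-4-5',
  api_key: ENV['ANTHROPIC_API_KEY']
)

# Native structured outputs: the schema goes through the provider's
# structured-generation API instead of the prompt.
structured_lm = DSPy::LM.new(
  'openai/gpt-4o-mini',
  api_key: ENV['OPENAI_API_KEY'],
  structured_outputs: true
)

DSPy.configure { |c| c.lm = structured_lm }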

Benchmark Results Overview

Both strategies achieved a 100% success rate across all 8 models (16 tests total). Here are the head-to-head comparisons:

| Provider | Model | Enhanced Prompting | Native Structured | Winner |
|----------|-------|--------------------|-------------------|--------|
| OpenAI | gpt-4o | 2302ms / $0.002833 | 1769ms / $0.001658 | 🏆 Structured (23% faster, 41% cheaper) |
| OpenAI | gpt-4o-mini | 2944ms / $0.000169 | 2111ms / $0.000097 | 🏆 Structured (28% faster, 43% cheaper) |
| OpenAI | gpt-5 | 16005ms / $0.011895 | 22921ms / $0.015065 | 🏆 Enhanced (43% faster, 21% cheaper) |
| OpenAI | gpt-5-mini | 8303ms / $0.001361 | 10694ms / $0.001881 | 🏆 Enhanced (29% faster, 28% cheaper) |
| Anthropic | claude-sonnet-4-5 | 3411ms / $0.004581 | 3401ms / $0.005886 | 🏆 Enhanced (similar speed, 22% cheaper) |
| Anthropic | claude-opus-4-1 | 4993ms / $0.022380 | 4796ms / $0.025335 | 🏆 Enhanced (4% slower, 12% cheaper) |
| Google | gemini-2.5-pro | 10478ms / $0.001623 | 6787ms / $0.001023 | 🏆 Structured (35% faster, 37% cheaper) |
| Google | gemini-2.5-flash | 15704ms / $0.000096 | 7943ms / $0.000050 | 🏆 Structured (49% faster, 48% cheaper) |

Response Time Comparison by Model

| Model / Strategy | Response Time |
|------------------|---------------|
| gpt-4o (Structured) | 1.769s |
| gpt-4o-mini (Structured) | 2.111s |
| gpt-4o (Enhanced) | 2.302s |
| gpt-4o-mini (Enhanced) | 2.944s |
| claude-sonnet-4-5 (Structured) | 3.401s |
| claude-sonnet-4-5 (Enhanced) | 3.411s |
| claude-opus-4-1 (Structured) | 4.796s |
| claude-opus-4-1 (Enhanced) | 4.993s |
| gemini-2.5-pro (Structured) | 6.787s |
| gemini-2.5-flash (Structured) | 7.943s |
| gpt-5-mini (Enhanced) | 8.303s |
| gemini-2.5-pro (Enhanced) | 10.478s |
| gpt-5-mini (Structured) | 10.694s |
| gemini-2.5-flash (Enhanced) | 15.704s |
| gpt-5 (Enhanced) | 16.005s |
| gpt-5 (Structured) | 22.921s |

Based on benchmark data from September 2025. GPT-4o with structured outputs is the fastest at 1.769s, while GPT-5 with structured outputs is the slowest at 22.921s. GPT-4o models show dramatic improvements with structured outputs, while GPT-5 models perform better with enhanced prompting.

Token Consumption Analysis

Token usage varies dramatically by both model and strategy. Modern structured output implementations optimize token efficiency by sending schemas through API parameters rather than in prompts.
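
As a concrete illustration, here is roughly what an OpenAI structured-outputs request body looks like at the API level: the schema rides in the response_format parameter instead of the prompt text. This is a sketch of the raw provider API, not of DSPy.rb's internals, and the field values are illustrative:

# Illustrative OpenAI chat completions request using native structured outputs.
# The prompt stays short; the JSON Schema is a separate API parameter.
request_body = {
  model: 'gpt-4o-mini',
  messages: [
    { role: 'user', content: 'Add task to buy groceries' }
  ],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'todo_actions',
      strict: true, # the strict enforcement mentioned earlier
      schema: {
        type: 'object',
        properties: {
          summary: { type: 'string' },
          actions: {
            type: 'array',
            items: {
              type: 'object',
              properties: { task: { type: 'string' } },
              required: ['task'],
              additionalProperties: false
            }
          }
        },
        required: ['summary', 'actions'],
        additionalProperties: false
      }
    }
  }
}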

Token Usage by Model and Strategy

| Model | Enhanced Prompting (input→output) | Native Structured (input→output) | Difference |
|-------|-----------------------------------|----------------------------------|------------|
| gpt-4o | 477→164 (641 total) | 255→102 (357 total) | -222 input, -62 output (-284 total, 44% reduction) |
| gpt-4o-mini | 477→163 (640 total) | 255→98 (353 total) | -222 input, -65 output (-287 total, 45% reduction) |
| gpt-5 | 476→1130 (1606 total) | 476→1447 (1923 total) | Same input, +317 output (+317 total, 20% increase) |
| gpt-5-mini | 476→621 (1097 total) | 476→881 (1357 total) | Same input, +260 output (+260 total, 24% increase) |
| claude-sonnet-4-5 | 597→186 (783 total) | 927→207 (1134 total) | +330 input, +21 output (+351 total, 45% increase) |
| claude-opus-4-1 | 597→179 (776 total) | 654→207 (861 total) | +57 input, +28 output (+85 total, 11% increase) |
| gemini-2.5-pro | 554→186 (740 total) | 158→165 (323 total) | -396 input, -21 output (-417 total, 56% reduction) |
| gemini-2.5-flash | 554→180 (734 total) | 158→127 (285 total) | -396 input, -53 output (-449 total, 61% reduction) |

Key Insights:

  • OpenAI (GPT-4o): Structured outputs dramatically reduce token consumption (-44% to -45% total) by sending schemas via API
  • OpenAI (GPT-5): Higher output token generation (+20% to +24%) indicates extensive reasoning/thinking tokens
  • Anthropic: Structured outputs still increase tokens (+11% to +45%) due to tool-use architecture
  • Google: Structured outputs achieve massive token reduction (-56% to -61% total) through native API integration

Token Efficiency by Model Family

| Model | Avg Tokens (Both Strategies) |
|-------|------------------------------|
| gpt-4o-mini | 497 tokens |
| gpt-4o | 499 tokens |
| gemini-2.5-flash | 510 tokens |
| gemini-2.5-pro | 532 tokens |
| claude-opus-4-1 | 819 tokens |
| claude-sonnet-4-5 | 959 tokens |
| gpt-5-mini | 1227 tokens |
| gpt-5 | 1765 tokens |

GPT-4o and Google models are the most token-efficient (497-532 tokens on average). Claude models use a moderate amount (819-959), while GPT-5 models generate significantly more (1227-1765) due to extensive reasoning/thinking output. Each figure is the mean of a model's total token usage across both strategies; for gpt-4o, (641 + 357) / 2 ≈ 499.

Cost Comparison: All Models and Strategies

| Model / Strategy | Cost per Extraction |
|------------------|---------------------|
| gemini-2.5-flash (Structured) | $0.000050 |
| gemini-2.5-flash (Enhanced) | $0.000096 |
| gpt-4o-mini (Structured) | $0.000097 |
| gpt-4o-mini (Enhanced) | $0.000169 |
| gemini-2.5-pro (Structured) | $0.001023 |
| gpt-5-mini (Enhanced) | $0.001361 |
| gemini-2.5-pro (Enhanced) | $0.001623 |
| gpt-4o (Structured) | $0.001658 |
| gpt-5-mini (Structured) | $0.001881 |
| gpt-4o (Enhanced) | $0.002833 |
| claude-sonnet-4-5 (Enhanced) | $0.004581 |
| claude-sonnet-4-5 (Structured) | $0.005886 |
| gpt-5 (Enhanced) | $0.011895 |
| gpt-5 (Structured) | $0.015065 |
| claude-opus-4-1 (Enhanced) | $0.022380 |
| claude-opus-4-1 (Structured) | $0.025335 |

Gemini 2.5 Flash with Structured Outputs delivers the lowest cost at $0.000050 per extractionβ€”507x cheaper than Claude Opus with Structured Outputs. View benchmark source.

Performance by Provider (Average across models)

| Provider | Enhanced Prompting | Native Structured |
|----------|--------------------|-------------------|
| Anthropic | 4.202s | 4.099s |
| Google | 13.091s | 7.365s |
| OpenAI | 7.389s | 9.374s |

Anthropic shows nearly identical performance between strategies (2.5% faster with structured). Google improves dramatically with structured outputs (44% faster on average). OpenAI is mixed: GPT-5's slower structured-output runs offset GPT-4o's gains, leaving structured outputs 21% slower on average.

Quick Decision Matrix

| Use Case | Recommended Strategy | Model | Cost | Speed |
|----------|----------------------|-------|------|-------|
| Cost-Optimized | Native Structured | gemini-2.5-flash | $0.000050 | 7.943s |
| Speed-Optimized | Native Structured | gpt-4o | $0.001658 | 1.769s |
| OpenAI GPT-4o Users | Native Structured | gpt-4o / gpt-4o-mini | $0.000097-$0.001658 | 1.769-2.111s |
| OpenAI GPT-5 Users | Enhanced Prompting | gpt-5 / gpt-5-mini | $0.001361-$0.011895 | 8.303-16.005s |
| Anthropic Users | Enhanced Prompting | claude-sonnet-4-5 | $0.004581 | 3.411s |
| Google Users | Native Structured | gemini-2.5-pro / flash | $0.000050-$0.001023 | 6.787-7.943s |
| Multi-Provider | Enhanced Prompting | Varies | Varies | Varies |
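
One way to encode this matrix in application code is a small selector. This is a hypothetical helper; the Gemini and Anthropic model ids assume the same provider-prefixed convention as the OpenAI examples in this article:

# Pick a language model per use case, following the matrix above.
def lm_for(use_case)
  case use_case
  when :cost_optimized   # cheapest per extraction in this benchmark
    DSPy::LM.new('gemini/gemini-2.5-flash',
                 api_key: ENV['GEMINI_API_KEY'],
                 structured_outputs: true)
  when :speed_optimized  # fastest response time in this benchmark
    DSPy::LM.new('openai/gpt-4o',
                 api_key: ENV['OPENAI_API_KEY'],
                 structured_outputs: true)
  else                   # multi-provider default: enhanced prompting works everywhere
    DSPy::LM.new('anthropic/claude-sonnet-4-5',
                 api_key: ENV['ANTHROPIC_API_KEY'])
  end
end

DSPy.configure { |c| c.lm = lm_for(:cost_optimized) }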

Key Findings

  • GPT-4o Dominates: Structured outputs are 23-28% faster and 41-43% cheaper with superior token efficiency (-44% to -45%)
  • GPT-5 Reasoning Overhead: Enhanced prompting 29-43% faster; GPT-5 generates 1130-1447 output tokens (extensive reasoning)
  • Google Wins Both Ways: Structured outputs 35-49% faster, 37-48% cheaper, and 56-61% fewer tokens
  • Anthropic Prefers Enhanced: Enhanced prompting similar speed but 12-22% cheaper than structured outputs
  • Cost Champion: Gemini 2.5 Flash with structured outputs at $0.000050 per extraction
  • Speed Champion: GPT-4o with structured outputs at 1.769s
  • Token Efficiency Revolution: Structured outputs now MORE efficient for OpenAI and Google (vs old implementations)
  • Universal Reliability: 100% success rate across all 16 tests (8 models Γ— 2 strategies)

Implementation

DSPy.rb uses Signatures to define structured inputs and outputs. Here’s an example using T::Enum types:

require 'dspy'

class ActionType < T::Enum
  enums do
    Create = new('create')
    Update = new('update')
    Delete = new('delete')
  end
end

class TodoAction < T::Struct
  const :action_type, ActionType
  const :task, String
  const :priority, String, default: 'medium'
end

class TodoListManagement < DSPy::Signature
  description "Parse user request into structured todo actions"

  input do
    const :user_request, String, description: "Natural language request about todos"
  end

  output do
    const :actions, T::Array[TodoAction], description: "Actions to execute"
    const :summary, String, description: "Brief summary of what will be done"
  end
end

# Configure DSPy with structured outputs for optimal performance
DSPy.configure do |c|
  c.lm = DSPy::LM.new(
    'openai/gpt-4o-mini',              # Fast and cost-effective
    api_key: ENV['OPENAI_API_KEY'],
    structured_outputs: true            # 28% faster, 43% cheaper than enhanced prompting
  )
end

predictor = DSPy::Predict.new(TodoListManagement)
result = predictor.call(
  user_request: "Add task to buy groceries and schedule team meeting for Friday"
)

puts "Summary: #{result.summary}"
result.actions.each do |action|
  puts "  #{action.action_type.serialize}: #{action.task} [#{action.priority}]"
end

This example shows DSPy.rb’s core components working together:

  • Signatures: TodoListManagement declares the typed inputs and outputs to extract
  • Configuration: Set up your language model
  • Predictors: The DSPy::Predict class handles JSON extraction automatically

Recommendations

For OpenAI GPT-4o users: Enable structured_outputs: true for dramatic winsβ€”23-28% faster, 41-43% cheaper, and 44-45% fewer tokens. This is a clear win with no downsides.

For OpenAI GPT-5 users: Use enhanced prompting for 29-43% faster responses and 21-28% cost savings. GPT-5’s extensive reasoning generates 1130-1447 output tokens, making structured outputs slower and more expensive.

For Anthropic users: Use enhanced prompting for 12-22% cost savings. Performance is nearly identical between strategies, but enhanced prompting uses fewer tokens.

For Google Gemini users: Enable structured_outputs: true for exceptional results:

  • Gemini 2.5 Flash: 49% faster, 48% cheaper, 61% fewer tokens
  • Gemini 2.5 Pro: 35% faster, 37% cheaper, 56% fewer tokens
  • Structured outputs achieve massive token efficiency through native API integration

For multi-provider applications: Enhanced prompting remains the best default strategy for universal compatibility, though provider-specific optimization can yield significant improvements.

Budget-conscious applications: Use Gemini 2.5 Flash with structured outputs ($0.000050 per extraction)β€”507x cheaper than Claude Opus with structured outputs.

Speed-critical applications: Use GPT-4o with structured outputs (1.769s average)β€”the fastest option tested.

For enterprise deployments, implement production observability to monitor extraction quality across providers.

Future: BAML-Inspired Enhanced Prompting

We’re developing sorbet-baml, a next-generation approach to Enhanced Prompting that could reduce token usage by 50-70% while improving accuracy. This initiative (GitHub #70) transforms verbose JSON schemas into TypeScript-like syntax with inline comments:

  • Current JSON Schema: 150 tokens
  • BAML format: 45 tokens (70% reduction)
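
As a concrete illustration, here is a hypothetical rendering of the TodoAction struct from the implementation example above in that compact, TypeScript-like form (illustrative only; sorbet-baml's actual output format is still under development):

class TodoAction {
  action_type: "create" | "update" | "delete"
  task: string
  priority: string  // defaults to "medium"
}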

Expected benefits:

  • Lower costs: Dramatically reduced token consumption for complex schemas
  • Better accuracy: Up to 20% improvement for nested structures
  • Universal compatibility: Works with all providers (OpenAI, Anthropic, Gemini, Ollama)

This enhancement will integrate seamlessly with DSPy.rb’s existing Enhanced Prompting strategy, providing automatic optimization without code changes.


Benchmark data: 16 tests across 2 strategies and 8 latest AI models (September 2025). Total cost: $0.0959. View benchmark source code and raw data.