Complex Types
DSPy.rb provides support for structured data types beyond simple strings through integration with Sorbet’s type system. You can use enums, structs, arrays, and hashes to create well-defined interfaces for your LLM applications.
Overview
DSPy.rb supports:
- Enums: Constrained value sets with T::Enum
- Structs: Complex objects with T::Struct
- Collections: Arrays and hashes of typed elements
- Optional Fields: Nullable types with T.nilable
- JSON Schema Generation: Automatic schema creation for LLM consumption
Enum Types
Basic Enums
class Sentiment < T::Enum
enums do
Positive = new('positive')
Negative = new('negative')
Neutral = new('neutral')
end
end
class ClassifyText < DSPy::Signature
description "Classify text sentiment"
input do
const :text, String
end
output do
const :sentiment, Sentiment
const :confidence, Float
end
end
# Usage
classifier = DSPy::Predict.new(ClassifyText)
result = classifier.call(text: "I love this product!")
puts result.sentiment.serialize # => "positive"
String Enum Values
class Priority < T::Enum
enums do
Low = new('low')
Medium = new('medium')
High = new('high')
Critical = new('critical')
end
end
class TicketClassifier < DSPy::Signature
description "Classify support ticket priority"
input do
const :ticket_content, String
end
output do
const :priority, Priority
const :reasoning, String
end
end
Multiple Enum Fields
class Category < T::Enum
enums do
Technical = new('technical')
Billing = new('billing')
Account = new('account')
end
end
class Status < T::Enum
enums do
Open = new('open')
InProgress = new('in_progress')
Resolved = new('resolved')
end
end
class TicketAnalysis < DSPy::Signature
description "Analyze support ticket"
input do
const :content, String
end
output do
const :category, Category
const :priority, Priority
const :status, Status
end
end
Struct Types
Basic Structs
class ContactInfo < T::Struct
const :name, String
const :email, String
const :phone, T.nilable(String)
end
class ExtractContact < DSPy::Signature
description "Extract contact information from text"
input do
const :text, String
end
output do
const :contact, ContactInfo
const :confidence, Float
end
end
# Usage
extractor = DSPy::Predict.new(ExtractContact)
result = extractor.call(text: "John Doe - john@example.com - 555-1234")
# Access struct fields
puts result.contact.name # => "John Doe"
puts result.contact.email # => "john@example.com"
puts result.contact.phone # => "555-1234"
Nested Structs
class Address < T::Struct
const :street, String
const :city, String
const :state, String
const :zip_code, String
end
class Person < T::Struct
const :name, String
const :age, Integer
const :address, Address
end
class ExtractPersonInfo < DSPy::Signature
description "Extract detailed person information"
input do
const :text, String
end
output do
const :person, Person
end
end
Collection Types
Arrays
class ExtractKeywords < DSPy::Signature
description "Extract keywords from text"
input do
const :text, String
end
output do
const :keywords, T::Array[String]
const :count, Integer
end
end
# Usage
extractor = DSPy::Predict.new(ExtractKeywords)
result = extractor.call(text: "Machine learning and artificial intelligence...")
puts result.keywords # => ["machine learning", "artificial intelligence", ...]
Arrays of Structs
DSPy.rb supports arrays of custom TStruct types with automatic type coercion. When the LLM returns JSON arrays containing hash objects, DSPy.rb automatically converts them to the appropriate TStruct instances.
class Product < T::Struct
const :name, String
const :price, Float
const :category, String
end
class ExtractProducts < DSPy::Signature
description "Extract product information from text"
input do
const :text, String
end
output do
const :products, T::Array[Product]
const :total_found, Integer
end
end
# Usage
extractor = DSPy::Predict.new(ExtractProducts)
result = extractor.call(text: "We have iPhone 15 for $999 and Samsung Galaxy for $799...")
# DSPy automatically converts the JSON response to Product structs
result.products.each do |product|
# Each product is a proper Product struct instance
puts "#{product.name} - $#{product.price} (#{product.category})"
end
Complex Struct Arrays
You can also use more complex structs with nested types:
class Citation < T::Struct
const :title, String
const :author, String
const :year, Integer
const :relevance, Float
const :tags, T::Array[String]
end
class ResearchSynthesis < DSPy::Signature
description "Synthesize research papers on a topic"
input do
const :query, String
const :max_results, Integer
end
output do
const :citations, T::Array[Citation]
const :summary, String
const :key_findings, T::Array[String]
end
end
# The LLM returns JSON like:
# {
# "citations": [
# {"title": "...", "author": "...", "year": 2023, "relevance": 0.95, "tags": ["ML", "NLP"]},
# {"title": "...", "author": "...", "year": 2022, "relevance": 0.87, "tags": ["AI"]}
# ],
# "summary": "...",
# "key_findings": ["...", "..."]
# }
# DSPy automatically converts each citation hash to a Citation struct
synthesizer = DSPy::Predict.new(ResearchSynthesis)
result = synthesizer.call(query: "transformer architectures", max_results: 5)
result.citations.each do |citation|
# citation is a Citation struct, not a hash
puts "#{citation.title} by #{citation.author} (#{citation.year})"
puts "Relevance: #{(citation.relevance * 100).round}%"
puts "Tags: #{citation.tags.join(', ')}"
end
Hash Types
class AnalyzeMetrics < DSPy::Signature
description "Analyze text and return metrics"
input do
const :text, String
end
output do
const :metrics, T::Hash[String, Float]
const :summary, String
end
end
# Results in metrics like:
# { "readability" => 0.8, "sentiment_score" => 0.6, "complexity" => 0.4 }
Optional and Nullable Types
Optional Fields
class ProductInfo < T::Struct
const :name, String
const :price, T.nilable(Float) # Optional price
const :description, T.nilable(String) # Optional description
const :in_stock, T::Boolean
end
class ExtractProductInfo < DSPy::Signature
description "Extract product information, handling missing data"
input do
const :text, String
end
output do
const :product, ProductInfo
const :confidence, Float
end
end
# Handles cases where price or description might not be available
Complex Optional Structures
class Review < T::Struct
const :rating, Integer
const :comment, String
const :reviewer_name, T.nilable(String)
const :verified_purchase, T::Boolean
end
class ExtractReviews < DSPy::Signature
description "Extract product reviews from text"
input do
const :text, String
end
output do
const :reviews, T::Array[Review]
const :average_rating, T.nilable(Float)
end
end
Working with Complex Results
Accessing Nested Data
result = extractor.call(text: input_text)
# Access struct fields
person = result.person
puts "Name: #{person.name}"
puts "Address: #{person.address.street}, #{person.address.city}"
# Work with arrays
result.keywords.each_with_index do |keyword, i|
puts "#{i+1}. #{keyword}"
end
# Process hash results
result.metrics.each do |metric, value|
puts "#{metric}: #{value.round(2)}"
end
Validation and Error Handling
result = extractor.call(text: input_text)
# Check for nil values
if result.contact.phone
puts "Phone: #{result.contact.phone}"
else
puts "No phone number provided"
end
# Validate array contents
if result.products.any?
puts "Found #{result.products.size} products"
result.products.each do |product|
puts "- #{product.name}: $#{product.price}"
end
else
puts "No products found"
end
JSON Schema Integration
DSPy.rb automatically generates JSON schemas for your complex types:
# The signature automatically creates schemas for the LLM
signature = ClassifyText.new
schema = signature.schema
# Schema includes type constraints:
# {
# "input": {
# "text": {"type": "string"}
# },
# "output": {
# "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
# "confidence": {"type": "number"}
# }
# }
Best Practices
1. Use Descriptive Names
# Good: Clear purpose and constraints
class TaskPriority < T::Enum
enums do
Low = new('low')
Medium = new('medium')
High = new('high')
Urgent = new('urgent')
end
end
# Good: Descriptive struct fields
class CustomerFeedback < T::Struct
const :satisfaction_score, Integer
const :main_complaint, T.nilable(String)
const :would_recommend, T::Boolean
end
2. Handle Missing Data Gracefully
class ExtractCompanyInfo < DSPy::Signature
description "Extract company information, handling incomplete data"
input do
const :text, String
end
output do
const :company_name, String
const :industry, T.nilable(String)
const :employee_count, T.nilable(Integer)
const :founded_year, T.nilable(Integer)
const :confidence, Float
end
end
# Usage with error handling
result = extractor.call(text: company_description)
company_info = {
name: result.company_name,
industry: result.industry || "Unknown",
size: result.employee_count || "Not specified",
age: result.founded_year ? Date.current.year - result.founded_year : nil
}
3. Use Validation in Your Logic
def process_extraction_result(result)
# Validate required fields
return nil unless result.contact.name.present?
return nil unless result.contact.email.present?
# Process optional fields carefully
contact_info = {
name: result.contact.name,
email: result.contact.email
}
contact_info[:phone] = result.contact.phone if result.contact.phone
contact_info
end
4. Design for LLM Understanding
# Use clear, unambiguous enum values
class ResponseType < T::Enum
enums do
Positive = new('positive') # Clear
Negative = new('negative') # Clear
Neutral = new('neutral') # Clear
# Avoid: Mixed = new('mixed') # Ambiguous
end
end
# Use meaningful struct field names
class EmailClassification < T::Struct
const :is_spam, T::Boolean # Clear boolean
const :spam_confidence, Float # Clear confidence measure
const :primary_topic, String # Clear categorization
end
Limitations and Best Practices
Nesting Depth Limitations
DSPy.rb has practical limits on nested struct complexity:
✅ Recommended Nesting (1-2 levels):
class Address < T::Struct
const :street, String
const :city, String
const :state, String
end
class Person < T::Struct
const :name, String
const :address, Address # 2 levels total - works reliably
end
⚠️ Deep Nesting (3+ levels) - Use with Caution:
# This creates increasingly complex JSON schemas that may:
# - Trigger OpenAI depth validation warnings (>5 levels)
# - Have type coercion issues with deeply nested T::Struct objects
# - Reduce LLM accuracy due to schema complexity
class Level3 < T::Struct
const :level4, Level4
end
class Level2 < T::Struct
const :level3, Level3
end
class Level1 < T::Struct
const :level2, Level2 # 4+ levels - may fail
end
❌ Avoid Excessive Nesting (5+ levels):
- JSON schema generation works but creates complex schemas
- Type coercion may return Hash objects instead of proper T::Struct instances
- OpenAI structured outputs may reject schemas exceeding depth limits
- LLMs struggle with deeply nested output requirements
Performance Considerations
Schema Caching: DSPy.rb automatically caches JSON schemas for repeated use:
# First call generates schema
result1 = predictor.call(input: "text")
# Second call uses cached schema (faster)
result2 = predictor.call(input: "more text")
Provider Optimization: Different providers handle complex types differently:
- OpenAI Structured Outputs: Excellent for 1-3 level nesting
- Anthropic: Robust JSON extraction handles most complexity
- Enhanced Prompting: Fallback for any provider, handles simpler structures better
Troubleshooting Complex Types
Type Coercion Issues: If you get Hash objects instead of T::Struct instances:
# Check if the issue is with deep nesting
class SimpleStruct < T::Struct
const :field, String
end
# Test with a simple struct first
# If it works, the issue is likely nesting depth
Schema Validation: Check schema depth warnings:
schema = YourSignature.output_json_schema
issues = DSPy::LM::Adapters::OpenAI::SchemaConverter.validate_compatibility(schema)
puts issues # Shows depth and complexity warnings
Alternative Approaches: Instead of deep nesting, consider:
- Flattening complex structures
- Using separate API calls for complex data
- Breaking down into multiple simpler signatures