Tired of copy-pasting prompts and hoping they work? DSPy.rb lets you write modular, type-safe Ruby code that handles the LLM stuff for you. Test it, optimize it, ship it.
Because prompt engineering is a nightmare. You tweak words, cross your fingers, and deploy. When it breaks in production (and it will), you're back to square one. DSPy.rb fixes this by letting you define what you want, not how to ask for it.
Define what you need with type-safe Signatures:
require 'sorbet-runtime' # provides T::Struct and T::Enum
require 'dspy'

class Email < T::Struct
  const :subject, String
  const :from, String
  const :to, String
  const :body, String
end

class EmailCategory < T::Enum
  enums do
    Technical = new('technical')
    Billing = new('billing')
    General = new('general')
  end
end

class Priority < T::Enum
  enums do
    Low = new('low')
    Medium = new('medium')
    High = new('high')
  end
end

class ClassifyEmail < DSPy::Signature
  description "Classify customer support emails by analyzing content and urgency"

  input do
    const :email, Email,
      description: "The email to classify with all headers and content"
  end

  output do
    const :category, EmailCategory,
      description: "Main topic: technical (API, bugs), billing (payment, pricing), or general"
    const :priority, Priority,
      description: "Urgency level based on keywords like 'production', 'ASAP', 'urgent'"
    const :summary, String,
      description: "One-line summary of the issue for support dashboard"
  end
end
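The field descriptions in a signature like `ClassifyEmail` are what guide the LLM. DSPy.rb's actual prompt format isn't shown here, so the following is only a hypothetical plain-Ruby sketch (the `Field` struct and `render_prompt` method are invented for illustration) of how a task description, field descriptions, allowed enum values, and an optional Chain of Thought style `reasoning` field could be rendered into prompt text.

```ruby
# Hypothetical sketch of rendering a signature into prompt text.
# This is NOT DSPy.rb's actual prompt format; all names are illustrative.
Field = Struct.new(:name, :choices, :description)

def render_prompt(task, inputs, outputs, reasoning: false)
  lines = [task, "", "Inputs:"]
  inputs.each { |f| lines << "- #{f.name}: #{f.description}" }
  lines << "" << "Outputs:"
  # Chain of Thought style: request step-by-step reasoning before the answers
  lines << "- reasoning: Think step by step before answering" if reasoning
  outputs.each do |f|
    # Enum-like fields list their only valid values
    allowed = f.choices ? " (one of: #{f.choices.join(', ')})" : ""
    lines << "- #{f.name}: #{f.description}#{allowed}"
  end
  lines.join("\n")
end

prompt = render_prompt(
  "Classify customer support emails by analyzing content and urgency",
  [Field.new(:email, nil, "The email to classify with all headers and content")],
  [
    Field.new(:category, %w[technical billing general], "Main topic"),
    Field.new(:priority, %w[low medium high], "Urgency level"),
    Field.new(:summary, nil, "One-line summary of the issue")
  ],
  reasoning: true
)
puts prompt
```

The point of the sketch is that nothing in it is hand-written prompt prose: every line is derived from the declared fields, so changing a description or an enum changes the prompt consistently.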
Use Chain of Thought for complex reasoning:
# Configure a language model first (model ID and API key env var are examples)
DSPy.configure do |c|
  c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
end

classifier = DSPy::ChainOfThought.new(ClassifyEmail)

# Create a properly typed email object
email = Email.new(
  subject: "URGENT: API Key Not Working!!!",
  from: "john.doe@acmecorp.com",
  to: "support@yourcompany.com",
  body: "My API key stopped working after the update. I need this fixed ASAP for our production deployment!"
)

classification = classifier.call(email: email) # Type-checked at runtime!
Proper Ruby objects, not strings:
irb> classification.reasoning
=> "Let me analyze this email step by step:
1. The customer mentions an API key issue - this is technical
2. They mention it stopped working after an update - suggests a system change
3. They emphasize 'ASAP' and 'production deployment' - this is urgent
4. Production issues always warrant high priority"
irb> classification.category
=> #<EmailCategory::Technical>
irb> classification.category.class
=> EmailCategory
irb> classification.category == EmailCategory::Technical # Type-safe comparison
=> true
irb> classification.priority
=> #<Priority::High>
irb> classification.priority.serialize # Get the string value when needed
=> "high"
irb> classification.summary
=> "API key authentication failure post-update affecting production"
# Your IDE knows these are the ONLY valid values:
irb> EmailCategory.values
=> [#<EmailCategory::Technical>, #<EmailCategory::Billing>, #<EmailCategory::General>]
# Type errors caught at runtime (or by Sorbet static analysis):
irb> classification.category = "invalid" # This would raise an error!
That's it. No prompt templates. No "You are a helpful assistant" nonsense. Just define what you want with real Ruby types and let DSPy handle the rest. Your category field can only ever be Technical, Billing, or General, not "technicall" or "TECHNICAL" or any other typo. The descriptions you add to fields become part of the prompt, guiding the LLM without you writing prompt engineering poetry. When you need to improve accuracy, you can optimize the program's instructions and examples programmatically against real data instead of guessing.
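Why can't a typo sneak through? An enum accepts only exact, known values. The sketch below is a hypothetical plain-Ruby stand-in for an enum like `EmailCategory` (the `Category` class is invented so the example runs without sorbet-runtime), loosely modeled on the `serialize`/`deserialize` pair that `T::Enum` provides: a raw LLM string either maps to one of the predefined singletons or is rejected.

```ruby
# Plain-Ruby stand-in for a T::Enum such as EmailCategory (hypothetical;
# written without sorbet-runtime so it runs anywhere).
class Category
  attr_reader :serialize

  def initialize(value)
    @serialize = value.freeze
    freeze
  end
  private_class_method :new # only the constants below can exist

  TECHNICAL = new("technical")
  BILLING = new("billing")
  GENERAL = new("general")
  ALL = [TECHNICAL, BILLING, GENERAL].freeze

  def self.values
    ALL
  end

  # Strict, exact-match lookup: "technicall" and "TECHNICAL" are rejected
  # instead of silently becoming a new category.
  def self.deserialize(raw)
    ALL.find { |c| c.serialize == raw } or
      raise ArgumentError, "invalid category: #{raw.inspect}"
  end
end

Category.deserialize("billing").equal?(Category::BILLING) # => true

begin
  Category.deserialize("TECHNICAL")
rescue ArgumentError => e
  puts e.message # prints: invalid category: "TECHNICAL"
end
```

Returning shared frozen singletons (rather than fresh strings) is what makes identity comparisons like `==` against a constant safe.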
Everything you love about Ruby, now for LLM applications.
Catch errors before runtime. With Sorbet's static checks, your IDE knows what fields exist, what types they are, and what methods you can call. No more KeyError surprises in production.
Write RSpec tests for your LLM logic. Mock responses, test edge cases, measure accuracy. Your CI/CD pipeline just works - no special tooling needed.
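Testing LLM logic gets easier when the model is just a dependency you can swap. DSPy.rb's own testing helpers aren't covered on this page, so this is a hypothetical dependency-injection sketch: `EmailClassifier`, `FakeLM`, and the `complete(prompt)` interface are invented names. In an RSpec suite the fake would typically be a test double.

```ruby
require "json"

# Hypothetical classifier that receives its language model as a dependency.
class EmailClassifier
  ALLOWED = %w[technical billing general].freeze

  def initialize(lm)
    @lm = lm # assumed interface: #complete(prompt) -> JSON string
  end

  def call(subject:, body:)
    raw = JSON.parse(@lm.complete("Subject: #{subject}\n\n#{body}"))
    category = raw.fetch("category")
    # Edge case worth testing: the model returns a value outside the enum
    raise ArgumentError, "bad category #{category.inspect}" unless ALLOWED.include?(category)
    { category: category, summary: raw.fetch("summary") }
  end
end

# Fake LM: returns a canned response and records every prompt it receives.
class FakeLM
  attr_reader :prompts

  def initialize(response)
    @response = response
    @prompts = []
  end

  def complete(prompt)
    @prompts << prompt
    @response
  end
end

fake = FakeLM.new('{"category": "technical", "summary": "API key broken"}')
result = EmailClassifier.new(fake).call(subject: "API key", body: "It stopped working")
```

Because the fake is deterministic, these checks run offline in CI with no API keys or network access.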
Stop guessing what prompts work. Feed your examples to the optimizer and let it find the best instructions and few-shot examples automatically. Science, not art.
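The core loop of an optimizer is: propose candidates, score each with a metric on real examples, keep the best. The sketch below is a deliberately tiny stand-in, not DSPy.rb's optimizer: a word-overlap "model" replaces the LLM, and the hypothetical `select_demos` exhaustively scores every k-sized subset of few-shot demos on a dev set and keeps the most accurate one.

```ruby
# Toy few-shot demo selection (hypothetical; not DSPy.rb's optimizer).
Example = Struct.new(:text, :label)

def words(text)
  text.downcase.scan(/[a-z]+/)
end

# Stand-in "model": copy the label of the demo sharing the most words with
# the input. A real program would place the demos into an LLM prompt.
def predict(demos, text)
  input = words(text)
  demos.max_by { |d| (words(d.text) & input).size }.label
end

# Metric: fraction of examples whose predicted label matches the gold label.
def accuracy(demos, dataset)
  dataset.count { |ex| predict(demos, ex.text) == ex.label }.fdiv(dataset.size)
end

# Search: score every k-sized subset of the training pool on the dev set
# and keep the best-scoring one.
def select_demos(pool, dev, k:)
  pool.combination(k).max_by { |demos| accuracy(demos, dev) }
end

train = [
  Example.new("API key returns 401 error", "technical"),
  Example.new("Refund my last invoice", "billing"),
  Example.new("What are your office hours", "general"),
  Example.new("Invoice shows wrong price", "billing")
]
dev = [
  Example.new("Our API key stopped working", "technical"),
  Example.new("Charged twice on my invoice", "billing"),
  Example.new("Do you have office hours on weekends", "general")
]

best = select_demos(train, dev, k: 3)
accuracy(best, dev) # => 1.0
```

Exhaustive search only works for tiny pools; real optimizers sample or bootstrap candidates instead, but the metric-driven selection is the same idea.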
Build complex workflows from simple modules. Chain them, compose them, swap them out. Your email classifier can feed into your priority ranker. Just like regular code.
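Composition works because every module shares one small interface. As a hypothetical sketch, assume each step responds to `call(hash)` and returns the hash with more keys; the `Classify`, `RankPriority`, and `Pipeline` classes are invented, and rule-based stand-ins replace the LLM calls so it runs offline.

```ruby
# Hypothetical modules: each takes a hash and returns a new hash with more keys.
class Classify
  def call(input)
    category = input[:body].match?(/api|error|bug/i) ? "technical" : "general"
    input.merge(category: category)
  end
end

class RankPriority
  URGENT = /urgent|asap|production/i

  def call(input)
    priority = input[:body].match?(URGENT) ? "high" : "low"
    input.merge(priority: priority)
  end
end

# A pipeline also responds to #call, so it can be nested or swapped out
# like any other step.
class Pipeline
  def initialize(*steps)
    @steps = steps
  end

  def call(input)
    @steps.reduce(input) { |acc, step| step.call(acc) }
  end
end

triage = Pipeline.new(Classify.new, RankPriority.new)
result = triage.call(body: "My API key broke our production deployment, fix ASAP")
```

Swapping the rule-based `Classify` for an LLM-backed module would not change `Pipeline` at all, which is the property the paragraph above describes.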
Version control your LLM logic. Roll back when needed. A/B test different approaches. Know exactly what prompt is running in production. No more mystery meat.
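One way to know exactly which prompt is running is to fingerprint everything that affects it and log that ID with each deploy. This is a hypothetical sketch: the config hash shape, the placeholder model name, and the `fingerprint` helper are invented. Keys are sorted recursively so that logically equal configs always hash the same.

```ruby
require "digest"
require "json"

# Recursively sort hash keys (as strings) to get a canonical structure.
def deep_sort(obj)
  case obj
  when Hash then obj.sort_by { |k, _| k.to_s }.to_h { |k, v| [k.to_s, deep_sort(v)] }
  when Array then obj.map { |v| deep_sort(v) }
  else obj
  end
end

# Short, stable ID for a prompt-affecting configuration.
def fingerprint(config)
  Digest::SHA256.hexdigest(JSON.generate(deep_sort(config)))[0, 12]
end

v1 = { instructions: "Classify support emails", demos: ["API key broken -> technical"], model: "some-model" }
v2 = { model: "some-model", demos: ["API key broken -> technical"], instructions: "Classify support emails" }
v3 = v1.merge(instructions: "Classify customer support emails by urgency")

fingerprint(v1) == fingerprint(v2) # => true (key order doesn't matter)
fingerprint(v1) == fingerprint(v3) # => false (instructions changed)
```

Committing the config itself to version control gives you the diff; the fingerprint gives you a cheap way to match production logs back to a commit.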
Built-in observability, error handling, and performance monitoring. Track token usage, response times, and accuracy. Deploy with confidence.
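DSPy.rb's built-in instrumentation isn't detailed on this page, so here is a hypothetical plain-Ruby sketch of the same idea: a wrapper (the invented `Instrumented` class) that times each call with a monotonic clock, tallies token usage, and records failures before re-raising. It assumes the wrapped module returns a hash with a `:usage` entry.

```ruby
# Hypothetical instrumentation wrapper around any object responding to #call.
class Instrumented
  attr_reader :calls

  def initialize(inner)
    @inner = inner
    @calls = []
  end

  def call(**input)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = @inner.call(**input)
    record(started, result.fetch(:usage, {}), error: nil)
    result
  rescue StandardError => e
    record(started, {}, error: e.class.name) # log the failure, then re-raise
    raise
  end

  def total_tokens
    @calls.sum { |c| c[:input_tokens] + c[:output_tokens] }
  end

  def error_count
    @calls.count { |c| c[:error] }
  end

  private

  def record(started, usage, error:)
    @calls << {
      seconds: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started,
      input_tokens: usage.fetch(:input, 0),
      output_tokens: usage.fetch(:output, 0),
      error: error
    }
  end
end

# Stub standing in for an LLM-backed predictor (token counts are made up).
stub = ->(**_input) { { category: "technical", usage: { input: 120, output: 30 } } }
monitored = Instrumented.new(stub)
2.times { monitored.call(subject: "API key", body: "broken") }
monitored.total_tokens # => 300
```

A monotonic clock is used instead of `Time.now` so latency measurements aren't skewed by wall-clock adjustments.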
Everything you need to build reliable LLM applications
DSPy.rb brings software engineering best practices to LLM development. Type-safe interfaces, composable modules, and systematic testing.
Stop fighting with prompt engineering. Start building LLM applications that actually work in production.