Version 0.9.0 is now available.

Build LLM apps like you build software

Tired of copy-pasting prompts and hoping they work? DSPy.rb lets you write modular, type-safe Ruby code that handles the LLM stuff for you. Test it, optimize it, ship it.

Why programmatic prompts?

Because prompt engineering is a nightmare. You tweak words, cross your fingers, and deploy. When it breaks in production (and it will), you're back to square one. DSPy.rb fixes this by letting you define what you want, not how to ask for it.

See it in action

Define what you need with type-safe Signatures:

require 'sorbet-runtime' # T::Struct and T::Enum
require 'dspy'

class Email < T::Struct
  const :subject, String
  const :from, String
  const :to, String
  const :body, String
end

class EmailCategory < T::Enum
  enums do
    Technical = new('technical')
    Billing = new('billing')
    General = new('general')
  end
end

class Priority < T::Enum
  enums do
    Low = new('low')
    Medium = new('medium')
    High = new('high')
  end
end

class ClassifyEmail < DSPy::Signature
  description "Classify customer support emails by analyzing content and urgency"
  
  input do
    const :email, Email, 
          description: "The email to classify with all headers and content"
  end
  
  output do
    const :category, EmailCategory,
          description: "Main topic: technical (API, bugs), billing (payment, pricing), or general"
    const :priority, Priority,
          description: "Urgency level based on keywords like 'production', 'ASAP', 'urgent'"
    const :summary, String,
          description: "One-line summary of the issue for support dashboard"
  end
end
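The enum types and field descriptions above carry enough information to describe the expected output precisely, including the exact set of allowed values. The sketch below builds a JSON Schema for the `ClassifyEmail` outputs by hand, using only Ruby's `json` standard library. `OUTPUT_FIELDS` and `output_json_schema` are hypothetical names chosen for illustration. They are not DSPy.rb's API, and the way DSPy.rb itself turns a signature into a prompt or schema may differ.

```ruby
require 'json'

# Hypothetical plain-Ruby stand-in for ClassifyEmail's output fields:
# each field has a JSON type, an optional closed set of values, and a description.
OUTPUT_FIELDS = {
  category: {
    type: 'string',
    enum: %w[technical billing general],
    description: 'Main topic: technical (API, bugs), billing (payment, pricing), or general'
  },
  priority: {
    type: 'string',
    enum: %w[low medium high],
    description: "Urgency level based on keywords like 'production', 'ASAP', 'urgent'"
  },
  summary: {
    type: 'string',
    description: 'One-line summary of the issue for support dashboard'
  }
}.freeze

# Build a JSON Schema object describing the output the LLM must produce.
def output_json_schema(fields)
  properties = fields.transform_keys(&:to_s).transform_values do |spec|
    prop = { 'type' => spec[:type], 'description' => spec[:description] }
    prop['enum'] = spec[:enum] if spec[:enum] # only enum-typed fields get a closed set
    prop
  end
  {
    'type' => 'object',
    'properties' => properties,
    'required' => fields.keys.map(&:to_s)
  }
end

schema = output_json_schema(OUTPUT_FIELDS)
puts JSON.pretty_generate(schema)
```

Because the allowed values are part of the schema, the model is told in advance that `category` must be one of three strings, not just "some text".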

Let the LLM show its work

Use Chain of Thought for complex reasoning:

classifier = DSPy::ChainOfThought.new(ClassifyEmail)

# Create a properly typed email object
email = Email.new(
  subject: "URGENT: API Key Not Working!!!",
  from: "john.doe@acmecorp.com",
  to: "support@yourcompany.com",
  body: "My API key stopped working after the update. I need this fixed ASAP for our production deployment!"
)

classification = classifier.call(email: email)  # Type-checked at runtime!

What you get back

Proper Ruby objects, not strings:

irb> classification.reasoning
=> "Let me analyze this email step by step:
1. The customer mentions an API key issue - this is technical
2. They mention it stopped working after an update - suggests a system change
3. They emphasize 'ASAP' and 'production deployment' - this is urgent
4. Production issues always warrant high priority"

irb> classification.category
=> #<EmailCategory::Technical>

irb> classification.category.class
=> EmailCategory

irb> classification.category == EmailCategory::Technical  # Type-safe comparison
=> true

irb> classification.priority
=> #<Priority::High>

irb> classification.priority.serialize  # Get the string value when needed
=> "high"

irb> classification.summary
=> "API key authentication failure post-update affecting production"

# Your IDE knows these are the ONLY valid values:
irb> EmailCategory.values
=> [#<EmailCategory::Technical>, #<EmailCategory::Billing>, #<EmailCategory::General>]

# Strings outside the enum are rejected at runtime; Sorbet also flags type mismatches statically:
irb> EmailCategory.deserialize("technicall")  # raises KeyError

That's it. No prompt templates. No "You are a helpful assistant" nonsense. Just define what you want with real Ruby types and let DSPy handle the rest. Your category field can only ever be Technical, Billing, or General, never "technicall", "TECHNICAL", or any other typo. The descriptions you add to fields become part of the prompt, guiding the LLM without you writing prompt engineering poetry. When you need better accuracy, you can optimize the program's instructions and examples against real data instead of guessing.
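The guarantee that a category is never "technicall" comes from validating the model's reply against the declared types instead of trusting free text. Here is a plain-Ruby sketch of that idea, assuming the model replies with a JSON object. The reply is parsed, each enum field must match an allowed value exactly (no case folding, no fuzzy matching), and anything else is rejected. `Classification`, `parse_classification`, and `InvalidOutput` are hypothetical names. DSPy.rb's real parsing, error handling, and retry behavior are not shown here.

```ruby
require 'json'

# Hypothetical closed sets of allowed values, mirroring EmailCategory and Priority.
CATEGORIES = %w[technical billing general].freeze
PRIORITIES = %w[low medium high].freeze

# Stand-in for the typed result object; frozen after construction so fields can't be reassigned.
Classification = Struct.new(:reasoning, :category, :priority, :summary, keyword_init: true)

class InvalidOutput < StandardError; end

# Accept a value only if it is exactly one of the allowed strings.
def coerce!(value, allowed, field)
  return value if allowed.include?(value)

  raise InvalidOutput, "#{field}: #{value.inspect} is not one of #{allowed.join(', ')}"
end

# Parse a raw JSON reply from the model into a frozen Classification.
def parse_classification(raw)
  data = JSON.parse(raw)
  Classification.new(
    reasoning: data.fetch('reasoning'),
    category: coerce!(data.fetch('category'), CATEGORIES, 'category'),
    priority: coerce!(data.fetch('priority'), PRIORITIES, 'priority'),
    summary: data.fetch('summary')
  ).freeze
end

good = '{"reasoning":"Mentions API key and production","category":"technical",' \
       '"priority":"high","summary":"API key failure after update"}'
result = parse_classification(good)
puts result.category # prints: technical

# A near-miss typo from the model is rejected instead of silently stored.
bad = good.sub('"technical"', '"technicall"')
begin
  parse_classification(bad)
rescue InvalidOutput => e
  puts "rejected: #{e.message}"
end
```

Rejecting loudly at the boundary means a bad reply surfaces as an exception you can retry or log, rather than a stray string that breaks a `case` statement three calls later.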

Built for Ruby developers

Everything you love about Ruby, now for LLM applications.

Type-safe from the start

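Catch errors early. Sorbet can flag type mismatches before your code runs, and runtime checks reject bad values at the boundary. Your IDE knows what fields exist, what types they are, and what methods you can call. No more KeyError surprises in production.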

Test like normal code

Write RSpec tests for your LLM logic. Mock responses, test edge cases, measure accuracy. Your CI/CD pipeline just works - no special tooling needed.
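Testing LLM logic like normal code mostly comes down to injecting the model so a test can swap in a stub that returns canned answers. The sketch below shows that pattern with a mocked response, an edge case, and an accuracy measurement over labeled examples. It uses plain Ruby instead of RSpec so it runs without gems, but the same checks fit inside an RSpec `it` block. `TinyClassifier`, `StubLM`, and the `complete` method are hypothetical. DSPy.rb's actual modules and LM interface differ.

```ruby
# A tiny classifier that depends on an injected "LM" object responding to #complete.
# Hypothetical design for illustration; not DSPy.rb's module or LM interface.
class TinyClassifier
  ALLOWED = %w[technical billing general].freeze

  def initialize(lm)
    @lm = lm
  end

  def call(subject:)
    answer = @lm.complete("Classify this support email subject: #{subject}").strip.downcase
    ALLOWED.include?(answer) ? answer : 'general' # fall back on unexpected output
  end
end

# Stub LM: returns canned answers keyed by substring, so tests are deterministic and offline.
class StubLM
  def initialize(canned)
    @canned = canned
  end

  def complete(prompt)
    match = @canned.find { |needle, _answer| prompt.include?(needle) }
    match ? match.last : 'unknown'
  end
end

lm = StubLM.new('API key' => 'Technical', 'invoice' => 'billing')
classifier = TinyClassifier.new(lm)

# Labeled examples: [subject, expected category].
examples = [
  ['API key not working', 'technical'],
  ['Wrong amount on invoice', 'billing'],
  ['Hello there', 'general'] # edge case: stub answers 'unknown', classifier falls back
]

correct = examples.count { |subject, expected| classifier.call(subject: subject) == expected }
accuracy = correct.fdiv(examples.size)
puts "accuracy: #{accuracy}"
```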

Optimize with data

Stop guessing what prompts work. Feed your examples to the optimizer and let it find the best instructions and few-shot examples automatically. Science, not art.
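At its core, optimizing with data means scoring candidate prompts on labeled examples and keeping the winner. The sketch below is deliberately simple: an exhaustive search over two hand-written candidates, with a keyword matcher standing in for the LLM. It is not the algorithm DSPy.rb's optimizers use, since real optimizers generate and select instructions and few-shot examples in more sophisticated ways. `run_program`, `score`, and the candidate format are all hypothetical.

```ruby
# Toy "program": classifies a subject line using the keyword hints in a candidate prompt.
# Hypothetical stand-in for an LLM call whose behavior depends on the prompt.
def run_program(prompt_hints, subject)
  text = subject.downcase
  hit = prompt_hints.find { |_category, keywords| keywords.any? { |k| text.include?(k) } }
  hit ? hit.first : 'general'
end

# Score a candidate on a labeled dev set: fraction of exact matches.
def score(candidate, dev_set)
  dev_set.count { |subject, label| run_program(candidate, subject) == label }.fdiv(dev_set.size)
end

# Candidate prompt variants (think: different instructions or few-shot hints).
candidates = {
  'v1' => { 'technical' => %w[bug], 'billing' => %w[refund] },
  'v2' => { 'technical' => %w[bug api error], 'billing' => %w[refund invoice charge] }
}

dev_set = [
  ['API error on login', 'technical'],
  ['Found a bug', 'technical'],
  ['Double charge on my card', 'billing'],
  ['Refund request', 'billing'],
  ['Thanks for the help', 'general']
]

# Exhaustive search: keep the candidate with the highest dev-set accuracy.
scores = candidates.transform_values { |candidate| score(candidate, dev_set) }
best_name, best_score = scores.max_by { |_name, s| s }
puts "best: #{best_name} (#{best_score})" # prints: best: v2 (1.0)
```

The design point is that the choice between prompts becomes a measured comparison on your own examples, which you can rerun whenever the data or the model changes.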

Compose and reuse

Build complex workflows from simple modules. Chain them, compose them, swap them out. Your email classifier can feed into your priority ranker. Just like regular code.
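Since the modules above are invoked with `call`, chaining them can be ordinary Ruby. Here is a minimal sketch, assuming each step takes and returns a Hash: the classifier's output feeds a priority ranker through `Proc#>>`, which composes left to right. The rule-based lambdas are hypothetical placeholders for LLM-backed modules, and DSPy.rb's own composition API is not shown.

```ruby
# Step 1: hypothetical classifier; adds :category to the ticket.
classify = lambda do |ticket|
  category = ticket[:subject].downcase.include?('api') ? 'technical' : 'general'
  ticket.merge(category: category)
end

# Step 2: hypothetical priority ranker; reads :category produced by step 1.
rank = lambda do |ticket|
  urgent = ticket[:body].match?(/asap|urgent|production/i)
  priority = if urgent && ticket[:category] == 'technical'
               'high'
             elsif urgent
               'medium'
             else
               'low'
             end
  ticket.merge(priority: priority)
end

# Proc#>> composes left to right: classify runs first, its output feeds rank.
triage = classify >> rank

ticket = { subject: 'API key broken', body: 'Need this fixed ASAP for production' }
result = triage.call(ticket)
p result
```

Swapping a step is just building a new composition, for example `classify >> some_other_ranker`, with no changes to the classifier.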

Control your prompts

Version control your LLM logic. Roll back when needed. A/B test different approaches. Know exactly what prompt is running in production. No more mystery meat.
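Knowing exactly which prompt is running, and splitting traffic between variants, can be handled with a couple of stable hashes. The sketch below is one way to do it, assuming the prompt variants live in the repository as plain strings. A short SHA-256 fingerprint identifies the exact prompt text in logs, and hashing a user id gives a deterministic A/B bucket. `PROMPTS`, `prompt_version`, and `variant_for` are hypothetical helpers, not part of DSPy.rb.

```ruby
require 'digest'

# Hypothetical prompt variants kept in version control alongside the code.
PROMPTS = {
  'A' => 'Classify the support email into technical, billing, or general.',
  'B' => 'Read the email carefully, then classify it as technical, billing, or general.'
}.freeze

# Short, stable fingerprint of the exact prompt text, suitable for logs and dashboards.
def prompt_version(text)
  Digest::SHA256.hexdigest(text)[0, 12]
end

# Deterministic A/B assignment: the same user id always lands in the same bucket.
def variant_for(user_id)
  Digest::SHA256.hexdigest(user_id.to_s).to_i(16).even? ? 'A' : 'B'
end

%w[user-1 user-2 user-3].each do |uid|
  variant = variant_for(uid)
  puts "#{uid} -> #{variant} (prompt #{prompt_version(PROMPTS[variant])})"
end
```

Logging the fingerprint with each call lets you tie any production result back to the exact prompt text, and rolling back is a normal revert of the string.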

Production ready

Built-in observability, error handling, and performance monitoring. Track token usage, response times, and accuracy. Deploy with confidence.
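Tracking token usage and response times amounts to wrapping every model call and recording what happened. The sketch below shows that pattern as a decorator around any object with a `complete` method, measuring latency with a monotonic clock. `Response`, `FakeLM`, and `InstrumentedLM` are hypothetical, and the fake counts whitespace-separated words as "tokens" only so the example runs offline. DSPy.rb's built-in instrumentation is not shown here.

```ruby
# Hypothetical LM response shape: text plus token usage counts.
Response = Struct.new(:text, :input_tokens, :output_tokens)

# Fake LM so the sketch runs offline; word count approximates tokens for illustration only.
class FakeLM
  def complete(prompt)
    Response.new('technical', prompt.split.size, 1)
  end
end

# Wraps any object with #complete and records latency and token usage per call.
class InstrumentedLM
  attr_reader :log

  def initialize(lm)
    @lm = lm
    @log = []
  end

  def complete(prompt)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = @lm.complete(prompt)
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
    @log << {
      latency_ms: elapsed_ms.round(2),
      input_tokens: response.input_tokens,
      output_tokens: response.output_tokens
    }
    response
  end

  def total_tokens
    @log.sum { |entry| entry[:input_tokens] + entry[:output_tokens] }
  end
end

lm = InstrumentedLM.new(FakeLM.new)
lm.complete('Classify: API key not working')
lm.complete('Classify: refund request')
puts "calls: #{lm.log.size}, tokens: #{lm.total_tokens}" # prints: calls: 2, tokens: 10
```

Because the wrapper exposes the same `complete` method as the object it wraps, it can be dropped in front of a real client without changing the calling code.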