Concurrent LLM Processing: Real Performance Gains with Ruby's Async Ecosystem
See how DSPy.rb achieves 3x performance improvements using Ruby's excellent async capabilities. Real measurements from a practical coffee shop agent example.
Vicente Reig
Fractional Engineering Lead • 7 min read
Serving multiple customers efficiently isn’t just a coffee shop problem—it’s the core challenge of any LLM application at scale. When your AI agent needs to handle multiple requests, the difference between sequential and concurrent processing can make or break the user experience.
Today, we’ll look at real performance gains using DSPy.rb’s concurrent processing capabilities, built on Ruby’s excellent async ecosystem.
The Problem: Sequential Bottlenecks
Let’s start with a practical example. You’ve built an AI coffee shop agent that handles customer requests—making drinks, processing refunds, telling jokes, or escalating issues. Here’s what sequential processing looks like:
# Sequential processing - each customer waits for the previous one
agent = CoffeeShopAgent.new

agent.handle_customer(request: "Large iced latte with oat milk", mood: CustomerMood::Happy) # 5.4s
agent.handle_customer(request: "This coffee is terrible!", mood: CustomerMood::Upset)       # 7.5s
agent.handle_customer(request: "Do you sell hamburgers?", mood: CustomerMood::Neutral)      # 6.0s
agent.handle_customer(request: "Got any coffee jokes?", mood: CustomerMood::Happy)          # 6.1s

# Total time: ~25 seconds
The fourth customer waits 19 seconds before their request even starts processing. That’s not acceptable for any real application.
The Solution: Concurrent Processing
Ruby’s async gem provides fiber-based concurrency that’s perfect for I/O-bound operations like LLM API calls. Here’s how DSPy.rb leverages it:
require 'async'
require 'async/barrier'

Async do
  agent = CoffeeShopAgent.new
  barrier = Async::Barrier.new

  # All customer requests launch simultaneously
  barrier.async { agent.handle_customer(request: "Large iced latte...", mood: CustomerMood::Happy) }
  barrier.async { agent.handle_customer(request: "This coffee is terrible!", mood: CustomerMood::Upset) }
  barrier.async { agent.handle_customer(request: "Do you sell hamburgers?", mood: CustomerMood::Neutral) }
  barrier.async { agent.handle_customer(request: "Got any coffee jokes?", mood: CustomerMood::Happy) }

  # Wait for all to complete
  barrier.wait
end
# Total time: 7.53 seconds (limited by slowest request)
Real Performance Results
We ran both versions with identical customer requests. Here are the measured results:
- Sequential version: ~25 seconds total
- Concurrent version: 7.53 seconds total
- Performance improvement: 3.3x faster
The improvement comes from processing all requests in parallel rather than waiting for each to complete sequentially.
You can see both implementations in the DSPy.rb repository:
- Sequential version
- Concurrent version
How It Works: The Architecture
Concurrent requests flow through several layers of DSPy.rb’s async architecture.
Key Components
- Async::Barrier: Coordinates multiple concurrent tasks and waits for all to complete
- Customer Tasks: Each request runs in its own fiber with isolated context
- LM Chat Tasks: The actual LLM API calls happen concurrently
- Background Telemetry: Observability data is processed asynchronously without blocking
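In practice, the barrier rarely works alone: LLM providers enforce rate limits, so you often want to cap how many calls are in flight at once. The async gem's Async::Semaphore composes directly with a barrier for exactly this. A minimal sketch, assuming the same coffee shop agent and a requests array of keyword-argument hashes:

require 'async'
require 'async/barrier'
require 'async/semaphore'

Async do
  agent = CoffeeShopAgent.new
  barrier = Async::Barrier.new
  # At most 2 LLM calls in flight; extra tasks queue until a slot frees up
  semaphore = Async::Semaphore.new(2, parent: barrier)

  requests.each do |request|
    semaphore.async do
      agent.handle_customer(**request)
    end
  end

  barrier.wait # still waits for every task, throttled or not
end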
Ruby’s Async Ecosystem Excellence
Ruby’s async processing capabilities have matured significantly. The async gem provides:
- Fiber-based concurrency: Lightweight, cooperative multitasking
- Structured concurrency: Clean task lifecycle management with automatic cleanup
- Non-blocking I/O: Perfect for API calls and database operations
- Zero thread overhead: Fibers are much more efficient than threads for I/O-bound work
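You can see the non-blocking behavior without making a single API call. Under the async fiber scheduler, even Kernel#sleep yields its fiber instead of blocking the thread, so four one-second waits finish in about one second:

require 'async'
require 'benchmark'

elapsed = Benchmark.realtime do
  Async do |task|
    # Four cooperative one-second sleeps, all on a single thread
    4.times.map { task.async { sleep 1 } }.each(&:wait)
  end
end

puts format("Completed in %.2fs", elapsed) # ~1.0s, not 4.0s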
DSPy.rb builds on this solid foundation, providing async-aware features throughout the framework:
- Concurrent predictions using Async::Barrier
- Fiber-local context management for clean request isolation
- Background telemetry processing with automatic retry handling
- Non-blocking observability that doesn’t slow down your application
Implementing Concurrent Processing
DSPy.rb makes concurrent processing straightforward. Here’s the minimal code needed:
require 'async'
require 'async/barrier'
# Your existing DSPy module
predictor = DSPy::ChainOfThought.new(YourSignature)
# Concurrent execution
Async do
  barrier = Async::Barrier.new

  requests.each do |request|
    barrier.async do
      result = predictor.call(**request)
      # Process result
    end
  end

  barrier.wait # Wait for all to complete
end
Each prediction runs in its own fiber with isolated context. DSPy.rb’s fiber-local storage ensures that temporary configuration changes (like switching models) don’t interfere with concurrent requests.
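DSPy.rb handles that isolation internally, but the principle is easy to demonstrate with Ruby 3.2's built-in fiber storage, which each async task sees independently. A sketch of the pattern (illustrative, not DSPy.rb's actual implementation):

require 'async'

Async do
  %w[fast-model slow-model].each do |model|
    Async do
      # Fiber storage is scoped to this task; concurrent siblings
      # never observe each other's assignments
      Fiber[:model] = model
      sleep(rand * 0.05) # let the two fibers interleave
      puts "this task still sees #{Fiber[:model]}"
    end
  end
end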
When Concurrent Processing Helps
Concurrent processing provides the biggest benefits when:
- I/O bound operations: LLM API calls, database queries, external service calls
- Independent requests: Tasks that don’t depend on each other’s results
- Variable response times: Some requests are fast, others are slow
- Multiple data sources: Calling different LLM providers or APIs simultaneously
For the coffee shop example, concurrent processing helps because:
- Each customer request is independent
- LLM API calls have variable response times (5-7 seconds)
- The agent makes external API calls that can be parallelized
Measuring Your Own Performance
To measure concurrent processing benefits in your application:
require 'async'
require 'async/barrier'
require 'benchmark'

# Benchmark sequential processing
sequential_time = Benchmark.measure do
  requests.each { |req| process_request(req) }
end

# Benchmark concurrent processing
concurrent_time = Benchmark.measure do
  Async do
    barrier = Async::Barrier.new
    requests.each { |req| barrier.async { process_request(req) } }
    barrier.wait
  end
end
improvement = sequential_time.real / concurrent_time.real
puts "#{improvement.round(1)}x faster with concurrent processing"
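On the coffee shop workload above, this prints something like "3.3x faster with concurrent processing." Expect the ratio to grow with the number of independent requests, up to whatever your provider's rate limits allow.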
Beyond the Coffee Shop
The principles demonstrated here apply to any LLM application:
- Content generation: Process multiple articles simultaneously
- Data analysis: Analyze multiple datasets in parallel
- Multi-step workflows: Execute independent steps concurrently
- Batch processing: Handle multiple user requests efficiently
The key is identifying I/O-bound, independent operations that can benefit from concurrent execution.
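The content-generation case, for example, fans out one task per document and collects the results in order. A sketch reusing the predictor pattern from earlier (YourSignature and its document input are assumptions):

require 'async'

predictor = DSPy::ChainOfThought.new(YourSignature)

Async do
  documents = ["intro.md", "guide.md", "faq.md"]

  # One task per document; Task#wait returns the block's value
  tasks = documents.map do |doc|
    Async { predictor.call(document: doc) }
  end

  results = tasks.map(&:wait) # same order as the input
end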
Practical Takeaways
- Ruby’s async ecosystem is production-ready - The async gem provides excellent concurrent processing capabilities
- Measure before optimizing - Use real benchmarks to validate performance improvements
- Concurrent != complex - DSPy.rb's Async::Barrier makes concurrent processing straightforward
- Context isolation matters - Fiber-local storage prevents concurrent requests from interfering with each other
- Start small - Begin with simple concurrent patterns and build complexity gradually
Ruby’s async capabilities, combined with DSPy.rb’s async-aware architecture, provide a solid foundation for building scalable LLM applications. The 3x performance improvement we demonstrated is achievable with minimal code changes.
The coffee shop agent example shows that concurrent processing benefits are real, measurable, and accessible—no theoretical performance claims, just practical improvements you can implement today.
Try the concurrent coffee shop example yourself: clone the DSPy.rb repository and run both versions to see the performance difference firsthand.