Turbo & RubyLLM: Build a Streaming AI Chat in Rails
Building AI chat into a Rails app used to mean juggling provider SDKs, managing WebSocket connections, and writing a lot of JavaScript. With RubyLLM and Turbo Streams, you can build a ChatGPT-style streaming interface in about 30 minutes.
RubyLLM gives you one clean Ruby API for OpenAI, Anthropic, Gemini, and more. Combined with Turbo's real-time updates, you get streaming responses without writing any JavaScript. Here's how to build it.
What We're Building
A chat interface where:
- User submits a message
- Message appears instantly in the UI
- AI response streams in word-by-word, just like ChatGPT
- Entire conversation is persisted to the database
- Works with any provider: OpenAI, Anthropic, Google, etc.
No polling. No JavaScript fetch calls. Just Turbo Streams over Action Cable.
Setup
Add RubyLLM to your Gemfile:
gem 'ruby_llm'
Run the installer:
bundle install
rails generate ruby_llm:install
rails db:migrate
This creates Chat, Message, ToolCall, and Model tables with the appropriate acts_as declarations.
Configure your API keys:
# config/initializers/ruby_llm.rb
RubyLLM.configure do |config|
  config.openai_api_key = ENV['OPENAI_API_KEY']
  config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
  config.gemini_api_key = ENV['GEMINI_API_KEY']
  config.use_new_acts_as = true
end
You only need to configure the providers you're using.
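If you want a boot-time sanity check, a small helper can report which providers actually have keys set. This is a sketch; the helper name and the key map are my own, not part of RubyLLM:

```ruby
# Hypothetical helper: report which providers have API keys present,
# e.g. to log a warning at boot when none are configured.
PROVIDER_KEYS = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  gemini: "GEMINI_API_KEY"
}.freeze

def configured_providers(env = ENV)
  # Keep only providers whose key is set to a non-blank value
  PROVIDER_KEYS.select { |_, key| !env[key].to_s.strip.empty? }.keys
end
```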
The Models
The generator creates these, but here's what they look like:
# app/models/chat.rb
class Chat < ApplicationRecord
  acts_as_chat
  belongs_to :user, optional: true
  # Broadcast changes to subscribers
  broadcasts_to ->(chat) { [chat, "messages"] }
end
# app/models/message.rb
class Message < ApplicationRecord
  acts_as_message
  # dom_id is an ActionView helper, not available in models by default
  include ActionView::RecordIdentifier
  # Broadcast when created (for user messages and final AI messages)
  broadcasts_to ->(message) { [message.chat, "messages"] }
  # Helper to stream chunks during AI response
  def broadcast_chunk(content)
    broadcast_append_to [chat, "messages"],
      target: dom_id(self, "content"),
      html: content
  end
end
The acts_as_chat macro gives your Chat model the ask method that handles streaming, persistence, and provider communication.
The Controller
# app/controllers/chats_controller.rb
class ChatsController < ApplicationController
  def index
    @chats = current_user.chats.order(created_at: :desc)
  end

  def show
    @chat = current_user.chats.find(params[:id])
  end

  def create
    @chat = current_user.chats.create!(model: params[:model] || 'gpt-4o-mini')
    redirect_to @chat
  end
end
# app/controllers/messages_controller.rb
class MessagesController < ApplicationController
  def create
    @chat = current_user.chats.find(params[:chat_id])
    # Create the user message immediately (shows in UI via broadcast)
    @message = @chat.messages.create!(
      role: :user,
      content: params[:content]
    )
    # Process AI response in background
    ChatResponseJob.perform_later(@chat.id)
    head :ok
  end
end
The user message is created synchronously, so it broadcasts to the UI immediately via broadcasts_to. The AI response happens in a background job so we don't block the request.
The Background Job
This is where the streaming magic happens:
# app/jobs/chat_response_job.rb
class ChatResponseJob < ApplicationJob
  queue_as :default

  def perform(chat_id)
    chat = Chat.find(chat_id)
    # Get the last user message
    user_message = chat.messages.where(role: :user).last
    return unless user_message

    # The ask method with a block enables streaming
    # It automatically:
    # 1. Creates an assistant message record
    # 2. Streams chunks as they arrive
    # 3. Updates the message with final content when done
    assistant_message = nil
    chat.ask(user_message.content) do |chunk|
      next unless chunk.content.present?
      # Look up the assistant record once (it exists before the first
      # chunk arrives) instead of querying on every chunk
      assistant_message ||= chat.messages.where(role: :assistant).last
      # Broadcast each chunk to the UI
      assistant_message.broadcast_chunk(chunk.content)
    end
  end
end
When you call chat.ask with a block, RubyLLM:
- Creates an empty assistant Message record
- Yields each chunk as it arrives from the provider
- Updates the Message with final content and token counts when done
We broadcast each chunk to Turbo Streams, which appends it to the message div in real-time.
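For intuition, each broadcast travels over Action Cable as a small turbo-stream element that the Turbo client acts on. Here's an illustrative hand-rolled version of the append payload; the helper is my own sketch, not part of Turbo's Ruby API:

```ruby
# Illustrative only: the shape of the <turbo-stream> append payload.
# Turbo's client takes the <template> contents and appends them to the
# DOM element whose id matches the target attribute.
def turbo_append_payload(target, html)
  "<turbo-stream action=\"append\" target=\"#{target}\">" +
    "<template>#{html}</template></turbo-stream>"
end
```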
The Views
Chat show page:
<!-- app/views/chats/show.html.erb -->
<%= turbo_stream_from @chat, "messages" %>
<div class="chat-container">
  <div id="messages" class="messages">
    <%= render @chat.messages %>
  </div>
  <%= render "messages/form", chat: @chat %>
</div>
Message partial:
<!-- app/views/messages/_message.html.erb -->
<%= turbo_frame_tag message do %>
  <div class="message message-<%= message.role %>">
    <div class="message-role">
      <%= message.role == "user" ? "You" : "Assistant" %>
    </div>
    <div id="<%= dom_id(message, "content") %>" class="message-content">
      <%= message.content.present? ? simple_format(message.content) : " " %>
    </div>
  </div>
<% end %>
The id="<%= dom_id(message, "content") %>" attribute is the target for our streaming chunks. When the job broadcasts, chunks append here.
Message form:
<!-- app/views/messages/_form.html.erb -->
<%= form_with url: chat_messages_path(chat),
      method: :post,
      data: { controller: "message-form", action: "turbo:submit-end->message-form#reset" } do |f| %>
  <div class="message-input">
    <%= f.text_area :content,
          placeholder: "Type a message...",
          rows: 1,
          data: { message_form_target: "input" } %>
    <%= f.submit "Send" %>
  </div>
<% end %>
Simple Stimulus controller to clear the form:
// app/javascript/controllers/message_form_controller.js
import { Controller } from "@hotwired/stimulus"

export default class extends Controller {
  static targets = ["input"]

  reset() {
    this.inputTarget.value = ""
  }
}
Routes
# config/routes.rb
resources :chats, only: [:index, :show, :create] do
  resources :messages, only: [:create]
end
Basic Styling
.chat-container {
  max-width: 800px;
  margin: 0 auto;
  height: 100vh;
  display: flex;
  flex-direction: column;
}
.messages {
  flex: 1;
  overflow-y: auto;
  padding: 1rem;
}
.message {
  margin-bottom: 1rem;
  padding: 1rem;
  border-radius: 8px;
}
.message-user {
  background: #e3f2fd;
  margin-left: 2rem;
}
.message-assistant {
  background: #f5f5f5;
  margin-right: 2rem;
}
.message-role {
  font-weight: bold;
  font-size: 0.875rem;
  margin-bottom: 0.5rem;
}
.message-content {
  white-space: pre-wrap;
}
.message-input {
  display: flex;
  gap: 0.5rem;
  padding: 1rem;
  border-top: 1px solid #ddd;
}
.message-input textarea {
  flex: 1;
  padding: 0.75rem;
  border: 1px solid #ddd;
  border-radius: 4px;
  resize: none;
}
.message-input button {
  padding: 0.75rem 1.5rem;
  background: #1976d2;
  color: white;
  border: none;
  border-radius: 4px;
  cursor: pointer;
}
Switching Models
One of RubyLLM's best features: same API for every provider.
# Use GPT-4o
chat = Chat.create!(model: 'gpt-4o')
# Use Claude
chat = Chat.create!(model: 'claude-sonnet-4')
# Use Gemini
chat = Chat.create!(model: 'gemini-2.0-flash')
# Switch mid-conversation
chat.with_model('claude-sonnet-4')
chat.ask("Continue our discussion...")
The streaming, persistence, and broadcasting all work the same regardless of provider.
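Since params[:model] comes straight from the client, it's worth constraining it to models you actually intend to pay for. A minimal sketch; the allowlist contents and helper name are assumptions, not RubyLLM API:

```ruby
# Hypothetical allowlist so a client can't request arbitrary model ids.
ALLOWED_MODELS = %w[gpt-4o-mini gpt-4o claude-sonnet-4 gemini-2.0-flash].freeze

# Return the requested model if it's allowed, otherwise a safe default.
def resolve_model(requested, default: "gpt-4o-mini")
  ALLOWED_MODELS.include?(requested) ? requested : default
end
```

In ChatsController#create you would then pass resolve_model(params[:model]) instead of the raw param.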
Adding System Prompts
class Chat < ApplicationRecord
  acts_as_chat
  after_create :set_system_prompt

  private

  def set_system_prompt
    with_instructions(
      "You are a helpful assistant. Be concise and friendly."
    )
  end
end
Or set it dynamically:
# In controller
@chat = current_user.chats.create!(model: 'gpt-4o-mini')
@chat.with_instructions("You are a Ruby expert. Help the user with Rails questions.")
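If different chats need different personas, a small lookup table keeps the copy in one place; with_instructions would receive the resolved string. The PERSONAS table and helper below are a made-up example:

```ruby
# Hypothetical persona table; pass the looked-up string to
# chat.with_instructions when creating the chat.
PERSONAS = {
  "ruby"    => "You are a Ruby expert. Help the user with Rails questions.",
  "default" => "You are a helpful assistant. Be concise and friendly."
}.freeze

# Fall back to the default persona for unknown (or missing) keys
def instructions_for(persona)
  PERSONAS.fetch(persona) { PERSONAS["default"] }
end
```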
Handling Errors
Wrap the streaming in error handling:
class ChatResponseJob < ApplicationJob
  queue_as :default

  def perform(chat_id)
    chat = Chat.find(chat_id)
    user_message = chat.messages.where(role: :user).last
    return unless user_message

    begin
      assistant_message = nil
      chat.ask(user_message.content) do |chunk|
        next unless chunk.content.present?
        # Look up the assistant record once and reuse it for every chunk
        assistant_message ||= chat.messages.where(role: :assistant).last
        assistant_message.broadcast_chunk(chunk.content)
      end
    rescue RubyLLM::RateLimitError
      broadcast_error(chat, "Rate limited. Please wait a moment and try again.")
    rescue RubyLLM::UnauthorizedError
      broadcast_error(chat, "API key issue. Please check configuration.")
    rescue RubyLLM::Error => e
      broadcast_error(chat, "Something went wrong: #{e.message}")
    end
  end

  private

  def broadcast_error(chat, message)
    Turbo::StreamsChannel.broadcast_append_to(
      [chat, "messages"],
      target: "messages",
      partial: "messages/error",
      locals: { message: message }
    )
  end
end
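To keep the rescue clauses and the user-facing copy from drifting apart, you could centralize the mapping. The error class names and messages mirror the job above; the helper itself is my own sketch:

```ruby
# Hypothetical mapping from error class name to user-facing copy.
ERROR_COPY = {
  "RubyLLM::RateLimitError"    => "Rate limited. Please wait a moment and try again.",
  "RubyLLM::UnauthorizedError" => "API key issue. Please check configuration."
}.freeze

# Unknown errors fall through to a generic message with the detail.
def error_copy_for(class_name, detail = nil)
  ERROR_COPY.fetch(class_name) { "Something went wrong: #{detail}" }
end
```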
File Attachments
RubyLLM supports sending images, PDFs, and audio to models that support them:
# In your controller
def create
  @chat = current_user.chats.find(params[:chat_id])
  @message = @chat.messages.create!(
    role: :user,
    content: params[:content]
  )
  # Pass file path to the job
  ChatResponseJob.perform_later(
    @chat.id,
    params[:file]&.path
  )
end

# In the job
def perform(chat_id, file_path = nil)
  chat = Chat.find(chat_id)
  user_message = chat.messages.where(role: :user).last
  if file_path
    chat.ask(user_message.content, with: file_path) do |chunk|
      # ... streaming logic
    end
  else
    chat.ask(user_message.content) do |chunk|
      # ... streaming logic
    end
  end
end
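One caveat with passing params[:file]&.path to a job: Rack's upload tempfile can be cleaned up before the job runs. Copying the file to a stable location first avoids that. A sketch, where the tmp/chat_uploads directory and helper name are assumptions:

```ruby
require "fileutils"
require "securerandom"

# Copy an uploaded tempfile somewhere that outlives the request, and
# return the new path to hand to ChatResponseJob.perform_later.
def stash_upload(tempfile_path, dir: "tmp/chat_uploads")
  FileUtils.mkdir_p(dir)
  dest = File.join(dir, "#{SecureRandom.hex(8)}_#{File.basename(tempfile_path)}")
  FileUtils.cp(tempfile_path, dest)
  dest
end
```

Remember to delete the copy at the end of the job, or sweep the directory periodically.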
The Generator Shortcut
If you want a complete working UI out of the box:
rails generate ruby_llm:chat_ui
This generates controllers, views, jobs, and routes for a full chat interface. Visit /chats and start chatting.
Message Ordering
Action Cable processes messages concurrently, which can cause out-of-order delivery during fast streaming. Two solutions:
1. Client-side reordering (simpler):
// Stimulus controller to reorder messages by timestamp.
// Assumes each message element carries data-message-target="message"
// and a data-created-at attribute (add both to the message partial).
import { Controller } from "@hotwired/stimulus"

export default class extends Controller {
  static targets = ["message"]

  connect() {
    this.reorder()
    new MutationObserver(() => this.reorder())
      .observe(this.element, { childList: true })
  }

  reorder() {
    const messages = Array.from(this.messageTargets)
    const sorted = [...messages].sort((a, b) =>
      new Date(a.dataset.createdAt) - new Date(b.dataset.createdAt)
    )
    // Only touch the DOM when the order actually changed; unconditional
    // appendChild would retrigger the MutationObserver in a loop
    if (sorted.some((m, i) => m !== messages[i])) {
      sorted.forEach(m => this.element.appendChild(m))
    }
  }
}
2. Use AnyCable: Provides server-side ordering guarantees through sticky concurrency.
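A third option is server-side: accumulate the full response text in the job and broadcast a replace of the whole content div on each chunk (e.g. via Turbo's broadcast_replace_to), so late or duplicate frames converge on the same final state at the cost of slightly larger payloads. The accumulation piece is trivially small; this class is a sketch of mine, not part of any library:

```ruby
# Sketch: accumulate streamed chunks; each call returns the full text
# so far, suitable for an order-insensitive "replace" broadcast.
class StreamAccumulator
  attr_reader :text

  def initialize
    @text = +""
  end

  # Append a chunk and return a copy of the complete text so far
  def add(chunk)
    @text << chunk.to_s
    @text.dup
  end
end
```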
Production Considerations
1. Use Sidekiq or another production queue:
# config/application.rb
config.active_job.queue_adapter = :sidekiq
2. Configure Redis for Action Cable:
# config/cable.yml
production:
adapter: redis
url: <%= ENV.fetch("REDIS_URL") %>
3. Set reasonable timeouts:
# config/initializers/ruby_llm.rb
RubyLLM.configure do |config|
  config.request_timeout = 120
end
The Result
You now have a ChatGPT-style interface where:
- Messages appear instantly when sent
- AI responses stream in real-time, word by word
- Everything persists to the database automatically
- Switch between GPT, Claude, Gemini with one line
- No JavaScript API calls—just Ruby and Turbo
RubyLLM handles the provider complexity. Turbo handles the real-time UI. You handle the business logic. That's how building AI features in Rails should feel.