Turbo & RubyLLM: Build a Streaming AI Chat in Rails

Building AI chat into a Rails app used to mean juggling provider SDKs, managing WebSocket connections, and writing a lot of JavaScript. With RubyLLM and Turbo Streams, you can build a ChatGPT-style streaming interface in about 30 minutes.
RubyLLM gives you one clean Ruby API for OpenAI, Anthropic, Gemini, and more. Combined with Turbo's real-time updates, you get streaming responses with almost no hand-written JavaScript. Here's how to build it.

What We're Building

A chat interface where:
  • User submits a message
  • Message appears instantly in the UI
  • AI response streams in word-by-word, just like ChatGPT
  • Entire conversation is persisted to the database
  • Works with any provider: OpenAI, Anthropic, Google, etc.
No polling. No JavaScript fetch calls. Just Turbo Streams over Action Cable.

Setup

Add RubyLLM to your Gemfile:
gem 'ruby_llm'
Run the installer:
bundle install
rails generate ruby_llm:install
rails db:migrate
This creates Chat, Message, ToolCall, and Model models with the appropriate acts_as declarations, plus the migrations for their tables.
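For orientation, the generated schema ends up roughly like the sketch below. The exact column set varies between ruby_llm versions, so treat this as an approximation and check db/schema.rb after migrating:

```ruby
# Approximate shape of the generated schema (not the generator's
# literal output; columns vary by ruby_llm version)
create_table :chats do |t|
  t.string :model_id          # which LLM this chat talks to
  t.timestamps
end

create_table :messages do |t|
  t.references :chat, null: false, foreign_key: true
  t.string  :role             # "user", "assistant", "system", "tool"
  t.text    :content
  t.integer :input_tokens
  t.integer :output_tokens
  t.timestamps
end
```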
Configure your API keys:
# config/initializers/ruby_llm.rb
RubyLLM.configure do |config|
  config.openai_api_key = ENV['OPENAI_API_KEY']
  config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
  config.gemini_api_key = ENV['GEMINI_API_KEY']
  config.use_new_acts_as = true
end
You only need to configure the providers you're using.

The Models

The generator creates these, but here's what they look like:
# app/models/chat.rb
class Chat < ApplicationRecord
  acts_as_chat
  
  belongs_to :user, optional: true
  
  # Broadcast changes to subscribers
  broadcasts_to ->(chat) { [chat, "messages"] }
end

# app/models/message.rb
class Message < ApplicationRecord
  acts_as_message
  
  # dom_id is a view helper; models need to pull it in explicitly
  include ActionView::RecordIdentifier
  
  # Broadcast when created (for user messages and final AI messages)
  broadcasts_to ->(message) { [message.chat, "messages"] }
  
  # Helper to stream chunks during AI response
  def broadcast_chunk(content)
    broadcast_append_to [chat, "messages"],
      target: dom_id(self, "content"),
      html: content
  end
end
The acts_as_chat macro gives your Chat model the ask and complete methods, which handle provider communication, streaming, and persistence.

The Controller

# app/controllers/chats_controller.rb
class ChatsController < ApplicationController
  def index
    @chats = current_user.chats.order(created_at: :desc)
  end
  
  def show
    @chat = current_user.chats.find(params[:id])
  end
  
  def create
    @chat = current_user.chats.create!(model: params[:model] || 'gpt-4o-mini')
    redirect_to @chat
  end
end

# app/controllers/messages_controller.rb
class MessagesController < ApplicationController
  def create
    @chat = current_user.chats.find(params[:chat_id])
    
    # Create the user message immediately (shows in UI via broadcast)
    @message = @chat.messages.create!(
      role: :user,
      content: params[:content]
    )
    
    # Process AI response in background
    ChatResponseJob.perform_later(@chat.id)
    
    head :ok
  end
end
The user message is created synchronously, so it broadcasts to the UI immediately via broadcasts_to. The AI response runs in a background job so the request isn't blocked.

The Background Job

This is where the streaming magic happens:
# app/jobs/chat_response_job.rb
class ChatResponseJob < ApplicationJob
  queue_as :default
  
  def perform(chat_id)
    chat = Chat.find(chat_id)
    
    # Nothing to respond to without a user message
    return unless chat.messages.where(role: :user).exists?
    
    assistant_message = nil
    
    # Calling complete with a block enables streaming. We use
    # complete rather than ask here: the controller already
    # persisted the user message, and ask would create a duplicate.
    chat.complete do |chunk|
      next if chunk.content.blank?
      
      # The assistant record is created when streaming starts,
      # so look it up on the first chunk and reuse it
      assistant_message ||= chat.messages.where(role: :assistant).last
      
      # Broadcast each chunk to the UI
      assistant_message.broadcast_chunk(chunk.content)
    end
  end
end
When you call chat.complete (or chat.ask) with a block, RubyLLM:
  1. Creates an empty assistant Message record
  2. Yields each chunk as it arrives from the provider
  3. Updates the Message with the final content and token counts when done
We broadcast each chunk over Turbo Streams, which appends it to the message div in real time.
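Stripped of Rails and RubyLLM, the streaming loop boils down to emitting each chunk as it arrives while accumulating the full reply for the final database update. Here's a runnable, framework-free sketch; fake_stream is an invented stand-in for the provider's token stream, not a RubyLLM API:

```ruby
# Each chunk is "broadcast" (collected here) the moment it arrives,
# while the complete reply is accumulated separately, mirroring how
# RubyLLM updates the Message record once streaming finishes.
def fake_stream(&block)
  ["Turbo ", "Streams ", "push ", "updates ", "over ", "Action Cable."].each(&block)
end

broadcasts = []      # what the UI receives, chunk by chunk
final_content = +""  # what ends up on the assistant Message

fake_stream do |chunk|
  broadcasts << chunk      # assistant_message.broadcast_chunk(chunk) in the job
  final_content << chunk   # done by RubyLLM when streaming completes
end

puts broadcasts.length   # 6 separate Turbo Stream broadcasts
puts final_content       # "Turbo Streams push updates over Action Cable."
```

The UI never waits for the whole reply; it renders six small appends, and the database write happens once at the end.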

The Views

Chat show page:
<!-- app/views/chats/show.html.erb -->

<%= turbo_stream_from @chat, "messages" %>

<div class="chat-container">
  <div id="messages" class="messages">
    <%= render @chat.messages %>
  </div>
  
  <%= render "messages/form", chat: @chat %>
</div>
Message partial:
<!-- app/views/messages/_message.html.erb -->

<%= turbo_frame_tag message do %>
  <div class="message message-<%= message.role %>">
    <div class="message-role">
      <%= message.role == "user" ? "You" : "Assistant" %>
    </div>
    <div id="<%= dom_id(message, "content") %>" class="message-content">
      <%= message.content.present? ? simple_format(message.content) : "&nbsp;".html_safe %>
    </div>
  </div>
<% end %>
The id="<%= dom_id(message, "content") %>" is the target for our streaming chunks. When the job broadcasts, chunks append here.
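If you're curious what that target id actually looks like, dom_id builds "prefix_modelname_id" from the record. Here's a simplified reimplementation purely for illustration (FakeMessage is an invented stand-in; the app uses the real ActionView::RecordIdentifier helper, which derives "message" from the model name):

```ruby
# Simplified sketch of Rails' dom_id for a Message record.
# "message" is hardcoded here; the real helper derives it
# from the record's model name.
FakeMessage = Struct.new(:id)

def dom_id(record, prefix = nil)
  [prefix, "message", record.id].compact.join("_")
end

message = FakeMessage.new(42)
puts dom_id(message)             # "message_42"
puts dom_id(message, "content")  # "content_message_42"
```

So when the job broadcasts with target: dom_id(self, "content"), the chunk is appended inside the div with id content_message_42, the same id the partial rendered.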
Message form:
<!-- app/views/messages/_form.html.erb -->

<%= form_with url: chat_messages_path(chat), 
              method: :post,
              data: { controller: "message-form", action: "turbo:submit-end->message-form#reset" } do |f| %>
  <div class="message-input">
    <%= f.text_area :content, 
                    placeholder: "Type a message...",
                    rows: 1,
                    data: { message_form_target: "input" } %>
    <%= f.submit "Send" %>
  </div>
<% end %>
Simple Stimulus controller to clear the form:
// app/javascript/controllers/message_form_controller.js
import { Controller } from "@hotwired/stimulus"

export default class extends Controller {
  static targets = ["input"]
  
  reset() {
    this.inputTarget.value = ""
  }
}

Routes

# config/routes.rb
resources :chats, only: [:index, :show, :create] do
  resources :messages, only: [:create]
end

Basic Styling

.chat-container {
  max-width: 800px;
  margin: 0 auto;
  height: 100vh;
  display: flex;
  flex-direction: column;
}

.messages {
  flex: 1;
  overflow-y: auto;
  padding: 1rem;
}

.message {
  margin-bottom: 1rem;
  padding: 1rem;
  border-radius: 8px;
}

.message-user {
  background: #e3f2fd;
  margin-left: 2rem;
}

.message-assistant {
  background: #f5f5f5;
  margin-right: 2rem;
}

.message-role {
  font-weight: bold;
  font-size: 0.875rem;
  margin-bottom: 0.5rem;
}

.message-content {
  white-space: pre-wrap;
}

.message-input {
  display: flex;
  gap: 0.5rem;
  padding: 1rem;
  border-top: 1px solid #ddd;
}

.message-input textarea {
  flex: 1;
  padding: 0.75rem;
  border: 1px solid #ddd;
  border-radius: 4px;
  resize: none;
}

.message-input button {
  padding: 0.75rem 1.5rem;
  background: #1976d2;
  color: white;
  border: none;
  border-radius: 4px;
  cursor: pointer;
}

Switching Models

One of RubyLLM's best features: same API for every provider.
# Use GPT-4o
chat = Chat.create!(model: 'gpt-4o')

# Use Claude
chat = Chat.create!(model: 'claude-sonnet-4')

# Use Gemini
chat = Chat.create!(model: 'gemini-2.0-flash')

# Switch mid-conversation
chat.with_model('claude-sonnet-4')
chat.ask("Continue our discussion...")
The streaming, persistence, and broadcasting all work the same regardless of provider.

Adding System Prompts

class Chat < ApplicationRecord
  acts_as_chat
  
  after_create :set_system_prompt
  
  private
  
  def set_system_prompt
    with_instructions(
      "You are a helpful assistant. Be concise and friendly."
    )
  end
end
Or set it dynamically:
# In controller
@chat = current_user.chats.create!(model: 'gpt-4o-mini')
@chat.with_instructions("You are a Ruby expert. Help the user with Rails questions.")

Handling Errors

Wrap the streaming in error handling:
class ChatResponseJob < ApplicationJob
  queue_as :default
  
  def perform(chat_id)
    chat = Chat.find(chat_id)
    return unless chat.messages.where(role: :user).exists?
    
    assistant_message = nil
    
    begin
      # complete, not ask: the user message is already persisted
      chat.complete do |chunk|
        next if chunk.content.blank?
        assistant_message ||= chat.messages.where(role: :assistant).last
        assistant_message.broadcast_chunk(chunk.content)
      end
    rescue RubyLLM::RateLimitError
      broadcast_error(chat, "Rate limited. Please wait a moment and try again.")
    rescue RubyLLM::UnauthorizedError
      broadcast_error(chat, "API key issue. Please check configuration.")
    rescue RubyLLM::Error => e
      broadcast_error(chat, "Something went wrong: #{e.message}")
    end
  end
  
  private
  
  def broadcast_error(chat, message)
    Turbo::StreamsChannel.broadcast_append_to(
      [chat, "messages"],
      target: "messages",
      partial: "messages/error",
      locals: { message: message }
    )
  end
end

File Attachments

RubyLLM supports sending images, PDFs, and audio to models that support them:
# In your controller
def create
  @chat = current_user.chats.find(params[:chat_id])
  
  # Don't create the message here: the job's ask call creates it,
  # with the file attached to the same provider request
  ChatResponseJob.perform_later(
    @chat.id,
    params[:content],
    params[:file]&.path
  )
  
  head :ok
end

# In the job
def perform(chat_id, content, file_path = nil)
  chat = Chat.find(chat_id)
  
  # Caveat: an upload's tempfile can be cleaned up before the job
  # runs. In production, persist the file (e.g. with Active Storage)
  # and pass a stable path instead.
  if file_path
    chat.ask(content, with: file_path) do |chunk|
      # ... streaming logic
    end
  else
    chat.ask(content) do |chunk|
      # ... streaming logic
    end
  end
end

The Generator Shortcut

If you want a complete working UI out of the box:
rails generate ruby_llm:chat_ui
This generates controllers, views, jobs, and routes for a full chat interface. Visit /chats and start chatting.

Message Ordering

Action Cable processes messages concurrently, which can cause out-of-order delivery during fast streaming. Two solutions:
1. Client-side reordering (simpler). Render a data-created-at attribute on each message element, mark each one as a message target, and attach this controller to the messages container:
// Stimulus controller to reorder messages by timestamp
import { Controller } from "@hotwired/stimulus"

export default class extends Controller {
  static targets = ["message"]
  
  connect() {
    this.observer = new MutationObserver(() => this.reorder())
    this.reorder()
  }
  
  disconnect() {
    this.observer.disconnect()
  }
  
  reorder() {
    // Pause the observer while we move nodes, otherwise the
    // moves themselves would retrigger reorder in a loop
    this.observer.disconnect()
    
    const messages = Array.from(this.messageTargets)
    messages.sort((a, b) =>
      new Date(a.dataset.createdAt) - new Date(b.dataset.createdAt)
    )
    messages.forEach(m => this.element.appendChild(m))
    
    this.observer.observe(this.element, { childList: true })
  }
}
2. Use AnyCable: it can provide server-side ordering guarantees, so chunks reach the client in the order they were broadcast.

Production Considerations

1. Use Sidekiq or another production queue:
# config/application.rb
config.active_job.queue_adapter = :sidekiq
2. Configure Redis for Action Cable:
# config/cable.yml
production:
  adapter: redis
  url: <%= ENV.fetch("REDIS_URL") %>
3. Set reasonable timeouts:
# config/initializers/ruby_llm.rb
RubyLLM.configure do |config|
  config.request_timeout = 120
end

The Result

You now have a ChatGPT-style interface where:
  • Messages appear instantly when sent
  • AI responses stream in real-time, word by word
  • Everything persists to the database automatically
  • Switch between GPT, Claude, Gemini with one line
  • No hand-written fetch calls or WebSocket plumbing, just Ruby and Turbo
RubyLLM handles the provider complexity. Turbo handles the real-time UI. You handle the business logic. That's how building AI features in Rails should feel.
