DHH Killed 359 System Tests — and Rails Followed
37signals deleted nearly every system test from HEY in late 2024, and by DHH's account not a single bug slipped through. This decision, driven by DHH's conclusion that browser-driven tests are "slow, brittle, and full of false negatives," has reshaped Rails testing orthodoxy. Rails 8.1 no longer generates system tests by default. The recommended stack is now lean and opinionated: Minitest, fixtures, integration tests for the heavy lifting, model tests for business logic, and a handful of smoke tests for end-to-end confidence. For anyone coming from RSpec, the transition is less about learning new syntax and more about unlearning habits the Rails framework no longer endorses.
"System tests have failed" — a decade of hope, reversed
On May 17, 2024, DHH published a landmark blog post on HEY World titled "System tests have failed." He argued that despite high hopes when Rails 5.1 introduced built-in system test support in 2017, the reality never matched the promise. System tests remained slow, brittle, and full of false negatives in practice. He noted that he had wasted far more time getting system tests to work reliably than he had seen in dividends from bugs caught.
This was a notable reversal. At RailsConf 2014, DHH had championed moving toward system tests, saying he wanted to replace controller tests with higher-level system tests through Capybara. A decade of production experience at 37signals changed his mind.
The reasoning centers on three compounding problems. Browser-driven tests are inherently slow because they launch a real browser and render full pages. They are brittle because JavaScript-driven UIs create timing issues that produce false negatives — tests that fail not because of bugs but because of race conditions in the test infrastructure. And debugging failures in a black-box browser test is disproportionately hard compared to a focused integration test.
Six months later, on November 8, 2024, DHH posted on X that they were killing ALL the system tests (359 cases) in HEY, replacing them with a minimal set of smoke tests and leaning on controller integration tests instead. The post drew 140,000 views. At Rails World 2025, he confirmed the results: 37signals kept only 10 system tests as smoke tests out of the original 359, and not a single bug slipped through that the deleted tests would have caught. Rails 8.1 formalized this shift — the scaffold generator no longer creates system test files by default.
How 37signals structures their test suite today
37signals uses vanilla Minitest with YAML fixtures across all their applications — Basecamp, HEY, and their other products. No RSpec. No factory_bot. No database_cleaner. Their test philosophy is deliberately minimal in tooling but thorough in coverage where it counts.
The current test hierarchy follows a clear pyramid. Model tests form the base and are the most numerous. They test business logic and domain rules by hitting the database directly through Active Record — DHH has long argued that mocking the database is "test-induced design damage." Controller integration tests form the middle layer and are the primary replacement for both old-style controller tests and most system tests. These exercise the full HTTP request/response cycle through the Rack stack without launching a browser. A small handful of smoke tests (roughly 10 for HEY) use Capybara to verify that top-level flows load and function. Finally, dedicated human QA testers perform exploratory testing for UI feel and edge cases.
The HEY test suite contains roughly 30,000 assertions and runs in under 4 minutes on an M4 Mac — dramatically faster than cloud CI's 15+ minutes. Rails 8.1 introduced bin/ci with a config/ci.rb DSL to encourage running the full suite locally on developer machines.
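As a rough illustration of the config/ci.rb DSL mentioned above, the fragment below follows the shape of the file that Rails 8.1 generates. Treat the step names and commands as assumptions; a real application lists whatever checks it actually runs.

```ruby
# config/ci.rb (sketch): each step pairs a label with a shell command,
# and bin/ci runs the steps in order on the developer's machine.
CI.run do
  step "Setup", "bin/setup --skip-server"
  step "Tests: Rails", "bin/rails test"
end
```

Running bin/ci then executes the whole list locally, which is the workflow the fast local suite is meant to enable.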
Integration tests versus system tests — a critical distinction
ActionDispatch::IntegrationTest uses Rack::Test internally to simulate HTTP requests directly against the Rails application in-process. No server starts. No browser launches. No network calls occur. The test and application share a single thread and database connection, so transactional fixtures work natively. You write tests using HTTP verbs — get, post, put, delete — and inspect the response object for status codes, headers, and rendered HTML. These tests execute in milliseconds each and go through the full Rack middleware stack including routing, but cannot evaluate JavaScript.
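To make "in-process, no server" concrete, here is a minimal sketch that uses only plain Ruby and Minitest, not Rails or Rack::Test. The app (`HelloApp`) and the helper (`simulate_get`) are hypothetical names invented for this illustration; ActionDispatch::IntegrationTest does the equivalent against your real application with far more fidelity.

```ruby
require "minitest/autorun"

# Hypothetical Rack-style app: any object that responds to call(env)
# and returns [status, headers, body].
HelloApp = lambda do |env|
  if env["REQUEST_METHOD"] == "GET" && env["PATH_INFO"] == "/products"
    [200, { "content-type" => "text/html" }, ["<h1>Products</h1>"]]
  else
    [404, { "content-type" => "text/plain" }, ["Not Found"]]
  end
end

# Hypothetical helper: builds a minimal env hash and calls the app
# directly. No server starts and no socket is opened.
def simulate_get(app, path)
  status, headers, body = app.call("REQUEST_METHOD" => "GET", "PATH_INFO" => path)
  [status, headers, body.join]
end

class InProcessRequestTest < Minitest::Test
  def test_known_path_returns_success
    status, headers, body = simulate_get(HelloApp, "/products")
    assert_equal 200, status
    assert_equal "text/html", headers["content-type"]
    assert_includes body, "Products"
  end

  def test_unknown_path_returns_not_found
    status, _headers, _body = simulate_get(HelloApp, "/missing")
    assert_equal 404, status
  end
end
```

Because the request is an ordinary method call, the test costs about as much as any other unit test. That is the property integration tests inherit.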
ActionDispatch::SystemTestCase wraps Capybara and drives a real browser (headless Chrome by default via Selenium). The application runs in a separate thread with a real Puma server. Tests use Capybara's DSL — visit, click_on, fill_in, assert_selector — and interact with the page as a user would. System tests can evaluate JavaScript, take screenshots on failure, and verify the complete rendering pipeline. But they pay for this with significant overhead: browser startup time, multi-threaded coordination, and full page rendering push execution into seconds per test.
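| Aspect | Integration tests | System tests |
| --- | --- | --- |
| Key methods | get, post, assert_response | visit, click_on, assert_text |
| Threading | Single-threaded | Multi-threaded |
| Screenshots | No | Automatic on failure |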
The current guidance is clear: use integration tests for the vast majority of your test suite — testing HTTP responses, redirects, authentication flows, authorization, form submissions, and API endpoints. Reserve system tests for only the most critical user paths where JavaScript interaction must be verified, and keep that number small.
From controller tests to integration tests — and how RSpec maps onto Minitest
Rails 5.0 (2016) steered developers away from ActionController::TestCase. DHH filed the pivotal GitHub issue (#18950) arguing that testing controller internals (instance variables via assigns() and template rendering via assert_template) was grossly overstepping the boundaries of what the test should know about. These methods were extracted to the rails-controller-testing gem, and scaffold generators switched to producing integration tests. ActionController::TestCase itself still ships with Rails, but new tests are expected to use ActionDispatch::IntegrationTest.
The replacement was straightforward. Instead of get :index (passing an action name), you write get products_url (passing a real URL). Instead of inspecting assigns(:products), you assert against the response body or database state. Integration tests go through the full Rack stack including routing and middleware, making them more realistic than the old controller tests that bypassed middleware entirely.
For developers coming from RSpec, the terminology mapping is direct:
| Minitest | RSpec | Equivalence |
| --- | --- | --- |
| Integration tests (test/integration/) | Request specs (spec/requests/) | Both wrap ActionDispatch::IntegrationTest |
| System tests (test/system/) | System specs (spec/system/) | Both wrap ActionDispatch::SystemTestCase |
| Model tests (test/models/) | Model specs (spec/models/) | Both wrap ActiveSupport::TestCase |
| Controller tests (discouraged) | Controller specs (discouraged) | Both wrapped ActionController::TestCase |
Minitest integration tests and RSpec request specs are functionally equivalent. Both use the same HTTP verb methods, both go through the full Rack stack, and both inspect the same response object. The only difference is syntax: assert_response :success versus expect(response).to have_http_status(:success).
Practical guide for moving from RSpec to Minitest
The biggest mental shift isn't syntax — it's philosophy. RSpec encourages nested describe/context/it blocks that read like specifications. Minitest encourages flat, self-contained test methods that are just Ruby. There's no let for lazy evaluation, no subject, no shared examples as a first-class concept. Instead, you use setup blocks with instance variables, the test macro for readable names, and plain Ruby modules for shared behavior.
The assertion style inverts the argument order: RSpec's expect(actual).to eq(expected) becomes Minitest's assert_equal expected, actual — expected comes first. Use specific assertions (assert_nil, assert_includes, assert_instance_of) rather than generic ones. Use refute_* counterparts instead of negating assertions. For mocking, add the Mocha gem — Minitest's built-in Minitest::Mock is too basic for real-world use.
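The sketch below puts those assertion rules side by side with their RSpec equivalents in comments. It uses plain Minitest and a hypothetical `@tags` array, so it runs on its own. The RSpec lines are shown for comparison only and are not executed.

```ruby
require "minitest/autorun"

class AssertionStyleTest < Minitest::Test
  def setup
    @tags = ["rails", "minitest"]
  end

  def test_expected_value_comes_first
    # RSpec: expect(@tags.size).to eq(2)
    assert_equal 2, @tags.size
  end

  def test_specific_assertions
    # RSpec: expect(@tags).to include("rails")
    assert_includes @tags, "rails"
    # RSpec: expect(@tags).to be_an_instance_of(Array)
    assert_instance_of Array, @tags
    # RSpec: expect(@tags[5]).to be_nil
    assert_nil @tags[5]
  end

  def test_refute_replaces_negated_expectations
    # RSpec: expect(@tags).not_to include("rspec")
    refute_includes @tags, "rspec"
    # RSpec: expect(@tags.first).not_to be_nil
    refute_nil @tags.first
    # RSpec: expect(@tags).not_to be_empty
    refute_empty @tags
  end
end
```

Specific assertions matter mostly when a test fails: `assert_includes` reports the collection and the missing element, where a bare `assert @tags.include?("rails")` can only report that a truthy value was expected.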
For fixtures, start with one or two default fixtures per model with sensible defaults, then customize within individual tests by updating attributes. This replaces FactoryBot's create(:user, admin: true) pattern with users(:one).update!(admin: true) or simply referencing pre-defined fixtures like users(:admin). Fixtures are loaded into the database once per test run, and each test runs inside a transaction that is rolled back afterward, which makes them significantly faster than factories that insert fresh rows for every test.
The practical Gemfile shrinks dramatically. A complete Minitest stack needs only Capybara and Selenium (for the few smoke tests), Mocha (for mocking), and optionally webmock (for HTTP stubbing) and minitest-reporters (for colored output).
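As a sketch, the test group of such a Gemfile might look like the fragment below. This is an illustration built from the gems named above, not a copy of any 37signals Gemfile, and version constraints are left out on purpose.

```ruby
# Gemfile (test group only)
group :test do
  gem "capybara"            # drives the few remaining smoke tests
  gem "selenium-webdriver"  # browser driver used by Capybara
  gem "mocha"               # mocking and stubbing
  gem "webmock"             # optional: stub outbound HTTP requests
  gem "minitest-reporters"  # optional: colored test output
end
```

Minitest itself does not need an entry because it already arrives as a dependency of Rails.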
Conclusion
DHH's testing evolution represents a decade-long experiment that produced a clear verdict: the testing pyramid should be bottom-heavy with model and integration tests, not top-heavy with browser-driven system tests. The practical results at 37signals, where 359 system tests were cut to about 10 smoke tests with no reported regressions, are strong real-world evidence for this approach. Rails 8.1 encodes it in the framework's defaults, which makes it the framework's recommendation and not just one person's opinion.
The key insight isn't that system tests are useless — it's that their cost-benefit ratio collapses at scale. Ten smoke tests covering critical paths plus thorough integration tests provide equivalent confidence at a fraction of the maintenance burden. For developers transitioning from RSpec, the path is clear: replace request specs with integration tests (same thing, different syntax), replace feature specs with a handful of smoke tests, embrace fixtures over factories, and lean into the simplicity that Minitest provides by design.