mgmarlow.com

A Thorough Look into RBS for Rails

by Graham Marlow in ruby, rails

Back in 2020, Ruby 3.0 introduced a system for writing type definitions for Ruby programs called RBS. Five years later, I rarely hear about it being used in production Rails apps.

The situation is somewhat similar to Ractors, another feature released with Ruby 3.0. The idea of Ractors brought a ton of buzz to the Ruby community because it directly addressed one of the major pain points of the language by sidestepping the GVL for real parallelism. However, the reality of Ractors and the semantics of shareable objects made them nearly impossible to adopt.

I don't think RBS is in the same boat. At first blush, the toolset feels usable and sophisticated, if lacking a bit in maturity. My experimentation with it has already yielded benefit in a real-world Rails application. Adoption in an existing codebase isn't trivial, but the semantics of RBS make it possible to do incrementally.

One of the main problems I see with RBS coverage on the web is that it's heavily catered towards RubyGems or vanilla Ruby. There are not many articles that explain RBS in Rails beyond a few cursory comparisons with Sorbet. This post aims to fill that gap.

Overall, I think RBS is in a fairly good spot. It still has rough edges, but many of those edges are intrinsic to the nature of Ruby metaprogramming. Just like with TypeScript and JavaScript, static types will change the way that you write Ruby code to guarantee states and branches that are analyzable by static tools. Whether that's good or bad depends on taste.

Let's dive in.

RBS tools

When working with RBS in a Rails codebase, you will make ample use of the following tools:

  • rbs, used for signature prototyping (AKA code generation) and managing signatures for RubyGems. I will refer to the gem from now on as rbs and the general signature format as RBS.
  • Steep, the gem that actually performs type-checking.
  • rbs_rails, a handy set of Rake tasks to help generate ActiveRecord signatures.

It's important to note that rbs does not actually perform type-checking. It only defines the syntax that you will use for writing signature files and provides an API for accessing those signatures programmatically. Steep is the program that actually checks if the types are valid.

rbs does, however, provide some handy CLI tools that you will use often. The first is signature generation via rbs prototype. The second is RBS Collection, effectively Bundler for signatures. The RBS Collection community repository holds signatures for many popular RubyGems, notably Rails and Sidekiq. Your application interacts with RBS Collection via the rbs collection command.

As mentioned earlier, type-checking is performed by Steep. You will make frequent use of the Steep CLI to check your app for warnings and errors, but you can also integrate Steep directly into your text editor thanks to LSP. Here's an example for Helix:

# ~/.config/helix/languages.toml
[language-server.steep]
command = "steep"
args = ["langserver"]

[[language]]
name = "ruby"
language-servers = ["steep"]

I'll walk through using each of these tools in a Rails app in the next section.

Adding RBS to a new Rails app

From a clean slate:

rails new myapp

Add Steep and run init to generate its configuration file (Steepfile):

bundle add steep
bundle exec steep init

Open up Steepfile and you'll see a bunch of configuration options commented out. Feel free to keep these around for future reference, but the configuration that we'll use for this post is the following:

# Steepfile
target :app do
  signature "sig"

  check "app"
end

In short,

  • A project may contain many targets, although we only make use of the :app target. You will likely want a separate target for tests.
  • Our signature files go in the sig/ directory.
  • When type-checking, we check the entirety of app/. You can use separate check lines to incrementally adopt RBS, globbing particular files or directories. More on this later.

Run the type-checker and you'll observe some 13 errors:

bundle exec steep check
# Output of 13-ish errors and warnings...

Most of these errors and warnings are from missing definitions, as we haven't yet added any signature files. However, quite a few of the errors point to missing definitions in baseline Rails classes (e.g. ActiveRecord). Steep doesn't yet know how to locate signatures for Rails.

Rails doesn't maintain its own signatures, but luckily there are community-maintained signatures in RBS Collection. Let's set them up:

rbs collection init
rbs collection install

During installation, RBS Collection walks your Gemfile.lock and pulls signatures for gems that (a) are registered in the repository or (b) ship with their own signatures via RubyGems. After running the initialization commands, you'll have three new files/folders on your machine:

  • rbs_collection.yaml (a configuration file that we won't mess with)
  • rbs_collection.lock.yaml (equivalent to a Gemfile.lock)
  • .gem_rbs_collection/ (the actual signatures pulled during install)

You will likely want to add .gem_rbs_collection/ to your .gitignore. You will also want to add rbs collection install to your bin/setup to ensure future users of your Rails app won't run into trouble with missing signatures.

# bin/setup
puts "== Installing dependencies =="
system("bundle check") || system!("bundle install")

puts "== Installing signatures =="
system!("rbs collection install")
# ...

Run check again and observe that we've fixed all of the errors, though we still have 5 warnings:

bundle exec steep check
# 0 errors, 5 warnings

These warnings are from missing signatures in our actual application code. Since the codebase is virtually empty, the prototype signatures generated by rbs are more than adequate:

rbs prototype rb --out-dir=sig/app/ app/

# Output:
Processing `app/`...
  Generating RBS for `app/controllers/application_controller.rb`...
    - Writing RBS to `sig/app/controllers/application_controller.rbs`...
  Generating RBS for `app/helpers/application_helper.rb`...
    - Writing RBS to `sig/app/helpers/application_helper.rbs`...
  Generating RBS for `app/jobs/application_job.rb`...
    - Writing RBS to `sig/app/jobs/application_job.rbs`...
  Generating RBS for `app/mailers/application_mailer.rb`...
    - Writing RBS to `sig/app/mailers/application_mailer.rbs`...
  Generating RBS for `app/models/application_record.rb`...
    - Writing RBS to `sig/app/models/application_record.rbs`...

Note: Steep doesn't actually care where your RBS files are located. Matching the directory structure of app/ is merely convention. We'll cover a more complicated directory structure that makes use of rbs subtract later on.

Run check again and observe no errors:

bundle exec steep check
# Type checking files:

......

No type error detected. 🫖

That's most of the setup done.

Things get more complicated with the addition of rbs_rails, which adds another layer of code generation to our signature folder. I will talk through that setup later on, after discussing incremental adoption.

Incremental adoption for existing Rails apps

The nature of RBS signatures living separately from the Ruby code they represent is convenient for incremental adoption. Same goes for the check attribute in our Steepfile, which enables us to slowly migrate directories over to type-checking.

Incremental adoption isn't perfect because you'll inevitably have warnings for untyped code, but that can be mitigated by only checking for errors in CI and keeping an ignore file for known violations.

The general strategy begins with a Steepfile that works against one directory at a time:

# Steepfile
target :app do
  signature "sig"

  check "app/models"
  check "app/services/**/some-dir/*.rb"
  # Add more as you develop more signatures...
end

The workflow looks something like this:

  1. Add a new directory to Steepfile
  2. Prototype the directory
  3. Run steep check and fix violations
  4. Ignore warnings or errors due to untyped references
  5. Check in your work
  6. Repeat

I mention ignoring warnings or errors due to untyped references because you'll often come across untyped code when working incrementally. For example, you might add types for a worker that references a model and a service, where the model or service hasn't itself been typed yet.

To help mitigate this issue, I recommend starting with app/models/, as it's arguably the directory with the widest surface area in your Rails app.

Another helpful technique is to configure Steep to (a) ignore warnings and (b) ignore certain known violations. You can automate this configuration with a Rake task:

# Rakefile
require 'steep/rake_task'

Steep::RakeTask.new do |t|
  t.check.severity_level = :error
  t.check.with_expectations = true
end

When we run Steep, we'll ask it to save known violations into a YAML file so they can be ignored by future runs if the with_expectations flag is true. Generate the file of known violations by invoking Steep with a couple of extra flags:

bundle exec steep check --severity-level=error --save-expectations

After generating known violations, check in the file:

git add steep_expectations.yml

From here on out, the Rake task will ignore any violation present in that file. Use this sparingly!
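
The baseline idea can be sketched in plain Ruby. This is only a conceptual model of "save known violations, then report only new ones": the Violation struct, the file layout, and the helper names are assumptions made for illustration, not Steep's actual expectations format or implementation.

```ruby
require "yaml"
require "tempfile"

# Hypothetical, simplified stand-in for a type-checker diagnostic.
Violation = Struct.new(:path, :line, :code) do
  # Plain-hash form used when reading and writing the baseline file.
  def to_record
    { "path" => path, "line" => line, "code" => code }
  end
end

# Write the currently known violations to a baseline file.
def save_baseline(violations, file)
  File.write(file, YAML.dump(violations.map(&:to_record)))
end

# Return only the violations that are not already recorded in the baseline.
def new_violations(violations, file)
  known = YAML.safe_load(File.read(file)) || []
  violations.reject { |violation| known.include?(violation.to_record) }
end

known = Violation.new("app/models/user.rb", 12, "Ruby::NoMethod")
fresh = Violation.new("app/models/post.rb", 3, "Ruby::ArgumentTypeMismatch")

Tempfile.create("baseline") do |file|
  save_baseline([known], file.path)
  # Only the violation introduced after the baseline was saved is reported.
  puts new_violations([known, fresh], file.path).map(&:path).inspect
end
```

The takeaway is that a baseline hides whatever was recorded in it, which is why it should stay small and shrink over time.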

Generating signatures for Rails models

As you'll quickly find out, the signatures generated via rbs prototype are very primitive. This is especially true for Rails models, which make heavy use of metaprogramming.

Instead of writing model signatures by hand, use rbs_rails. This gem provides a few Rake tasks that generate signatures for ActiveRecord models based on their runtime behavior, filling in a ton of detail that would otherwise be missing from rbs prototype.

bundle add rbs_rails --require=false
bin/rails g rbs_rails:install

Installation provides the following tasks in rbs.rake:

  • rbs_rails:generate_rbs_for_models
  • rbs_rails:generate_rbs_for_path_helpers
  • rbs_rails:all (models and path helpers)

When migrating an existing app, your app/models directory should probably be your first priority and rbs_rails:all is a huge help.

Keep generated signatures separate

Up to this point in the post we've been prototyping signatures directly into our app's sig/app/ folder. However, you do not want to do this with rbs_rails. In fact, rbs_rails will place all of its generated signatures in sig/rbs_rails/ by default, and for good reason.

You don't want to be in a situation where your hand-written edits are overwritten by generated code. This is especially relevant with rbs_rails, because you'll be re-running the rbs_rails rake tasks anytime a model changes, particularly if that change involves database migrations. Unlike the signatures created by rbs prototype, which serve merely as a starting point, the signatures generated by rbs_rails can and should be kept separate.

To allow hand-written edits alongside the rbs_rails output, we'll set up a wrapper task that merges hand-written signatures with those generated programmatically. This allows us to reliably re-run the rbs_rails rake tasks without needing to worry that the new signatures will overwrite custom modifications.

# Rakefile
namespace :rbs do
  task :generate do
    # Use rbs_rails to generate signatures
    Rake::Task['rbs_rails:all'].invoke

    # Drop generated definitions that are overridden by hand-written ones.
    # This assumes you have hand-written signatures in sig/app/models.
    # Unlike backticks, `sh` fails the task if the command fails.
    sh "bin/rbs subtract --write sig/rbs_rails sig/app/models"
  end
end

The necessary folder structure looks like this:

sig/
  rbs_rails/
    app/
      models/
        user.rbs  # auto-generated
  app/
    models/
      user.rbs    # hand-written

After running rbs:generate, the auto-generated code will have definitions removed if those definitions conflict with hand-written code. In effect, hand-written code is always preferred to auto-generated. Both directories are checked in to source control.
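
To make the precedence rule concrete, here is a conceptual model in plain Ruby. It is not how rbs subtract is implemented (the real command works on RBS files, not hashes); the hash layout and the subtract_signatures helper are assumptions made purely for illustration.

```ruby
# Conceptual model only: signatures as { class name => { method name => type } }.
# `subtract_signatures` is a hypothetical helper, not part of the rbs gem.
def subtract_signatures(generated, hand_written)
  generated.to_h do |klass, methods|
    overridden = hand_written.fetch(klass, {})
    # Keep only generated methods that have no hand-written counterpart.
    [klass, methods.reject { |name, _type| overridden.key?(name) }]
  end
end

generated = {
  "User" => {
    "name"  => "() -> String",
    "email" => "() -> String",
  },
}

hand_written = {
  "User" => { "email" => "() -> String?" }, # manual touch-up wins
}

# The generated `email` entry is removed; the generated `name` entry survives.
p subtract_signatures(generated, hand_written)
```

Because the hand-written definition of email is the only one left after subtraction, the two directories can be loaded together without duplicate-definition conflicts.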

The author of rbs subtract and rbs_rails wrote a design doc where you can learn more about this workflow: Design Doc of rbs subtract.

Signatures for third-party code

Using RBS Collection to manage third-party gems can be a little confusing, so here are some guidelines.

If a gem is part of RBS Collection (and present in your Gemfile.lock) you're good to go. Run rbs collection init/install and reap the benefits of community-maintained signatures.

If a gem isn't in the collection and does not publish its own signatures, you will need to provide them yourself.

For stdlib gems, things get a little confusing. If a gem is truly part of the Ruby stdlib but not present in your Gemfile.lock, you need to tell Steep that it's active in your project. This is the case for libraries like yaml or net/http, which must be required explicitly even though they ship with Ruby:

target :app do
  # ... snip
  library "yaml"
  library "net-http"
end

It's easy to forget what should and should not be added as a "library" in your Steepfile, so here's my guidance: never add third-party gems to your Steepfile. Only add stdlib gems, and only when the gem is not present in your Gemfile.lock. Everything else is either already installed via RBS Collection or the signatures do not exist and you will need to write them yourself.
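
That rule of thumb can be written down as a small check. The sketch below is a hypothetical helper, not part of Steep or rbs: it takes the contents of a Gemfile.lock plus a caller-supplied list of stdlib libraries the app requires, and returns the ones that would still need a library line. It uses Bundler's lockfile parser; the sample lockfile and its version numbers are made up for the example.

```ruby
require "bundler"

# Hypothetical helper: returns the stdlib libraries that are NOT locked in
# Gemfile.lock and therefore still need a `library` line in the Steepfile.
def steepfile_libraries(lockfile_contents, required_stdlib)
  locked = Bundler::LockfileParser.new(lockfile_contents).specs.map(&:name)
  required_stdlib.reject { |name| locked.include?(name) }
end

# Made-up lockfile in which net-http is locked but yaml is not.
SAMPLE_LOCKFILE = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      net-http (0.4.1)
        uri
      uri (0.13.0)

  PLATFORMS
    ruby

  DEPENDENCIES
    net-http
LOCK

puts steepfile_libraries(SAMPLE_LOCKFILE, %w[yaml net-http]).inspect
```

In this sample only yaml is returned, because net-http already appears in the lockfile and is therefore left to RBS Collection.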

Code comments

By default rbs prototype copies code comments from your Ruby files into your RBS files. This brings up an interesting question: where should your comments live, in the code or in the signatures?

The answer is both!

In RBS files, add comments that explain high-level details about the API. These comments should describe the intended purpose of the class and help developers understand how to use it.

In Ruby files, add comments that explain the intricacies of the code. These comments are made for those modifying the code to suit new purposes (or fix existing bugs).

For example,

# feed_parser.rbs
# Fetch an RSS/Atom feed and return a list of Articles.
class FeedParser
  # @param url [String] The URL of the RSS or Atom feed.
  def initialize: (String url) -> void

  # Fetches and parses the feed, returning a list of articles.
  # @return [Array[Article]]
  def articles: () -> Array[Article]
end

# feed_parser.rb
class FeedParser
  def articles
    # Note that RSS and Atom differ in the following respects: ...
  end
end

RBS will change the way you write Ruby code

I have found that, like TypeScript, RBS changes the way you write Ruby code. I generally think this is good news for the clarity of your code.

The most common case that I've run into is the issue of "flow specificity", or the inability of Steep to narrow a type based on context. Do not take this as a bad mark against Steep; TypeScript suffers from the same issue. The problem is that Ruby code carries certain patterns that are more prone to flow specificity issues because of how Ruby developers like to write code.

Here's an example:

class FooService
  def execute
    return unless some_variable

    do_something_with(some_variable)
  end

  private

  def do_something_with(value)
    # ...
  end

  def some_variable
    # ...
  end
end

Say that some_variable is something that could be nil. Here's the signature:

class FooService
  @some_variable: Integer?

  def execute: () -> void

  private

  def do_something_with: (Integer x) -> void

  def some_variable: () -> Integer?
end

Run Steep and you'll see the following error:

app/services/foo_service.rb:5:22: [error] Cannot pass a value of type `(::Integer | nil)` as an argument of type `::Integer`
│   (::Integer | nil) <: ::Integer
│     nil <: ::Integer
│
│ Diagnostic ID: Ruby::ArgumentTypeMismatch
│
└     do_something_with(some_variable)
                        ~~~~~~~~~~~~~

Detected 1 problem from 1 file

The problem is that although we check for the existence of some_variable at the beginning of execute, some_variable is a method. Steep cannot know that the second call of some_variable takes place after the first call was validated, nor that the second call is guaranteed to be non-nil due to flow control patterns.
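
To see why a type-checker has to be this conservative, consider a deliberately contrived class (FlakyService and its alternating method are hypothetical, written only to illustrate the point). Nothing in Ruby guarantees that two calls to the same method return the same value:

```ruby
# Hypothetical class: `some_variable` is a method, so nothing guarantees that
# two consecutive calls return the same value.
class FlakyService
  def initialize
    @calls = 0
  end

  def execute
    return :skipped unless some_variable # first call returns 42

    describe(some_variable)              # second call returns nil
  end

  private

  def describe(value)
    value.nil? ? :got_nil : :got_integer
  end

  # Alternates between an Integer and nil on successive calls.
  def some_variable
    @calls += 1
    @calls.odd? ? 42 : nil
  end
end

p FlakyService.new.execute # prints :got_nil
```

The guard passes, yet the value handed to describe is nil. Real code is rarely this adversarial, but a static checker cannot rule it out from the signature alone.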

Here's the workaround:

class FooService
  def execute
    value = some_variable
    return unless value

    do_something_with(value)
  end
  # ...
end

Assigning the method to a variable, checking that variable, and continuing to use that checked variable will lead Steep to the correct conclusion.

Now although I mentioned that this exact issue also exists in TypeScript, I've never seen it in practice. I don't think JS developers are incentivized to write flow control in the Ruby way because JS doesn't have the same optional parentheses that mask whether or not an identifier is a method or a variable (playground link):

const someVariable = (): number | null => 42
const doSomething = (n: number) => {}

// Looks weird and TypeScript complains!
if (someVariable()) {
  doSomething(someVariable()) // Type error: number | null is not assignable to number
}

This is a contrived example, but it's a problem that I've seen numerous times in a recently-converted Rails codebase. Especially when dealing with associations that allow null.

The solution, I think, is to avoid writing Ruby code that proliferates nil values. That means restructuring code with fewer methods and more local variables. It also means longer methods that pack in more content, and therefore fewer one-line methods.
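
As a rough sketch of that style applied to a nullable association: the Structs below stand in for ActiveRecord models, and BylineFormatter, Article, and Author are hypothetical names chosen for the example.

```ruby
# Plain Structs stand in for ActiveRecord models; `author` plays the role of
# a nullable association.
Author  = Struct.new(:name)
Article = Struct.new(:title, :author)

class BylineFormatter
  def initialize(article)
    @article = article
  end

  # One slightly longer method: the nullable association is read once into a
  # local, checked once, and only the non-nil local is used afterwards.
  def byline
    author = @article.author
    return "#{@article.title} (anonymous)" unless author

    "#{@article.title} by #{author.name}"
  end
end

puts BylineFormatter.new(Article.new("RBS for Rails", Author.new("Graham"))).byline
puts BylineFormatter.new(Article.new("RBS for Rails", nil)).byline
```

The nil check and every later use refer to the same local variable, which is the shape a type-checker can narrow.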

Clean Code aesthetics are on their way out, and this is one more nail in the coffin.

Conclusion

Adding types to an existing Rails app is not trivial, but RBS allows for incremental adoption.

After working with RBS in a production application for two weeks, my team has already discovered a number of bugs that match real exceptions in our Airbrake logs. In each case type-checking clearly revealed the bug, often a NoMethodError on an object that might be nil, and it also identified other potential failure points that did not happen to lie on our primary execution path.

Resolving the bug in these circumstances wasn't always trivial due to the tendency for Rubyists to wrap optional states behind methods that sidestep Steep's type-checking. Rewriting classes to suit the signatures and eliminate nil values, however, has been a net positive.

Overall, RBS has been well worth the investment.