Crafting Interpreters, Ruby Style
Posted:
I finally have started working through Crafting Interpreters, a wonderful book about compilers by Robert Nystrom. The book steps through two interpreter implementations, one in Java and one in C, that ramp in complexity.
Now I don't know about you, but I hate Java. I can hardly stand to read it, let alone write it. That's why I decided to write my first Lox interpreter in Ruby, following along with the book as I can but converting bits and pieces into Rubyisms as I see fit.
In general, the Java code can be ported 1-1 to Ruby with no changes. Of course
there's some obvious stuff, like lack of types means I need fewer methods and no
coersions, or certain stdlib method namespaces that are updated to match Ruby
idioms (while
vs. until
, anyone?). However, lots of code I just accept as-is
and allow Nystrom to guide me through.
I've only worked through the first 7 chapters, but I did note down a few things in the Ruby conversion that I found interesting.
Avoiding switch statement fallthrough with regular expressions
Admittedly this difference is just a tiny syntactical detail, but one that plays
to Ruby's strengths. Take the book's implementation of scanToken
:
private void scanToken() {
char c = advance();
switch (c) {
case '(': addToken(LEFT_PAREN); break;
// ...
default:
if (isDigit(c)) {
number();
} else if (isAlpha(c)) {
identifier();
} else {
Lox.error(line, "Unexpected character.");
}
}
}
private boolean isDigit(char c) {
return c >= '0' && c <= '9';
}
// private boolean isAlpha...
Due to limitations in the Java switch statement, the author adds some
post-fallthrough checks to the default
case. This removes the need to check
every number and letter individually (0-9, a-z, A-Z as separate cases) because
the check is deferred into the default case, where an additional conditional
statement is applied. Aesthetically it's not an ideal solution since it breaks
up the otherwise regular pattern of case ... handler
that holds for the other
tokens. I don't know, it's just kinda ugly.
With Ruby, I can instead employ regular expressions directly in my switch statement:
def scan_token
case advance
when "("
add_token(:left_paren)
# ...
when /[[:digit:]]/
number
when /[[:alpha:]]/
identifier
else
Lox.error(@line, "unexpected character")
end
end
No default fallthrough needed! These tiny details are what keep me programming in Ruby.
Metaprogramming the easy way
The largest deviation between the Java and Ruby implementation is definitely the metaprogramming. In Implementing Syntax Trees the author employs metaprogramming through an independent build step.
First, a new package is created (com.craftinginterpreters.tool
) with a couple
of classes that themselves generate Java classes by writing strings to a file:
private static void defineType(
PrintWriter writer, String baseName,
String className, String fieldList) {
writer.println(" static class " + className + " extends " +
baseName + " {");
// Constructor.
writer.println(" " + className + "(" + fieldList + ") {");
// Store parameters in fields.
String[] fields = fieldList.split(", ");
for (String field : fields) {
String name = field.split(" ")[1];
writer.println(" this." + name + " = " + name + ";");
}
writer.println(" }");
// Fields.
writer.println();
for (String field : fields) {
writer.println(" final " + field + ";");
}
writer.println(" }");
}
These string builders are hooked up to a separate entrypoint (made for the
tool
Java package) and are compiled separately. The result spits out a bunch
of .java
files into the com.craftinginterpreters.lox
package, whereby the
programmer checks them into the project.
It's not a bad solution by any means, but requiring a separate build step and metaprogramming by concatenating strings is a little rough. The Ruby solution is totally different thanks to a bunch of built-in metaprogramming utilities (and the fact that Ruby is an interpreted language).
Here's how I wired up the expression generation:
module Rlox
module Expr
EXPRESSIONS = [
["Binary", [:left, :operator, :right]],
["Grouping", [:expression]],
["Literal", [:value]],
["Unary", [:operator, :right]]
]
EXPRESSIONS.each do |expression|
classname, names = expression
klass = Rlox::Expr.const_set(classname, Class.new)
klass.class_eval do
attr_accessor(*names)
define_method(:initialize) do |*values|
names.each_with_index do |name, i|
instance_variable_set(:"@#{name}", values[i])
end
end
define_method(:accept) do |visitor|
visitor.public_send(:"visit_#{classname.downcase}_expr", self)
end
end
end
end
end
When this file is included into rlox.rb
(the main entrypoint to the
interpreter), Ruby goes ahead and builds all of the expression classes
dynamically. No build step needed, just good ol' Ruby metaprogramming.
Rlox::Expr.const_set
adds the class to the scope of the Rlox::Expr
module,
re-opening it on the next line via class_eval
to add in the
automatically-generated methods.
To close the loop, here's what one of the generated classes looks like if it were to be written out by hand (while also avoiding the dynamic instance variable setter):
module Rlox
module Expr
class Binary
attr_accessor :left, :operator, :right
def initialize(left, operator, right)
@left = left
@operator = operator
@right = right
end
def accept(visitor)
visitor.visit_binary_expr(self)
end
end
end
end
Comparing the Ruby and Java implementation is interesting because it highlights
some higher-level advantages and disadvantages between the two languages. With
the Ruby version, adding new types is trivial and does not require an additional
compile + check-in step. Just add a name-argument pair to the EXPRESSIONS
constant and you're done!
The flip-side of this is the class is not easily inspectable. Although I wrote
Rlox::Expr::Binary
above this paragraph as regular Ruby code, that code
doesn't exist anywhere in the application where a programmer's eyes can read it.
Instead, developers have to read the metaprogramming code in expr.rb
to
understand how the classes work.
I think this implementation leans idiomatic Ruby: metaprogramming is part of the toolkit so it's expected for developers to learn how to deal with it. If you're interested in learning how the class works and can't understand the metaprogramming code, you can always boot up the console and poke around with an instance of the class. It kind of coincides with the Ruby ethos that a REPL should be close at hand so you can explore code concepts that you might otherwise misunderstand by reading the code.
That said, I still have respect for the Java implementation because Ruby metaprogramming can really end up biting you in the ass.
TDD (well, not really)
I'm sure Nystrom omitted tests from the book because it would add a ton of implementation noise to the project, and not in a way that benefited the explanation. For my purposes, I wanted to make sure I added tests with each chapter to make sure my implementation wasn't drifting from the expectation.
It's not perfect by any means, but it definitely gives me a ton of confidence that I'm following along with the material and exercising some of the trickier edge cases. I was also impressed that Nystrom's implementation is really easy to test. Here's an example from the parser:
class TestParser < Minitest::Test
def test_it_handles_comparison
got = parse("2 > 3")
assert_instance_of Rlox::Expr::Binary, got
assert_equal :greater, got.operator.type
assert_equal 2.0, got.left.value
assert_equal 3.0, got.right.value
got = parse("2 >= 3")
assert_instance_of Rlox::Expr::Binary, got
assert_equal :greater_equal, got.operator.type
assert_equal 2.0, got.left.value
assert_equal 3.0, got.right.value
got = parse("2 < 3")
assert_instance_of Rlox::Expr::Binary, got
assert_equal :less, got.operator.type
assert_equal 2.0, got.left.value
assert_equal 3.0, got.right.value
got = parse("2 <= 3")
assert_instance_of Rlox::Expr::Binary, got
assert_equal :less_equal, got.operator.type
assert_equal 2.0, got.left.value
assert_equal 3.0, got.right.value
end
def parse(str)
scanner = Rlox::Scanner.new(str)
tokens = scanner.scan_tokens
parser = Rlox::Parser.new(tokens)
# Call private method to bubble up exception that is caught by #parse
parser.send(:expression)
end
end
Astute readers might recognize that the parse
helper function defined within
the test is also calling into the Rlox::Scanner
class. That's one item that
I've taken the quick and easy approach towards: rather than ensure test
isolation by writing out the AST with the Rlox::Expr
/Rlox::Statement
classes
(which are incredibly verbose), I use Rlox::Scanner
so I can write my tests as
string expressions that read like the code I'm testing. Unfortunately, that does
mean that if I write a bug into the Rlox::Scanner
class, that bug is
propogated into the Rlox::Parser
tests, but in my head it's better than the
alternative of tripling the lines of code for my test files. What can you do?
Next steps
There might be a part two for this post as I work my way further through the first Lox interpreter. If you're interested in following along with the code, check it out on Github.