Crafting Interpreters, Ruby Style
Posted:
I finally have started working through Crafting Interpreters, a wonderful book about compilers by Robert Nystrom. The book steps through two interpreter implementations, one in Java and one in C, that ramp in complexity.
Now I don't know about you, but I hate Java. I can hardly stand to read it, let alone write it. That's why I decided to write my first Lox interpreter in Ruby, following along with the book as I can but converting bits and pieces into Rubyisms as I see fit.
In general, the Java code can be ported 1-1 to Ruby with no
changes. Of course there's some obvious stuff, like lack of types
means I need fewer methods and no coersions, or certain stdlib method
namespaces that are updated to match Ruby idioms (while
vs. until
,
anyone?). However, lots of code I just accept as-is and allow Nystrom
to guide me through.
I've only worked through the first 7 chapters, but I did note down a few things in the Ruby conversion that I found interesting.
Avoiding switch statement fallthrough with regular expressions
Admittedly this difference is just a tiny syntactical detail, but one
that plays to Ruby's strengths. Take the book's implementation of
scanToken
:
private void scanToken() {
char c = advance();
switch (c) {
case '(': addToken(LEFT_PAREN); break;
// ...
default:
if (isDigit(c)) {
number();
} else if (isAlpha(c)) {
identifier();
} else {
Lox.error(line, "Unexpected character.");
}
}
}
private boolean isDigit(char c) {
return c >= '0' && c <= '9';
}
// private boolean isAlpha...
Due to limitations in the Java switch statement, the author adds some
post-fallthrough checks to the default
case. This removes the need
to check every number and letter individually (0-9, a-z, A-Z as
separate cases) because the check is deferred into the default case,
where an additional conditional statement is applied. Aesthetically
it's not an ideal solution since it breaks up the otherwise regular
pattern of case ... handler
that holds for the other tokens. I don't
know, it's just kinda ugly.
With Ruby, I can instead employ regular expressions directly in my switch statement:
def scan_token
case advance
when "("
add_token(:left_paren)
# ...
when /[[:digit:]]/
number
when /[[:alpha:]]/
identifier
else
Lox.error(@line, "unexpected character")
end
end
No default fallthrough needed! These tiny details are what keep me programming in Ruby.
Metaprogramming the easy way
The largest deviation between the Java and Ruby implementation is definitely the metaprogramming. In Implementing Syntax Trees the author employs metaprogramming through an independent build step.
First, a new package is created (com.craftinginterpreters.tool
) with
a couple of classes that themselves generate Java classes by writing
strings to a file:
private static void defineType(
PrintWriter writer, String baseName,
String className, String fieldList) {
writer.println(" static class " + className + " extends " +
baseName + " {");
// Constructor.
writer.println(" " + className + "(" + fieldList + ") {");
// Store parameters in fields.
String[] fields = fieldList.split(", ");
for (String field : fields) {
String name = field.split(" ")[1];
writer.println(" this." + name + " = " + name + ";");
}
writer.println(" }");
// Fields.
writer.println();
for (String field : fields) {
writer.println(" final " + field + ";");
}
writer.println(" }");
}
These string builders are hooked up to a separate entrypoint (made for
the tool
Java package) and are compiled separately. The result spits
out a bunch of .java
files into the com.craftinginterpreters.lox
package, whereby the programmer checks them into the project.
It's not a bad solution by any means, but requiring a separate build step and metaprogramming by concatenating strings is a little rough. The Ruby solution is totally different thanks to a bunch of built-in metaprogramming utilities (and the fact that Ruby is an interpreted language).
Here's how I wired up the expression generation:
module Rlox
module Expr
EXPRESSIONS = [
["Binary", [:left, :operator, :right]],
["Grouping", [:expression]],
["Literal", [:value]],
["Unary", [:operator, :right]]
]
EXPRESSIONS.each do |expression|
classname, names = expression
klass = Rlox::Expr.const_set(classname, Class.new)
klass.class_eval do
attr_accessor(*names)
define_method(:initialize) do |*values|
names.each_with_index do |name, i|
instance_variable_set(:"@#{name}", values[i])
end
end
define_method(:accept) do |visitor|
visitor.public_send(:"visit_#{classname.downcase}_expr", self)
end
end
end
end
end
When this file is included into rlox.rb
(the main entrypoint to the
interpreter), Ruby goes ahead and builds all of the expression classes
dynamically. No build step needed, just good ol' Ruby metaprogramming.
Rlox::Expr.const_set
adds the class to the scope of the Rlox::Expr
module, re-opening it on the next line via class_eval
to add in the
automatically-generated methods.
To close the loop, here's what one of the generated classes looks like if it were to be written out by hand (while also avoiding the dynamic instance variable setter):
module Rlox
module Expr
class Binary
attr_accessor :left, :operator, :right
def initialize(left, operator, right)
@left = left
@operator = operator
@right = right
end
def accept(visitor)
visitor.visit_binary_expr(self)
end
end
end
end
Comparing the Ruby and Java implementation is interesting because it
highlights some higher-level advantages and disadvantages between the
two languages. With the Ruby version, adding new types is trivial and
does not require an additional compile + check-in step. Just add a
name-argument pair to the EXPRESSIONS
constant and you're done!
The flip-side of this is the class is not easily inspectable. Although
I wrote Rlox::Expr::Binary
above this paragraph as regular Ruby
code, that code doesn't exist anywhere in the application where a
programmer's eyes can read it. Instead, developers have to read the
metaprogramming code in expr.rb
to understand how the classes work.
I think this implementation leans idiomatic Ruby: metaprogramming is part of the toolkit so it's expected for developers to learn how to deal with it. If you're interested in learning how the class works and can't understand the metaprogramming code, you can always boot up the console and poke around with an instance of the class. It kind of coincides with the Ruby ethos that a REPL should be close at hand so you can explore code concepts that you might otherwise misunderstand by reading the code.
That said, I still have respect for the Java implementation because Ruby metaprogramming can really end up biting you in the ass.
TDD (well, not really)
I'm sure Nystrom omitted tests from the book because it would add a ton of implementation noise to the project, and not in a way that benefited the explanation. For my purposes, I wanted to make sure I added tests with each chapter to make sure my implementation wasn't drifting from the expectation.
It's not perfect by any means, but it definitely gives me a ton of confidence that I'm following along with the material and exercising some of the trickier edge cases. I was also impressed that Nystrom's implementation is really easy to test. Here's an example from the parser:
class TestParser < Minitest::Test
def test_it_handles_comparison
got = parse("2 > 3")
assert_instance_of Rlox::Expr::Binary, got
assert_equal :greater, got.operator.type
assert_equal 2.0, got.left.value
assert_equal 3.0, got.right.value
got = parse("2 >= 3")
assert_instance_of Rlox::Expr::Binary, got
assert_equal :greater_equal, got.operator.type
assert_equal 2.0, got.left.value
assert_equal 3.0, got.right.value
got = parse("2 < 3")
assert_instance_of Rlox::Expr::Binary, got
assert_equal :less, got.operator.type
assert_equal 2.0, got.left.value
assert_equal 3.0, got.right.value
got = parse("2 <= 3")
assert_instance_of Rlox::Expr::Binary, got
assert_equal :less_equal, got.operator.type
assert_equal 2.0, got.left.value
assert_equal 3.0, got.right.value
end
def parse(str)
scanner = Rlox::Scanner.new(str)
tokens = scanner.scan_tokens
parser = Rlox::Parser.new(tokens)
# Call private method to bubble up exception that is caught by #parse
parser.send(:expression)
end
end
Astute readers might recognize that the parse
helper function
defined within the test is also calling into the Rlox::Scanner
class. That's one item that I've taken the quick and easy approach
towards: rather than ensure test isolation by writing out the AST with
the Rlox::Expr
/Rlox::Statement
classes (which are incredibly
verbose), I use Rlox::Scanner
so I can write my tests as string
expressions that read like the code I'm testing. Unfortunately, that
does mean that if I write a bug into the Rlox::Scanner
class, that
bug is propogated into the Rlox::Parser
tests, but in my head it's
better than the alternative of tripling the lines of code for my test
files. What can you do?
Next steps
There might be a part two for this post as I work my way further through the first Lox interpreter. If you're interested in following along with the code, check it out on Github.