You might have heard about Prism, the new Ruby parser. Perhaps you've heard it's faster, more reliable, and more powerful than what we had before. Or maybe you never took a compilers class and aren't sure about what this actually means.
I'm here to tell you all about it, and how it's changing our lives as Ruby developers. Today, I want to take you from square one to writing your first transpiler.
## Interpreters 101
Before we begin our journey, let's start with the basics of how an interpreter works, so we're all on the same page. Interpreting a programming language usually involves three main steps:
1. Tokenizing input (a.k.a. lexing): Breaking the input text into a list of meaningful tokens. That's like converting your code into something like this:
```plain text
{ type: :integer, literal: "0", value: 0, line: 1 },
{ type: :operator, literal: "+", value: nil, line: 1 },
{ type: :integer, literal: "1", value: 1, line: 1 },
{ type: :keyword, literal: "if", value: nil, line: 1 },
{ type: :identifier, literal: "admin?", value: nil, line: 1 }
```
2. Parsing: Analyzing the tokens to understand the program structure (what to do and in which order) and building a data representation that holds that information (known as an Abstract Syntax Tree). For example:
```plain text
{
  node_type: :binary,
  operation: "+",
  left: {
    node_type: :number,
    value: 0
  },
  right: {
    node_type: :number,
    value: 1
  }
}
```
3. Evaluating: Executing the parsed input and producing an output. This is where your code actually runs.
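To make the three steps concrete, here is a minimal sketch of all of them for integer addition and subtraction. It is a toy written for this post, not how Ruby or Prism works internally, and the method names `tokenize`, `parse`, and `evaluate` are my own.

```ruby
require "strscan"

# Step 1: lexing. Turn "0 + 1" into a flat list of tokens.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next
    elsif (digits = scanner.scan(/\d+/))
      tokens << { type: :integer, literal: digits, value: digits.to_i }
    elsif (op = scanner.scan(/[+\-]/))
      tokens << { type: :operator, literal: op, value: nil }
    else
      raise ArgumentError, "unexpected character at #{scanner.pos}"
    end
  end
  tokens
end

# Step 2: parsing. Fold the tokens into a left-associative tree.
# (No error handling: the sketch assumes well-formed input.)
def parse(tokens)
  tokens = tokens.dup
  tree = { node_type: :number, value: tokens.shift.fetch(:value) }
  until tokens.empty?
    operator = tokens.shift
    right = tokens.shift
    tree = {
      node_type: :binary,
      operation: operator.fetch(:literal),
      left: tree,
      right: { node_type: :number, value: right.fetch(:value) }
    }
  end
  tree
end

# Step 3: evaluating. Walk the tree and compute a result.
def evaluate(node)
  case node
  in { node_type: :number, value: }
    value
  in { node_type: :binary, operation:, left:, right: }
    evaluate(left).public_send(operation, evaluate(right))
  end
end

evaluate(parse(tokenize("0 + 1"))) # => 1
```

Real languages need far more than this (precedence, keywords, error recovery), which is exactly the work a parser like Prism takes off our hands.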
For a deeper dive into this topic, I recommend the Crafting Interpreters book or my RailsConf talk on the subject.
Now let's dive into what Prism can do and how it helps with parsing.
## Why Is Prism Useful for Ruby Parsing?
Ruby historically used a parser called parse.y, built with Yacc. The catch? It was made specifically for CRuby, forcing other Ruby implementations (like JRuby and TruffleRuby) to create their own parsers from scratch.
That's why tools like RuboCop, code editors, and even other Ruby implementations often lagged behind or had incompatibilities with newer Ruby syntax. Developers building Ruby analysis tools had to write their own parsers too, spawning projects like whitequark/parser and ruby_parser.
Prism solves this by becoming the de facto parser for all Ruby tools and implementations. And it's working: it is now used in CRuby, JRuby, TruffleRuby, Rails, RuboCop, and more.
Okay, enough talk. Prism can lex and parse Ruby, which allows us to build fun things. How about we build a ✨ transpiler ✨ with it? Wait! Don't go away. This will be simple. I promise.
## Your First Transpiler
The full code for the examples in this post is available in this repository.
First, we'll build a tool that converts our Ruby code into Emoruby. If you have never seen Emoruby, this is what it looks like:
Equivalent to this in Ruby:
Ready? Ok, here we go. We'll need a Gemfile to install emoruby and prism:
After bundle installing, let's create the entry point for our transpiler — the Rubyemo.ruby_to_emoji method:
```plain text
require 'prism'

module Rubyemo
  extend self

  def ruby_to_emoji(src)
    tokenize(src)
  end

  private

  # Lexes the source and returns only the Prism::Token objects.
  def tokenize(src)
    result = Prism.lex(src)
    raise "Invalid Ruby code" if result.errors.any?

    result.value.map(&:first)
  end
end
```
For now, it only tokenizes the input with Prism. The lex method returns a result object that contains either the tokens or errors. If the source code contains invalid Ruby, we'll just raise an exception.
Then we read the value attribute, which is a list of pairs: each token comes bundled with the lexer state at that point. We only care about the tokens, so we grab them with map(&:first).
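If you want to see what the lexer hands back before going further, a quick sketch like this one prints each token's type and text. It only assumes the prism gem is installed (it ships with recent Ruby versions).

```ruby
require "prism"

result = Prism.lex('puts "hi"')

# Each entry in result.value is a pair; the token is the first element.
tokens = result.value.map(&:first)

tokens.each do |token|
  puts "#{token.type} #{token.value.inspect}"
end

# Invalid code still returns a result; the problems show up in #errors.
broken = Prism.lex("def")
puts "errors found: #{broken.errors.length}"
```

The token list always ends with an EOF token, which is why we will filter it out later.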
### Emojify the Ruby Tokens
Now onto the fun part. How do we emojify our Ruby tokens? Emoruby has a very simple design, so we can basically replace tokens one by one with an emoji alternative:
```plain text
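module Rubyemo
  extend self

  def ruby_to_emoji(src)
    tokenize(src)
      .then { emojify it } # `it` requires Ruby 3.4+
  end

  private

  def emojify(tokens)
    tokens.filter_map do |token|
      next if token.type == :EOF

      token_to_emoji(token)
    end.join # return one string so it can be printed or evaluated
  end
end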
```
To do that mapping, we'll use Emoruby's translation file and EmojiData to translate token values to emojis by name.
```plain text
TRANSLATIONS = Emoruby::ConvertsRubyToEmoji::TRANSLATIONS

def token_to_emoji(token)
  case token
  in {type: :COMMENT, value:}
    # drop the leading "#" and mark the comment with a thought bubble
    "💭#{value[1..]}"
  in {type: :IGNORED_NEWLINE | :NEWLINE}
    "\n"
  else
    # translate word by word, falling back to the original text
    token.value.to_s.split.map do |part|
      TRANSLATIONS[part] || EmojiData.from_short_name(part)&.to_s || part
    end.join(" ")
  end
end
```
If a token has no mapping, we leave it as-is so the code still runs. Pattern matching is pretty handy here to match a particular type and grab its value at the same time.
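Here is the same pattern-matching idea in isolation, without any emoji. The `describe` helper is hypothetical and exists only for this illustration; it relies on Prism tokens supporting hash patterns on `type` and `value`, as our `token_to_emoji` does.

```ruby
require "prism"

# Hypothetical helper: label a token using hash patterns.
def describe(token)
  case token
  in { type: :COMMENT, value: }
    "comment: #{value}"
  in { type: :IGNORED_NEWLINE | :NEWLINE }
    "newline"
  in { type:, value: }
    "#{type}: #{value}"
  end
end

tokens = Prism.lex("puts 1 # note\n").value.map(&:first)
labels = tokens.map { |token| describe(token) }
puts labels
```

Each `in` branch both checks the token type and binds the value, so there is no separate "if type, then read value" step.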
And... that's it! Let's see our little transpiler in action:
```plain text
emo = Rubyemo.ruby_to_emoji('puts "Hello, world!"')
puts emo # 👀💬Hello, world!💬
Emoruby.eval(emo) # Hello, world!
```
You might not have noticed, but our code doesn't handle indentation or spacing. See how the space between puts and the opening quote got lost? We can do better than this, so let's handle that. Luckily, the Prism::Token instances have information about their location, including the start/end line and column.
Let's change emojify to this:
```plain text
def emojify(tokens)
  previous_line, previous_column = 1, 0

  tokens.filter_map do |token|
    next if token.type == :EOF

    emoji = token_to_emoji(token)
    indentation, previous_line, previous_column = indentation_for(
      token,
      previous_line,
      previous_column
    )

    indentation + emoji
  end.join
end
```
And implement indentation_for:
```plain text
def indentation_for(token, previous_line, previous_column)
  # a new line resets the column we measure from
  if token.location.start_line != previous_line
    previous_line = token.location.start_line
    previous_column = 0
  end

  # pad with the gap between the previous token's end and this token's start
  indentation = " " * (token.location.start_column - previous_column)
  previous_column = token.location.end_column

  [indentation, previous_line, previous_column]
end
```
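To convince yourself that token locations are enough to recover spacing, here is a small standalone sketch that rebuilds a single line of source from its tokens. `rebuild_line` is a hypothetical helper for this check only, and it assumes single-line input.

```ruby
require "prism"

# Rebuild one line of source from its tokens, padding each token with the
# gap between its start column and the previous token's end column.
def rebuild_line(source)
  previous_column = 0

  Prism.lex(source).value.map(&:first).filter_map do |token|
    next if token.type == :EOF

    gap = " " * (token.location.start_column - previous_column)
    previous_column = token.location.end_column
    gap + token.value
  end.join
end

source = "total  =   1 +    2"
puts rebuild_line(source) == source
```

Our `indentation_for` does the same thing, with the extra bookkeeping needed to reset the column whenever a token starts on a new line.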
Try emojifying this now:
```plain text
ruby = <<~RUBY
  class Heart
    public def jeans
      puts "purse"
    end

    protected def shirt
      puts "yellow_heart"
    end

    private def wave
      puts "smiley earth_asia"
    end
  end

  Heart.new.wave
RUBY

puts Rubyemo.ruby_to_emoji(ruby)
```
Done — for real.
## Onto Parsing with Prism for Ruby
Rubyemo helped us to learn Prism lexing, but we didn't need any parsing. Let's try another example. Let's say you learned about Ruby 3.2's Data class and you want to rewrite your old structs with it. Why spend 10 minutes doing it manually when you can write a script in one hour that does it?
Note: If you think this sounds like a RuboCop cop, you're 100% correct! You could turn this into a custom cop if you wanted.
Prism.parse gives us the AST, but to walk it, we'll use Prism's Visitor class. The visitor design pattern allows us to add new operations to objects (in this case, the Prism AST nodes) without changing their classes. In other words, for each node type it finds, the Visitor class will call a method, and we decide what to do in that situation.
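Before we get to structs, here is a minimal sketch of a visitor so the mechanics are clear. `OutlineVisitor` is a made-up example class; it overrides two visit methods and records what it sees.

```ruby
require "prism"

# Hypothetical example visitor: builds a flat outline of a file.
class OutlineVisitor < Prism::Visitor
  attr_reader :outline

  def initialize
    super
    @outline = []
  end

  # Called once for every class definition in the tree.
  def visit_class_node(node)
    @outline << "class #{node.name}"
    super # keep walking so we also reach the methods inside
  end

  # Called once for every method definition in the tree.
  def visit_def_node(node)
    @outline << "def #{node.name}"
    super
  end
end

source = <<~RUBY
  class Greeter
    def hello
      puts "hi"
    end

    def bye
      puts "bye"
    end
  end
RUBY

visitor = OutlineVisitor.new
visitor.visit(Prism.parse(source).value)
visitor.outline # => ["class Greeter", "def hello", "def bye"]
```

Every node type we don't override simply falls through to Prism's default behavior, which is to visit the children.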
We need to act when we find Struct.new, so that's a method call, which in Prism is identified by the CallNode type. Let's filter those:
```plain text
class StructToData < Prism::Visitor
  Fix = Data.define(:location, :replacement)

  attr_reader :fixes

  def initialize(src)
    @src = src
    @fixes = []
  end

  def visit_call_node(node)
    if struct_new?(node)
      # todo
    end

    super # keep walking into child nodes
  end

  private

  def struct_new?(node)
    node.name == :new &&
      node.receiver.is_a?(Prism::ConstantReadNode) &&
      node.receiver.name == :Struct
  end
end
```
Note that we need to call super in each visit method we override, which makes Prism keep walking inner nodes (like the body of a class definition).
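A tiny experiment shows the difference. The two visitor classes below are hypothetical and identical except for the call to super.

```ruby
require "prism"

# Records call names but never calls super.
class ShallowCalls < Prism::Visitor
  attr_reader :names

  def initialize
    @names = []
  end

  def visit_call_node(node)
    @names << node.name
    # no super: the walk stops at the first call it meets
  end
end

# Records call names and keeps walking.
class DeepCalls < Prism::Visitor
  attr_reader :names

  def initialize
    @names = []
  end

  def visit_call_node(node)
    @names << node.name
    super # continue into the receiver, arguments, and block
  end
end

tree = Prism.parse("outer(inner(1))").value

shallow = ShallowCalls.new
shallow.visit(tree)
shallow.names # => [:outer]

deep = DeepCalls.new
deep.visit(tree)
deep.names # => [:outer, :inner]
```

Without super, a Struct.new nested inside another call or a class body would never be visited.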
Now we need to collect the struct arguments and build our fix object with the replacement code. We'll also skip named structs, as those don't map 1:1 to Data classes:
```plain text
def visit_call_node(node)
  if struct_new?(node) && !named_struct?(node)
    members = struct_members(node)
    replacement = build_replacement(members, node.block)
    @fixes << Fix.new(node.location, replacement)
  end

  super
end

# skips interpolated symbols for simplicity
def struct_members(node)
  (node.arguments&.arguments || [])
    .take_while { it.is_a?(Prism::SymbolNode) }
    .map(&:slice)
end

def named_struct?(node)
  (node.arguments&.arguments || [])
    .first
    .is_a?(Prism::StringNode)
end

def build_replacement(node_members, node_block)
  call = "Data.define(#{node_members.join(", ")})"
  if node_block
    call += " #{node_block.slice}"
  end
  call # return the string even when there is no block
end
```
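It helps to poke at the node we are matching. This sketch parses two Struct.new calls and looks at the pieces our helpers rely on; `first_call` is a hypothetical convenience method for the example.

```ruby
require "prism"

# Grab the first statement of a snippet; here that is the Struct.new call.
def first_call(source)
  Prism.parse(source).value.statements.body.first
end

plain = first_call("Struct.new(:x, :y, keyword_init: true)")
plain.class          # Prism::CallNode
plain.name           # :new
plain.receiver.name  # :Struct

# Leading symbol arguments become the members; slice returns their source text.
members = plain.arguments.arguments
  .take_while { |arg| arg.is_a?(Prism::SymbolNode) }
  .map(&:slice)
members # => [":x", ":y"]

# A named struct has a string as its first argument, so we can detect it.
named = first_call('Struct.new("Point", :x)')
named_first = named.arguments.arguments.first
named_first.class # Prism::StringNode
```

Notice that take_while stops at the keyword_init: true argument, so options like that are dropped from the replacement. That is fine for a demo, but worth knowing before you run it on real code.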
### Make Fixes with a Method
Now let's write a method to apply the fixes. We'll process them in ascending order of start position, so offsets remain correct as we build the new source.
```plain text
class StructToData < Prism::Visitor
  def self.rewrite(source)
    ast = Prism.parse(source)
    return [source, []] unless ast.success?

    v = new(source)
    v.visit(ast.value)
    [v.apply_fixes, v.fixes]
  end

  # Kept public: .rewrite calls it with an explicit receiver.
  def apply_fixes
    return @src if @fixes.empty?

    pos = 0
    out = +""
    @fixes.sort_by { it.location.start_offset }.each do |fix|
      # copy the untouched source before the fix, then the replacement
      out << @src.byteslice(pos...fix.location.start_offset)
      out << fix.replacement
      pos = fix.location.end_offset
    end
    out << @src.byteslice(pos..-1)
    out
  end
end
```
Note: We have to use byteslice because Prism offsets are in bytes. Character-based indexing, such as String#[]=, would land in the wrong place as soon as the source contains multibyte characters.
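Here is a plain-Ruby sketch of that pitfall, with no Prism involved. It assumes the source is UTF-8, where "é" takes two bytes.

```ruby
source = "é = Struct.new(:x)"
target = "Struct.new(:x)"

char_offset = source.index(target)      # => 4 (counted in characters)
byte_offset = source.b.index(target.b)  # => 5 (counted in bytes, like Prism offsets)

# Treating a byte offset as a character index skips one character too many.
wrong = source[byte_offset, target.length]
wrong # => "truct.new(:x)"

# byteslice interprets the offset and length as bytes, so it lines up.
right = source.byteslice(byte_offset, target.bytesize)
right # => "Struct.new(:x)"
```

With ASCII-only source files the two offsets happen to agree, which is why this kind of bug tends to show up late.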
This is enough for us to test our code. Let's see it in action:
It works!
## Mutation: The Source of All Evil
There's one big problem with our current approach: Data objects are immutable, while structs aren't, so we can't always convert them. We have to also skip structs that mutate internal state.
We'll use an inner visitor to check if a struct body contains mutations:
```plain text
class MutationScanner < Prism::Visitor
  def initialize
    @mutates = false
  end

  def mutates? = @mutates

  def visit_call_node(n)
    # self.x = ..., self[:k] = ...
    if n.receiver.is_a?(Prism::SelfNode) && n.name.to_s.end_with?("=")
      @mutates = true
    end

    # adding writers via macros
    if n.name == :attr_writer || n.name == :attr_accessor
      @mutates = true
    end

    # define_method(:x=) { ... }
    if n.name == :define_method
      arg = n.arguments&.arguments&.first
      if (arg.is_a?(Prism::SymbolNode) || arg.is_a?(Prism::StringNode)) && arg.unescaped.end_with?("=")
        @mutates = true
      end
    end

    super
  end

  # def x=(...) / def []=(...)
  # Note: this also matches operators such as ==, which only makes us more cautious.
  def visit_def_node(n)
    if n.name.to_s.end_with?("=")
      @mutates = true
    end

    super
  end
end
```
This catches many cases, but Ruby has many more ways to mutate an object's state: writing to instance variables directly, using attribute writers, or memoization, to name a few. It would be a chore to define a visit method for each one manually.
Luckily, Prism consistently names the nodes for these mutation methods, so we'll just flag any nodes that perform a write:
```plain text
class MutationScanner < Prism::Visitor
  # ... after def visit_def_node

  Prism
    .constants
    .filter_map do |const_name|
      next if const_name !~ /Write/ || const_name =~ /GlobalVariable|LocalVariable|Constant/

      Prism.const_get(const_name)
    end
    .each do |node_class|
      # e.g. visit_instance_variable_write_node
      define_method("visit_#{node_class.type}") do |n|
        @mutates = true
        super(n)
      end
    end
end
```
So we collect every Prism constant with "Write" in its name and define a visit method for each, ignoring writes to constants, local variables, and global variables (as those don't change a struct's internal state).
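If you are curious which nodes that filter picks up, this sketch applies the same two regular expressions and then parses a couple of writes to show the node classes involved. The exact list depends on your Prism version, so I'm not printing an expected result.

```ruby
require "prism"

# Same filter as in the scanner: keep "Write" nodes, drop the ones that
# don't touch an object's internal state.
write_nodes = Prism.constants.select do |const_name|
  const_name.match?(/Write/) &&
    !const_name.match?(/GlobalVariable|LocalVariable|Constant/)
end

puts write_nodes.sort

# Small helper for the example: class of the first statement in a snippet.
def first_node_class(source)
  Prism.parse(source).value.statements.body.first.class
end

plain_write = first_node_class("@count = 0")  # Prism::InstanceVariableWriteNode
memoized = first_node_class("@count ||= 0")   # Prism::InstanceVariableOrWriteNode
local_write = first_node_class("count = 0")   # Prism::LocalVariableWriteNode
```

Memoization with ||= gets its own node type, which is why matching on the "Write" suffix is more reliable than enumerating cases by hand.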
Let's wire up this scanner now:
```plain text
class StructToData < Prism::Visitor
  def visit_call_node(node)
    if struct_new?(node) && !named_struct?(node) && !mutates_instance_state?(node.block)
      # build fix
    end

    super
  end

  def mutates_instance_state?(block_node)
    return false if block_node.nil?

    scanner = MutationScanner.new
    scanner.visit(block_node)
    scanner.mutates?
  end
end
```
That's it! Now our rewriter is ready to dance. Try it out with some of your structs.
## Wrapping Up
Prism has already reshaped the Ruby landscape by making our tools faster, more portable, and more consistent. But its real impact will come from what you build with it.
Think bigger than just parsing: a Ruby-to-JS transpiler, a test runner that knows exactly which test to run from a file and line number, or even something that turns your code into pixel art. The parser is no longer the bottleneck — your imagination is.
Go make something amazing!
