An Introduction to Ruby Parsing with Prism | AppSignal Blog


https://blog.appsignal.com/2026/01/07/an-introduction-to-ruby-parsing-with-prism.html · scraped

ruby


— 1793 words · 2026-05-19 19:26:12 UTC ·

![](https://blog.appsignal.com/_next/image?url=%2Fimages%2Fblog%2F2026-01%2Fruby-parsing-prism.png&w=3840&q=90&dpl=dpl_HViZAWWhFQXikcXD9jMJRQoReXQB)

You might have heard about Prism, the new Ruby parser. Perhaps you've heard it's faster, more reliable, and more powerful than what we had before. Or maybe you never took a compilers class and aren't sure what this actually means. I'm here to tell you all about it, and how it's changing our lives as Ruby developers.

Today, I want to take you from square one to writing your first transpiler.

## Interpreters 101

Before we begin our journey, let's start with the basics of how an interpreter works, so we're all on the same page.

Interpreting a programming language usually involves three main steps:

1. Tokenizing input (a.k.a. lexing): Breaking the input text into a list of meaningful tokens. That's like converting your code into something like this:

```plain text
{ type: :integer, literal: "0", value: 0, line: 1 },
{ type: :operator, literal: "+", value: nil, line: 1 },
{ type: :integer, literal: "1", value: 1, line: 1 },
{ type: :keyword, literal: "if", value: nil, line: 1 },
{ type: :identifier, literal: "admin?", value: nil, line: 1 }
```

2. Parsing: Analyzing the tokens to understand the program structure (what to do and in which order) and building a data representation that holds that information (known as an Abstract Syntax Tree). For example:

```plain text
{
  node_type: :binary,
  operation: "+",
  left: { node_type: :number, value: 0 },
  right: { node_type: :number, value: 1 }
}
```

3. Evaluating: Executing the parsed input and producing an output. This is where your code actually runs.

For a deeper dive into this topic, I recommend the Crafting Interpreters book or my RailsConf talk on the subject.

Now let's dive into what Prism can do and how it helps with parsing.

## Why Is Prism Useful for Ruby Parsing?

Ruby historically used a parser defined in a grammar file called `parse.y`, built with Yacc-style tooling. The catch?
It was made specifically for CRuby, forcing other Ruby implementations (like JRuby and TruffleRuby) to create their own parsers from scratch. That's why tools like RuboCop, code editors, and even other Ruby implementations often lagged behind or had incompatibilities with newer Ruby syntax. Developers building Ruby analysis tools had to write their own parsers too, spawning projects like whitequark/parser and ruby_parser.

Prism solves this by becoming the de facto parser for all Ruby tools and implementations. And it's working: it is now used in CRuby, JRuby, TruffleRuby, Rails, RuboCop, and more.

Okay, enough talk. Prism can lex and parse Ruby, which allows us to build fun things. How about we build a ✨ transpiler ✨ with it?

Wait! Don't go away. This will be simple. I promise.

## Your First Transpiler

The full code for the examples in this post is available in this repository.

First, we'll build a tool that converts our Ruby code into Emoruby. If you have never seen Emoruby, this is what it looks like:

Equivalent to this in Ruby:

Ready? Ok, here we go. We'll need a Gemfile to install `emoruby` and `prism`:

After running `bundle install`, let's create the entry point for our transpiler, the `Rubyemo.ruby_to_emoji` method:

```plain text
require 'prism'

module Rubyemo
  extend self

  def ruby_to_emoji(src)
    tokenize(src)
  end

  private

  def tokenize(src)
    result = Prism.lex(src)
    raise "Invalid Ruby code" if result.errors.any?

    # Each entry in result.value is a [token, lexer_state] pair
    result.value.map(&:first)
  end
end
```

For now, it only tokenizes the input with Prism. The `lex` method returns a result object that contains the tokens along with any errors. If the source code contains invalid Ruby, we'll just raise an exception. Then we read the `value` attribute, which holds a list of pairs: each token together with the lexer state at that point. We only care about the tokens, so we grab them with `map(&:first)`.

### Emojify the Ruby Tokens

Now onto the fun part. How do we emojify our Ruby tokens?
Emoruby has a very simple design, so we can basically replace tokens one by one with an emoji alternative:

```plain text
module Rubyemo
  extend self

  def ruby_to_emoji(src)
    tokenize(src)
      .then { emojify it } # the implicit `it` block parameter needs Ruby 3.4+
      .join                # glue the per-token strings into one source string
  end

  private

  def emojify(tokens)
    tokens.filter_map do |token|
      next if token.type == :EOF

      token_to_emoji(token)
    end
  end
end
```

To do that mapping, we'll use Emoruby's translation file and EmojiData to translate token values to emojis by name.

```plain text
TRANSLATIONS = Emoruby::ConvertsRubyToEmoji::TRANSLATIONS

def token_to_emoji(token)
  case token
  in {type: :COMMENT, value:}
    # Drop the leading "#" and mark the comment with a thought bubble
    "💭#{value[1..]}"
  in {type: :IGNORED_NEWLINE | :NEWLINE}
    "\n"
  else
    token.value.to_s.split.map do |part|
      TRANSLATIONS[part] || EmojiData.from_short_name(part)&.to_s || part
    end.join(" ")
  end
end
```

If a token has no mapping, we leave it as-is so the code still runs. Pattern matching is pretty handy here to match a particular type and grab its value at the same time.

And... that's it! Let's see our little transpiler in action:

```plain text
emo = Rubyemo.ruby_to_emoji('puts "Hello, world!"')
puts emo # 👀💬Hello, world!💬
Emoruby.eval(emo) # Hello, world!
```

You might not have noticed, but our code doesn't handle indentation or spacing. See how the space between `puts` and the opening quote got lost? We can do better than this, so let's handle that.

Luckily, the `Prism::Token` instances have information about their location, including the start/end line and column.
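To get a feel for that location data before relying on it, here is a minimal sketch that lexes the same one-liner and lists where each token starts and ends. The `token_positions` helper is a name made up for this illustration; it is not part of Prism or of the post's repository.

```ruby
require "prism"

# Hypothetical helper: lex a snippet and collect each token's type, text,
# and position. Prism.lex yields [token, lexer_state] pairs; we keep the token.
def token_positions(src)
  result = Prism.lex(src)
  raise ArgumentError, "Invalid Ruby code" if result.errors.any?

  result.value.map(&:first).map do |token|
    loc = token.location
    {
      type: token.type,
      value: token.value,
      line: loc.start_line,
      start_column: loc.start_column,
      end_column: loc.end_column
    }
  end
end

positions = token_positions('puts "Hello, world!"')
positions.each { |position| p position }
```

The gap between one token's `end_column` and the next token's `start_column` is exactly the whitespace our first version dropped, which is what the next step restores.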
Let's change `emojify` to this:

```plain text
def emojify(tokens)
  previous_line, previous_column = 1, 0

  tokens.filter_map do |token|
    next if token.type == :EOF

    emoji = token_to_emoji(token)
    indentation, previous_line, previous_column = indentation_for(
      token,
      previous_line,
      previous_column
    )

    indentation + emoji
  end
end
```

And implement `indentation_for`:

```plain text
def indentation_for(token, previous_line, previous_column)
  # A new line resets the column we measure the gap from
  if token.location.start_line != previous_line
    previous_line = token.location.start_line
    previous_column = 0
  end

  # Pad with the whitespace between the previous token and this one
  indentation = " " * (token.location.start_column - previous_column)
  previous_column = token.location.end_column

  [indentation, previous_line, previous_column]
end
```

Try emojifying this now:

```plain text
ruby = <<~RUBY
  class Heart
    public def jeans
      puts "purse"
    end

    protected def shirt
      puts "yellow_heart"
    end

    private def wave
      puts "smiley earth_asia"
    end
  end

  Heart.new.wave
RUBY

puts Rubyemo.ruby_to_emoji(ruby)
```

Done, for real.

## Onto Parsing with Prism for Ruby

Rubyemo helped us to learn Prism lexing, but we didn't need any parsing. Let's try another example.

Let's say you learned about Ruby 3.2's `Data` class and you want to rewrite your old structs with it. Why spend 10 minutes doing it manually when you can write a script in one hour that does it?

Note: If you think this sounds like a RuboCop cop, you're 100% correct! You could turn this into a custom cop if you wanted.

Prism has a method to parse code, and we'll pair it with its `Visitor` class to walk the resulting tree. The visitor design pattern allows us to add new operations to objects (in this case, the Prism AST nodes) without changing their classes. In other words, for each node type it finds, the `Visitor` class will call a method, and we decide what to do in that situation.

We need to act when we find `Struct.new`, so that's a method call, which in Prism is identified by the `CallNode` type.
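It can help to look at how such a call shows up in the tree before writing any checks against it. The following is a small sketch, assuming a bare `Struct.new(:x, :y)` expression as input; the `call_summary` hash is only an illustration device.

```ruby
require "prism"

# Parse a bare Struct.new call; the program body holds it as its first statement.
result = Prism.parse("Struct.new(:x, :y)")
call = result.value.statements.body.first

# Collect the pieces of the node that a Struct.new check would care about.
call_summary = {
  node_class: call.class,
  name: call.name,
  receiver_class: call.receiver.class,
  receiver_name: call.receiver.name,
  arguments: call.arguments.arguments.map(&:slice)
}

call_summary.each { |key, value| puts "#{key}: #{value.inspect}" }
```

The method name, the receiver's node class, and the receiver's constant name are the three things a `Struct.new` check has to compare.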
Let's filter those:

```plain text
class StructToData < Prism::Visitor
  Fix = Data.define(:location, :replacement)

  attr_reader :fixes

  def initialize(src)
    @src = src
    @fixes = []
  end

  def visit_call_node(node)
    if struct_new?(node)
      # todo
    end

    super # keep walking the child nodes
  end

  private

  def struct_new?(node)
    node.name == :new &&
      node.receiver.is_a?(Prism::ConstantReadNode) &&
      node.receiver.name == :Struct
  end
end
```

Note that we need to call `super` on the visit methods, which makes Prism keep walking inner nodes (like the body of a class definition).

Now we need to collect the struct arguments and build our fix object with the replacement code. We'll also skip named structs, as those don't map 1:1 to `Data` classes:

```plain text
def visit_call_node(node)
  if struct_new?(node) && !named_struct?(node)
    members = struct_members(node)
    replacement = build_replacement(members, node.block)
    @fixes << Fix.new(node.location, replacement)
  end

  super
end

# skips interpolated symbols for simplicity
def struct_members(node)
  (node.arguments&.arguments || [])
    .take_while { it.is_a?(Prism::SymbolNode) }
    .map(&:slice)
end

def named_struct?(node)
  (node.arguments&.arguments || [])
    .first
    .is_a?(Prism::StringNode)
end

def build_replacement(node_members, node_block)
  call = "Data.define(#{node_members.join(", ")})"
  if node_block
    call += " #{node_block.slice}"
  end
  call
end
```

### Make Fixes with a Method

Now let's write a method to apply the fixes. We'll process them in ascending order of start position, so offsets remain correct as we build the new source.

```plain text
class StructToData < Prism::Visitor
  def self.rewrite(source)
    ast = Prism.parse(source)
    return [source, []] unless ast.success?

    v = new(source)
    v.visit(ast.value)
    [v.apply_fixes, v.fixes]
  end

  # Kept public so that StructToData.rewrite can call it on the instance
  def apply_fixes
    return @src if @fixes.empty?

    pos = 0
    out = +""
    @fixes.sort_by { it.location.start_offset }.each do |fix|
      # Copy the untouched source up to the fix, then the replacement
      out << @src.byteslice(pos...fix.location.start_offset)
      out << fix.replacement
      pos = fix.location.end_offset
    end
    out << @src.byteslice(pos..-1)
    out
  end
end
```

Note: We have to use `byteslice` because Prism offsets are in bytes. Something like `String#[]=` indexes by character, so it would replace the wrong span as soon as the source contains multibyte characters.

This is enough for us to test our code. Let's see it in action:

It works!

## Mutation: The Source of All Evil

There's one big problem with our current approach: `Data` objects are immutable, while structs aren't, so we can't always convert them. We also have to skip structs that mutate internal state.

We'll use an inner visitor to check if a struct body contains mutations:

```plain text
class MutationScanner < Prism::Visitor
  def initialize
    @mutates = false
  end

  def mutates? = @mutates

  def visit_call_node(n)
    # self.x = ..., self[:k] = ...
    # (also matches comparisons like self == other, which only makes
    # the scanner more conservative)
    if n.receiver.is_a?(Prism::SelfNode) && n.name.to_s.end_with?("=")
      @mutates = true
    end

    # adding writers via macros
    if n.name == :attr_writer || n.name == :attr_accessor
      @mutates = true
    end

    # define_method(:x=) { ... }
    if n.name == :define_method
      arg = n.arguments&.arguments&.first
      if (arg.is_a?(Prism::SymbolNode) || arg.is_a?(Prism::StringNode)) &&
          arg.unescaped.end_with?("=")
        @mutates = true
      end
    end

    super
  end

  # def x=(...) / def []=(...)
  def visit_def_node(n)
    if n.name.to_s.end_with?("=")
      @mutates = true
    end

    super
  end
end
```

This catches many cases, but there are many more ways to mutate state in Ruby: writing to instance variables directly, using attribute writers, or memoization, to name a few. It would be a chore to define methods for each one manually. Luckily, Prism consistently names the nodes that perform these writes, so we'll just flag any of them:

```plain text
class MutationScanner < Prism::Visitor
  # ... after def visit_def_node

  Prism
    .constants
    .filter_map do |const_name|
      next if const_name !~ /Write/ || const_name =~ /GlobalVariable|LocalVariable|Constant/

      Prism.const_get(const_name)
    end
    .each do |node_class|
      # e.g. visit_instance_variable_write_node
      define_method("visit_#{node_class.type}") do |n|
        @mutates = true
        super(n)
      end
    end
end
```

So we get all "write" constants and define a visit method for each, but ignore writes to constants, local variables, and global variables (as those don't change a struct's internal state).

Let's wire up this scanner now:

```plain text
class StructToData < Prism::Visitor
  def visit_call_node(node)
    if struct_new?(node) && !named_struct?(node) && !mutates_instance_state?(node.block)
      # build fix
    end

    super
  end

  def mutates_instance_state?(block_node)
    return false if block_node.nil?

    scanner = MutationScanner.new
    scanner.visit(block_node)
    scanner.mutates?
  end
end
```

That's it! Now our rewriter is ready to dance. Try it out with some of your structs.

## Wrapping Up

Prism has already reshaped the Ruby landscape by making our tools faster, more portable, and more consistent. But its real impact will come from what you build with it.

Think bigger than just parsing: a Ruby-to-JS transpiler, a test runner that knows exactly which test to run from a file and line number, or even something that turns your code into pixel art. The parser is no longer the bottleneck: your imagination is.

Go make something amazing!
