Comment your regular expressions

https://thoughtbot.com/blog/comment-your-regular-expressions · scraped

Regular expressions have a reputation for being cryptic and arcane, and with good reason: their syntax is dense and non-obvious. Unfortunately that leads many people to not view them as real code, so they copy and paste them without analyzing them to verify their behavior, or they ignore them in code reviews.

This isn’t ideal; is there a way to help ensure that regexes are treated with the significance that they warrant? Yes: we can comment them!

For example, let’s take this regex for a USA postal code in Ruby:

    usa_postal_code_pattern = /\A\d{5}(-\d{4})?\z/

That’s pretty hard to read; no wonder we want to gloss over it. Using Ruby’s “extended mode” for regexes via the x flag (and a %r{⋯} symmetrical percent literal for better readability across multiple lines), we can split that into parts and add comments explaining them:

    usa_postal_code_pattern = %r{
      \A       # Beginning of string
      \d{5}    # 5 digits
      (        # ZIP+4
        -      # Hyphen
        \d{4}  # 4 digits
      )?       # ZIP+4 is optional
      \z       # End of string
    }x

Beware that because whitespace is deliberately ignored in this mode, you must escape it when you want to represent literal whitespace characters. For example, here’s a pattern for UK postal codes:

    uk_postal_code_pattern = %r{
      \A           # Beginning of string
      [A-Z]{1,2}   # 1–2 capital letters
      \d           # Digit
      [A-Z\d]?     # Optional capital letter or digit
      (\ )         # Single space
      \d           # Digit
      [A-Z]{2}     # 2 capital letters
      \z           # End of string
    }x

In the above examples, every line is commented in order to be illustrative. That’s probably not necessary for most regexes.

This is possible in other languages too! Perl supports it; Python calls it the “verbose” flag; in JavaScript you can use string concatenation.
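As a quick sanity check on the ideas above, here is a minimal sketch that compares the compact and commented forms of the USA pattern and then demonstrates the whitespace pitfall. The variable names and the sample strings are hypothetical, chosen only for illustration; they are not from the original post.

```ruby
# Compact and commented forms of the USA pattern from the post.
compact_usa = /\A\d{5}(-\d{4})?\z/
commented_usa = %r{
  \A       # Beginning of string
  \d{5}    # 5 digits
  (        # ZIP+4
    -      # Hyphen
    \d{4}  # 4 digits
  )?       # ZIP+4 is optional
  \z       # End of string
}x

# Hypothetical sample inputs, chosen for illustration only.
samples = ["12345", "12345-6789", "1234", "12345-678", "12345 6789"]

# Both forms should accept and reject exactly the same strings.
results = samples.map { |s| [s, compact_usa.match?(s), commented_usa.match?(s)] }
results.each { |s, a, b| puts format("%-12s compact=%-5s commented=%s", s, a, b) }

# Whitespace pitfall in extended mode: an unescaped space is dropped from the pattern.
unescaped = /\Aa b\z/x   # behaves like /\Aab\z/
escaped   = /\Aa\ b\z/x  # the backslash keeps the space literal

puts unescaped.match?("a b")
puts escaped.match?("a b")
```

Running a small table of inputs through both forms like this is one way to confirm that adding comments and layout did not change what the regex accepts.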
