Comment your regular expressions
Regular expressions have a reputation for being cryptic and arcane, and with
good reason: their syntax is dense and non-obvious. Unfortunately that leads
many people to not view them as real code, so they copy-and-paste them without
analyzing them to verify their behavior, or they ignore them in code reviews.
This isn’t ideal; is there a way to help ensure that regexes are treated with
the significance that they warrant?
Yes: we can comment them! For example, let’s take this regex for a
USA postal code in Ruby:
usa_postal_code_pattern = /\A\d{5}(-\d{4})?\z/
That’s pretty hard to read; no wonder we want to gloss over it. Using
Ruby’s “extended mode” for regexes via the x flag (and a %r{⋯} symmetrical
percent literal for better readability across multiple lines), we can split
that into parts and add comments explaining them:
usa_postal_code_pattern = %r{
  \A       # Beginning of string
  \d{5}    # 5 digits
  (        # ZIP+4
    -      # Hyphen
    \d{4}  # 4 digits
  )?       # ZIP+4 is optional
  \z       # End of string
}x
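As a quick sanity check, the sketch below compares the compact and commented forms against a handful of inputs. The sample strings and the variable names `compact`, `commented`, `samples`, and `results` are made up for illustration and are not from the original post.

```ruby
# Compact and commented forms of the same pattern from the text.
compact = /\A\d{5}(-\d{4})?\z/
commented = %r{
  \A       # Beginning of string
  \d{5}    # 5 digits
  (        # ZIP+4
    -      # Hyphen
    \d{4}  # 4 digits
  )?       # ZIP+4 is optional
  \z       # End of string
}x

# Hypothetical sample inputs, chosen only for illustration.
samples = ["12345", "12345-6789", "1234", "12345-678", "12345 6789"]

# Record whether each form matches each sample.
results = samples.map do |s|
  [s, compact.match?(s), commented.match?(s)]
end

results.each do |s, a, b|
  puts "#{s.inspect}: compact=#{a} commented=#{b}"
end
```

Both forms should agree on every input, since the x flag changes only how the pattern is written, not what it matches.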
Beware that because whitespace is deliberately ignored in this mode, you must
escape it when you want to represent literal whitespace characters. For example,
here’s a pattern for UK postal codes:
uk_postal_code_pattern = %r{
  \A          # Beginning of string
  [A-Z]{1,2}  # 1–2 capital letters
  \d          # Digit
  [A-Z\d]?    # Optional capital letter or digit
  (\ )        # Single space
  \d          # Digit
  [A-Z]{2}    # 2 capital letters
  \z          # End of string
}x
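To see the whitespace rule in action, here is a minimal sketch. The names `unescaped`, `escaped`, and `uk`, and the sample postal codes, are illustrative assumptions rather than part of the original post.

```ruby
# In extended mode an unescaped space is dropped from the pattern.
unescaped = /a b/x   # behaves like /ab/
escaped   = /a\ b/x  # the backslash keeps the space literal

puts unescaped.match?("ab")   # the space in the pattern was ignored
puts unescaped.match?("a b")  # so a real space in the input does not match
puts escaped.match?("a b")    # the escaped space matches a real space

# The UK pattern from the text, with ASCII-only comments.
uk = %r{
  \A          # Beginning of string
  [A-Z]{1,2}  # 1 or 2 capital letters
  \d          # Digit
  [A-Z\d]?    # Optional capital letter or digit
  (\ )        # Single space
  \d          # Digit
  [A-Z]{2}    # 2 capital letters
  \z          # End of string
}x

# Hypothetical sample inputs.
puts uk.match?("SW1A 1AA")  # contains the required space
puts uk.match?("SW1A1AA")   # missing the space
```

The same caution applies to a literal `#`: in extended mode it starts a comment, so it needs a backslash too.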
In the above examples, every line is commented in order to be illustrative.
That’s probably not necessary for most regexes.
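This is possible in other languages too! Perl supports it with its own x
flag; Python calls it the “verbose” flag (re.VERBOSE); JavaScript has no
equivalent flag, but you can approximate it by building the pattern with
string concatenation and commenting each piece.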