Pipelining without pipes


https://thoughtbot.com/blog/pipelining-without-pipes-in-ruby · scraped

GitHub has a library called Linguist to syntax highlight code snippets. It has an extensive list of languages and their characteristics (e.g., name, file extensions, color, etc.). I had some free time, so I came up with this silly question: what would be the average programming language color? I’m talking about the color used in the “Languages” section of a repository.

We can get this done with a few lines of Ruby:

    require "net/http"
    require "yaml"

    # Some helper functions
    def fetch_url(url)
      Net::HTTP.get(URI(url))
    end

    def parse_yaml(yaml_string)
      YAML.safe_load(yaml_string)
    end

    def hex_color_to_rgb(color)
      color.delete("#").scan(/../).map(&:hex)
    end

    def rgb_color_to_hex(color)
      color.map { |channel| channel.to_s(16).rjust(2, "0") }.join
    end

    GITHUB_LANGS_URL = "https://raw.githubusercontent.com/github/linguist/6b02d3bd769d07d1fdc661ba9e37aad0fd70e2ff/lib/linguist/languages.yml"

    # Fetch and parse the language list YAML
    langs_yaml = fetch_url(GITHUB_LANGS_URL)
    langs = parse_yaml(langs_yaml)

    # Calculate the average color of the programming languages
    red_sum = 0
    green_sum = 0
    blue_sum = 0
    color_count = 0

    langs.each do |_name, details|
      next if details["type"] != "programming" # Skip "non-programming" languages
      next if details["color"].nil? # Skip languages without a color

      rgb = hex_color_to_rgb(details["color"])
      red_sum += rgb[0] ** 2
      green_sum += rgb[1] ** 2
      blue_sum += rgb[2] ** 2
      color_count += 1
    end

    average_red = Math.sqrt(red_sum / color_count).to_i
    average_green = Math.sqrt(green_sum / color_count).to_i
    average_blue = Math.sqrt(blue_sum / color_count).to_i
    average_color = "##{rgb_color_to_hex([average_red, average_green, average_blue])}"

    puts average_color

While writing this article, I discovered at least two different approaches to averaging colors:

1. Sum the individual color channels and then divide each sum by the number of colors.
2. Sum the square of each color channel, divide by the number of colors, and take the square root of the result.
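To make the difference concrete, here is a small sketch that applies both approaches to two made-up colors (pure red and pure blue). The method names mean_color and rms_color are invented for this illustration and are not part of the original script.

```ruby
# Two hypothetical sample colors as [red, green, blue] arrays (not taken from Linguist).
colors = [[255, 0, 0], [0, 0, 255]]

# Approach 1: plain mean of each channel (integer division, as in the script above).
def mean_color(colors)
  (0..2).map { |i| colors.sum { |color| color[i] } / colors.size }
end

# Approach 2: square each channel value, average the squares, take the square root.
def rms_color(colors)
  (0..2).map { |i|
    Math.sqrt(colors.sum { |color| color[i]**2 } / colors.size).to_i
  }
end

p mean_color(colors) # => [127, 0, 127]
p rms_color(colors)  # => [180, 0, 180]
```

For this pair, the squared version comes out as a noticeably brighter purple than the plain mean.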
I used the second method here because the resulting color looks nicer. For simplicity, I also left out details, like error handling.

There you go! It prints out the average color. We can close our laptops and call it a day. There are few reasons to improve code that we won’t touch again.

Well, for a throwaway project, that solution is indeed enough. Sometimes we have fun coding, and that’s it, no worries. In this particular case, though, I thought it would be an excellent exercise to practice functional programming. So, I decided to refactor it.

Some things bothered me in the original code. The each block does a lot of stuff, and, as generally happens with things with lots of responsibilities, it doesn’t do them well:

- The logic to calculate the average color is split into different parts of the code. Some of it is inside the each block, and some of it is outside.
- The code is fragile:
  - Updating color_count has to happen after the "next if ..." calls.
  - It’s easy to miss why color_count is necessary at all and, instead, use langs.size to calculate the average color, which would give us the wrong result.
- The code is very procedural, and it feels weird in Ruby.

It seems like having color_count and the color sums as separate variables is causing some pain, so we could change those variables to be a single array of colors and calculate the mean later. Iteratively building a collection is an anti-pattern, but it does shine a light on a direction we can follow.

Functional programming teaches us to think in terms of data transformation. Each function takes data and returns it in a new form. We can compose several functions together and form a pipeline. Let’s walk our code and try to convert it into a pipeline.
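Before doing that, here is a sketch of the intermediate idea mentioned above: collecting the colors into a single array and computing the mean afterwards. The langs hash is a small hand-written stand-in for the parsed YAML (an assumption, so the example runs offline), and the per-channel loop is just one way to write the mean.

```ruby
# Hypothetical stand-in for the parsed languages.yml, so the sketch runs offline.
langs = {
  "Ruby"     => { "type" => "programming", "color" => "#701516" },
  "Markdown" => { "type" => "prose",       "color" => "#083fa1" },
  "NoColor"  => { "type" => "programming" },
  "Elixir"   => { "type" => "programming", "color" => "#6e4a7e" }
}

def hex_color_to_rgb(color)
  color.delete("#").scan(/../).map(&:hex)
end

# Collect the colors first instead of tracking three sums and a counter.
colors = []
langs.each do |_name, details|
  next if details["type"] != "programming"
  next if details["color"].nil?

  colors << hex_color_to_rgb(details["color"])
end

# The count now comes from the collection itself, so it cannot get out of sync.
average = (0..2).map { |i|
  Math.sqrt(colors.sum { |color| color[i]**2 } / colors.size).to_i
}

p colors
p average
```

The counter is gone, but the array is still built by hand inside an each block, which is the part the pipeline version gets rid of.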
We can keep the imports and helper functions, so let’s skip to this part:

    # Fetch and parse the language list YAML
    langs_yaml = fetch_url(GITHUB_LANGS_URL)
    langs = parse_yaml(langs_yaml)

In a functional programming language like Elixir, this could be written as:

    GITHUB_LANGS_URL
    |> fetch_url()
    |> parse_yaml()

We hit our first roadblock: we have no pipe operator in Ruby! It is a common feature of functional programming languages that passes the result of an expression as a parameter of another expression. For a while, Ruby had a pipeline operator, but it was removed since the way it worked caused some controversy.

So how can we do this in Ruby? We could write it as parse_yaml(fetch_url(GITHUB_LANGS_URL)), but keeping this pattern leads to quite unreadable code.

Ruby is an object-oriented language, so we have to think in terms of objects and messages (i.e., methods). We need something that passes the caller to a given function, or, in other words, that yields self to a block. Luckily, Ruby has a method that does exactly that: yield_self, or its nicer-sounding alias, then. Here’s how that code would look:

    GITHUB_LANGS_URL
      .then { |url| fetch_url(url) }
      .then { |languages_yaml| parse_yaml(languages_yaml) }

Using Ruby’s numbered parameters, we can avoid having to name the block arguments:

    GITHUB_LANGS_URL
      .then { fetch_url _1 }
      .then { parse_yaml _1 }

Cool, that is pretty close to the Elixir code.

Now, we have to transform that big each block into a pipeline. In essence, that part of the code filters out non-programming languages and languages without a color, then calculates the average color. Let’s split those two parts into separate steps.

parse_yaml returns a hash, so we can use Enumerable#filter to select the languages we want.

      # ...
      .then { parse_yaml _1 }
      .filter { |_lang_name, lang_details|
        lang_details["type"] == "programming" && lang_details["color"]
      }

Then, we get the colors of each language and convert them to RGB:

      # ...
      .filter { |_lang_name, lang_details|
        lang_details["type"] == "programming" && lang_details["color"]
      }
      .map { |_lang_name, lang_details| hex_color_to_rgb(lang_details["color"]) }

This code works, but alas, it iterates over the languages twice (first on filter and then on map). We could use Enumerable#reduce to do this in a single pass, but that would be a bit lengthy (and many folks don’t know Enumerable#reduce). Again, Ruby has our back and provides Enumerable#filter_map. It calls the given block on each element of the enumerable and returns an array containing the truthy values returned by the block. We can merge those two steps into one:

      .filter_map { |_lang_name, lang_details|
        next if lang_details["type"] != "programming"
        next if lang_details["color"].nil?

        hex_color_to_rgb(lang_details["color"])
      }

I split the filter condition into two steps because I think it’s easier to read. Also note that the if conditions are now inverted.

Now we have an array of colors, with each color as an array of red, green, and blue values. We need to sum all red values together, then all green values, and all blue values. Let’s reshape our data representation to group values by color channel, so this will be easier:

      .filter_map {
        # ...
      }
      .transpose

The pipeline is coming together, but we still have work to do. Calculating the average color now is fairly simple using Enumerable#sum (can we get Enumerable#mean, tho? 😅):

      .transpose
      .map { |channel_values|
        squared_average = channel_values.sum { |value| value ** 2 } / channel_values.size
        Math.sqrt(squared_average).to_i
      }

Readability, performance and balance

Those with sharp eyes will notice that we’re still iterating over the values multiple times (sum, size, plus the calls to filter_map and transpose). Again, Enumerable#reduce would allow a single-pass solution, but that isn’t a hard requirement for this exercise (every extra pass is still O(n)).
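For comparison, a single-pass version could look roughly like the sketch below. The langs hash is a hand-written stand-in for the parsed YAML, and the accumulator layout (an array of channel sums plus a counter) is only one possible choice, not something taken from the original post.

```ruby
# Hypothetical stand-in for the parsed languages.yml, so the sketch runs offline.
langs = {
  "Ruby"     => { "type" => "programming", "color" => "#701516" },
  "Markdown" => { "type" => "prose",       "color" => "#083fa1" },
  "NoColor"  => { "type" => "programming" },
  "Elixir"   => { "type" => "programming", "color" => "#6e4a7e" }
}

def hex_color_to_rgb(color)
  color.delete("#").scan(/../).map(&:hex)
end

# One pass over the languages: filter, square, sum, and count all at once.
# The accumulator is [[red_sum, green_sum, blue_sum], color_count].
sums, count = langs.reduce([[0, 0, 0], 0]) { |(acc, n), (_name, details)|
  # Skipped entries pass the accumulator through unchanged.
  next [acc, n] if details["type"] != "programming" || details["color"].nil?

  rgb = hex_color_to_rgb(details["color"])
  [acc.zip(rgb).map { |sum, value| sum + value**2 }, n + 1]
}

average = sums.map { |sum| Math.sqrt(sum / count).to_i }
p average
```

Filtering, squaring, summing, and counting are all tangled together in one block here, which is the trade-off weighed next.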
Also, the body of that reduce call could be hard to grasp, so I decided to sacrifice a bit of performance to ease reading/teaching. As developers, we constantly have to balance readability, performance, and maintainability.

Lastly, we convert the color, represented as a 3-element array, to a hex string and print it. Here’s the full solution:

    require "net/http"
    require "yaml"

    def fetch_url(url)
      Net::HTTP.get(URI(url))
    end

    def parse_yaml(yaml_string)
      YAML.safe_load(yaml_string)
    end

    def hex_color_to_rgb(color)
      color.delete("#").scan(/../).map(&:hex)
    end

    def rgb_color_to_hex(color)
      color.map { |channel| channel.to_s(16).rjust(2, "0") }.join
    end

    GITHUB_LANGS_URL = "https://raw.githubusercontent.com/github/linguist/6b02d3bd769d07d1fdc661ba9e37aad0fd70e2ff/lib/linguist/languages.yml"

    GITHUB_LANGS_URL
      .then { fetch_url _1 }
      .then { parse_yaml _1 }
      .filter_map { |_lang_name, lang_details|
        next if lang_details["type"] != "programming"
        next if lang_details["color"].nil?

        hex_color_to_rgb(lang_details["color"])
      }
      .transpose
      .map { |channel_values|
        squared_average = channel_values.sum { |value| value ** 2 } / channel_values.size
        Math.sqrt(squared_average).to_i
      }
      .then { |average_color| puts "##{rgb_color_to_hex(average_color)}" }

One of the neat things about that pipeline is that we can extract any part of it into a separate method, and it will still be chainable.

Ruby is an OOP language, so thinking about objects and methods is the natural way of programming. Whenever you can, use methods (like those on the Enumerable module), or create objects that provide the ones you need.

Ruby also has good support for functional programming, and we can take advantage of that, particularly when doing data transformation. Mixing OOP and FP is not a sin, and Ruby has great features to support it.

Moreover, remember that it’s okay to start with a simple solution and improve it later. That’s the natural flow when doing TDD.

What? Oh, the color!
Here it is: [image: a swatch of the average color]
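As a closing sketch of the earlier point that parts of the pipeline can be extracted into methods and stay chainable, the filtering and averaging steps could be pulled out as below. The method names programming_colors and average_color and the sample hash are made up for this illustration; the hash stands in for the parsed YAML so the example runs offline.

```ruby
def hex_color_to_rgb(color)
  color.delete("#").scan(/../).map(&:hex)
end

def rgb_color_to_hex(color)
  color.map { |channel| channel.to_s(16).rjust(2, "0") }.join
end

# Extracted step: keep programming languages that have a color, as RGB arrays.
def programming_colors(langs)
  langs.filter_map { |_lang_name, lang_details|
    next if lang_details["type"] != "programming"
    next if lang_details["color"].nil?

    hex_color_to_rgb(lang_details["color"])
  }
end

# Extracted step: root-mean-square average per channel.
def average_color(colors)
  colors.transpose.map { |channel_values|
    squared_average = channel_values.sum { |value| value**2 } / channel_values.size
    Math.sqrt(squared_average).to_i
  }
end

# Hypothetical stand-in for the parsed languages.yml.
sample_langs = {
  "Ruby"     => { "type" => "programming", "color" => "#701516" },
  "Markdown" => { "type" => "prose",       "color" => "#083fa1" },
  "Elixir"   => { "type" => "programming", "color" => "#6e4a7e" }
}

# The extracted methods slot into the same chain via then.
hex = sample_langs
  .then { programming_colors _1 }
  .then { average_color _1 }
  .then { |rgb| "##{rgb_color_to_hex(rgb)}" }

puts hex
```

Each extracted method can now be tested on its own, while the chain itself still reads top to bottom.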
