GitHub has a library to syntax highlight code snippets called Linguist. It has
an extensive list of languages and their characteristics (e.g., name, file
extensions, color, etc.). I had some free time, so I came up with this silly
question: what would be the average programming language color?
I’m talking about the color used in the “Languages” section of a repository:
We can get this done with a few lines of Ruby:
require "net/http"
require "yaml"
# Some helper functions
def fetch_url(url)
Net::HTTP.get(URI(url))
end
def parse_yaml(yaml_string)
YAML.safe_load(yaml_string)
end
def hex_color_to_rgb(color)
color.delete("#").scan(/../).map(&:hex)
end
def rgb_color_to_hex(color)
color.map { |channel| channel.to_s(16).rjust(2, "0") }.join
end
GITHUB_LANGS_URL = "https://raw.githubusercontent.com/github/linguist/6b02d3bd769d07d1fdc661ba9e37aad0fd70e2ff/lib/linguist/languages.yml"
# Fetch and parse the language list YAML
langs_yaml = fetch_url(GITHUB_LANGS_URL)
langs = parse_yaml(langs_yaml)
# Calculate the average color of the programming languages
red_sum = 0
green_sum = 0
blue_sum = 0
color_count = 0
langs.each do |_name, details|
next if details["type"] != "programming" # Skip "non-programming" languages
next if details["color"].nil? # Skip languages without a color
rgb = hex_color_to_rgb(details["color"])
red_sum += rgb[0] ** 2
green_sum += rgb[1] ** 2
blue_sum += rgb[2] ** 2
color_count += 1
end
average_red = Math.sqrt(red_sum / color_count).to_i
average_green = Math.sqrt(green_sum / color_count).to_i
average_blue = Math.sqrt(blue_sum / color_count).to_i
average_color = "##{rgb_color_to_hex([average_red, average_green, average_blue])}"
puts average_color
While writing this article, I discovered at least two different approaches to averaging colors:
1. Sum each color channel across all colors, then divide the total by the number of colors.
2. Sum the square of each color channel, divide by the number of colors, and take the square root of the result.
I used the second method here because the resulting color looks nicer. For simplicity, I also left
out details like error handling.
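To make the difference between the two approaches concrete, here is a minimal sketch that averages two sample colors both ways. The method names linear_average and rms_average are made up for this illustration and are not part of the script above; both use integer division, like the original code, and both assume a non-empty list of colors.

```ruby
# Hypothetical helper: plain mean of each channel (first approach).
def linear_average(colors)
  colors.transpose.map { |channel| channel.sum / channel.size }
end

# Hypothetical helper: root mean square of each channel (second approach).
def rms_average(colors)
  colors.transpose.map { |channel|
    Math.sqrt(channel.sum { |value| value ** 2 } / channel.size).to_i
  }
end

red_and_blue = [[255, 0, 0], [0, 0, 255]]

p linear_average(red_and_blue) # prints [127, 0, 127]
p rms_average(red_and_blue)    # prints [180, 0, 180]
```

For pure red and pure blue, the plain mean gives a darker purple than the root-mean-square version, which is the kind of difference behind the "looks nicer" remark.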
There you go! It prints out the average color. We can close our laptops and call it a day. There are
few reasons to improve code that we won’t touch again.
Well, for a throwaway project, that solution is indeed enough. Sometimes we have fun coding, and
that’s it, no worries. In this particular case, though, I thought it would be an excellent
exercise to practice functional programming. So, I decided to refactor it.
Some things bothered me in the original code. The each block does a lot of stuff, and, as
generally happens with things with lots of responsibilities, it doesn’t do them well:
- The logic to calculate the average color is split across different parts of the
  code. Some of it is inside the each block, and some of it is outside.
- The code is fragile:
  - Updating color_count has to happen after the next if ... calls.
  - It’s easy to miss why color_count is necessary at all and, instead, use langs.size to
    calculate the average color, which would give us the wrong result.
- The code is very procedural, and it feels weird in Ruby.
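The langs.size pitfall can be reproduced with a small sketch. Everything here is an assumption made for illustration: SAMPLE_LANGS is made-up data shaped like the parsed YAML, and average_red is a hypothetical method that only handles the red channel and takes a flag to pick the divisor.

```ruby
# Made-up sample data shaped like languages.yml entries.
SAMPLE_LANGS = {
  "Alpha" => { "type" => "programming", "color" => "#ff0000" },
  "Beta"  => { "type" => "data",        "color" => "#00ff00" },
  "Gamma" => { "type" => "programming" } # no color
}.freeze

# Hypothetical helper: root mean square of the red channel only.
# With divide_by_all: true it divides by langs.size, which is the mistake.
def average_red(langs, divide_by_all:)
  red_sum = 0
  color_count = 0
  langs.each do |_name, details|
    next if details["type"] != "programming"
    next if details["color"].nil?

    red_sum += details["color"].delete("#")[0, 2].hex ** 2
    color_count += 1
  end
  divisor = divide_by_all ? langs.size : color_count
  Math.sqrt(red_sum / divisor).to_i
end

puts average_red(SAMPLE_LANGS, divide_by_all: false) # prints 255
puts average_red(SAMPLE_LANGS, divide_by_all: true)  # prints 147
```

Only one entry survives both next if checks, so dividing by all three entries drags the result down.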
It seems like having color_count and the color sums as separate variables is causing some pain, so
we could change those variables to be a single array of colors and calculate the mean later.
Iteratively building a collection is an anti-pattern, but it does shine a light on a direction we
can follow.
Functional programming teaches us to think in terms of data transformation. Each function takes data
and returns it in a new form. We can compose several functions together and form a pipeline.
Let’s walk our code and try to convert it into a pipeline. We can keep the
imports and helper functions, so let’s skip to this part:
# Fetch and parse the language list YAML
langs_yaml = fetch_url(GITHUB_LANGS_URL)
langs = parse_yaml(langs_yaml)
In a functional programming language like Elixir, this could be written as:
GITHUB_LANGS_URL
|> fetch_url()
|> parse_yaml()
We hit our first roadblock: we have no pipe operator in Ruby! It is a common
feature of functional programming languages that passes the result of an
expression as a parameter of another expression.
A pipeline operator was added during the development of Ruby 2.7, but it was removed before the
final release because the way it worked caused some controversy.
So how can we do this in Ruby? We could write it as parse_yaml(fetch_url(GITHUB_LANGS_URL)), but
keeping this pattern leads to quite unreadable code. Ruby is an object-oriented language, so we have
to think in terms of objects and messages (i.e., methods).
We need something that passes the receiver to a given function, or, in other words, that yields self
to a block. Luckily, Ruby has a method that does exactly that: yield_self (added in Ruby 2.5), or its
nicer-sounding alias then (added in Ruby 2.6). Here’s how that code would look:
GITHUB_LANGS_URL
.then { |url| fetch_url(url) }
.then { |languages_yaml| parse_yaml(languages_yaml) }
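The same pattern can be tried without network access. In the sketch below, double and increment are made-up stand-ins for fetch_url and parse_yaml; it compares the nested call with a then chain, and also shows passing a method reference instead of a block, which is one more way to write the same thing.

```ruby
# Hypothetical stand-ins for fetch_url and parse_yaml, so the example runs offline.
def double(number)
  number * 2
end

def increment(number)
  number + 1
end

# Nested calls read inside-out.
nested = increment(double(5))

# A then chain reads top to bottom, in the order the steps run.
chained = 5
  .then { |number| double(number) }
  .then { |number| increment(number) }

# Method references avoid writing the block by hand.
by_reference = 5.then(&method(:double)).then(&method(:increment))

p [nested, chained, by_reference] # prints [11, 11, 11]
```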
Using Ruby’s numbered parameters, we can avoid having to name the block arguments:
GITHUB_LANGS_URL
.then { fetch_url _1 }
.then { parse_yaml _1 }
Cool, that is pretty close to the Elixir code. Now, we have to transform that big each block into a
pipeline. In essence, that part of the code filters out non-programming languages and languages
without a color, then calculates the average color. Let’s split those two parts into separate steps.
parse_yaml returns a hash, so we can use Hash#filter to select the languages we want:
# ...
.then { parse_yaml _1 }
.filter { |_lang_name, lang_details|
lang_details["type"] == "programming" && lang_details["color"]
}
Then, we get the colors of each language and convert them to RGB:
# ...
.filter { |_lang_name, lang_details|
lang_details["type"] == "programming" && lang_details["color"]
}
.map { |_lang_name, lang_details|
hex_color_to_rgb(lang_details["color"])
}
This code works, but alas, it iterates over the languages twice (once in filter and
again in map). We could use Enumerable#reduce to do this in a single pass, but that would be a
bit lengthy (and many folks don’t know Enumerable#reduce). Again, Ruby has our back and provides
Enumerable#filter_map. It calls the given block on each element of the enumerable and returns an
array containing the truthy values returned by the block. We can merge those two steps into one:
.filter_map { |_lang_name, lang_details|
next if lang_details["type"] != "programming"
next if lang_details["color"].nil?
hex_color_to_rgb(lang_details["color"])
}
I split the filter condition into two steps because I think it’s easier to read. Also note that the
if conditions are now inverted.
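Here is a minimal, offline sketch that checks the two versions agree. The sample hash is made-up data shaped like the parsed YAML, and colors_two_passes and colors_one_pass are hypothetical method names wrapping the two snippets above. It needs Ruby 2.7 or newer for filter_map.

```ruby
def hex_color_to_rgb(color)
  color.delete("#").scan(/../).map(&:hex)
end

# filter followed by map: two passes over the data.
def colors_two_passes(langs)
  langs
    .filter { |_name, details| details["type"] == "programming" && details["color"] }
    .map { |_name, details| hex_color_to_rgb(details["color"]) }
end

# filter_map: one pass; a bare `next` returns nil, which filter_map drops.
def colors_one_pass(langs)
  langs.filter_map { |_name, details|
    next if details["type"] != "programming"
    next if details["color"].nil?

    hex_color_to_rgb(details["color"])
  }
end

# Made-up sample data shaped like languages.yml entries.
sample = {
  "Alpha" => { "type" => "programming", "color" => "#ff0000" },
  "Beta"  => { "type" => "data",        "color" => "#00ff00" },
  "Gamma" => { "type" => "programming" },
  "Delta" => { "type" => "programming", "color" => "#0000ff" }
}

p colors_two_passes(sample) # prints [[255, 0, 0], [0, 0, 255]]
p colors_one_pass(sample)   # prints [[255, 0, 0], [0, 0, 255]]
```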
Now we have an array of colors, with each color as an array of red, green, and blue values. We need
to sum all red values together, then all green values, and all blue values. Let’s reshape our data
representation to group values by color channel, so this will be easier:
.filter_map {
# ...
}
.transpose
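To see what transpose does to the data, here is a tiny sketch with three made-up colors. Each inner array goes in as one color and comes out as one channel.

```ruby
# Three made-up colors as [red, green, blue] arrays.
colors = [
  [255, 0, 0],
  [0, 128, 255],
  [16, 32, 64]
]

# After transposing, each inner array holds one channel across all colors.
reds, greens, blues = colors.transpose

p reds   # prints [255, 0, 16]
p greens # prints [0, 128, 32]
p blues  # prints [0, 255, 64]
```

Array#transpose raises an IndexError when the inner arrays have different lengths, so this step relies on every color having exactly three channels.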
The pipeline is coming together, but we still have work to do. Calculating the average color now is
fairly simple using Enumerable#sum (can we get Enumerable#mean, tho? 😅):
.transpose
.map { |channel_values|
squared_average = channel_values.sum { |value| value ** 2 } / channel_values.size
Math.sqrt(squared_average).to_i
}
Readability, performance and balance
Those with sharp eyes will notice that we’re still iterating over the values multiple times
(filter_map, transpose, and then sum for each channel). Again, using Enumerable#reduce would be an
option for a single-pass solution, but since every version here is O(n) anyway, a single pass
isn’t a hard requirement for this exercise.
Also, the body of that reduce call could be hard to grasp, so I decided to sacrifice
a bit of performance to ease reading/teaching. As developers, we constantly have to balance
readability, performance, and maintainability.
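For comparison, this is roughly what a single-pass version of the averaging step could look like with reduce. It is only a sketch: rms_average_single_pass is a hypothetical method name, and it assumes a non-empty array of [red, green, blue] integer arrays.

```ruby
# Hypothetical helper: root-mean-square average of RGB triplets in one pass.
# Assumes `colors` is a non-empty array of [red, green, blue] integer arrays.
def rms_average_single_pass(colors)
  initial = [[0, 0, 0], 0] # [per-channel sums of squares, number of colors seen]

  sums, count = colors.reduce(initial) { |(channel_sums, seen), (red, green, blue)|
    [
      [channel_sums[0] + red ** 2, channel_sums[1] + green ** 2, channel_sums[2] + blue ** 2],
      seen + 1
    ]
  }

  sums.map { |sum| Math.sqrt(sum / count).to_i }
end

p rms_average_single_pass([[255, 0, 0], [0, 0, 255]]) # prints [180, 0, 180]
```

The accumulator has to carry both the sums and the count, which is the bookkeeping that transpose and sum hide.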
Lastly, we convert the color, represented as a 3-element array, to a hex string and print it. Here’s the full solution:
require "net/http"
require "yaml"
def fetch_url(url)
Net::HTTP.get(URI(url))
end
def parse_yaml(yaml_string)
YAML.safe_load(yaml_string)
end
def hex_color_to_rgb(color)
color.delete("#").scan(/../).map(&:hex)
end
def rgb_color_to_hex(color)
color.map { |channel| channel.to_s(16).rjust(2, "0") }.join
end
GITHUB_LANGS_URL = "https://raw.githubusercontent.com/github/linguist/6b02d3bd769d07d1fdc661ba9e37aad0fd70e2ff/lib/linguist/languages.yml"
GITHUB_LANGS_URL
.then { fetch_url _1 }
.then { parse_yaml _1 }
.filter_map { |_lang_name, lang_details|
next if lang_details["type"] != "programming"
next if lang_details["color"].nil?
hex_color_to_rgb(lang_details["color"])
}
.transpose
.map { |channel_values|
squared_average = channel_values.sum { |value| value ** 2 } / channel_values.size
Math.sqrt(squared_average).to_i
}
.then { |average_color| puts "##{rgb_color_to_hex(average_color)}" }
One of the neat things about that pipeline is that we can extract any part of it into a separate
method, and it will still be chainable.
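As a sketch of that idea, the two biggest blocks could be pulled out into methods and chained with then. The names programming_language_colors and average_color are hypothetical, and the sample hash is made-up data standing in for the parsed YAML so the example runs offline.

```ruby
def hex_color_to_rgb(color)
  color.delete("#").scan(/../).map(&:hex)
end

def rgb_color_to_hex(color)
  color.map { |channel| channel.to_s(16).rjust(2, "0") }.join
end

# Hypothetical extraction of the filter_map step.
def programming_language_colors(langs)
  langs.filter_map { |_lang_name, lang_details|
    next if lang_details["type"] != "programming"
    next if lang_details["color"].nil?

    hex_color_to_rgb(lang_details["color"])
  }
end

# Hypothetical extraction of the transpose and map steps.
def average_color(colors)
  colors.transpose.map { |channel_values|
    squared_average = channel_values.sum { |value| value ** 2 } / channel_values.size
    Math.sqrt(squared_average).to_i
  }
end

# Made-up sample data standing in for the parsed languages.yml.
sample_langs = {
  "Alpha" => { "type" => "programming", "color" => "#ff0000" },
  "Beta"  => { "type" => "data",        "color" => "#00ff00" },
  "Gamma" => { "type" => "programming", "color" => "#0000ff" }
}

sample_langs
  .then { programming_language_colors _1 }
  .then { average_color _1 }
  .then { |color| puts "##{rgb_color_to_hex(color)}" } # prints #b400b4
```

Each extracted method takes data in and returns data out, so it slots into the chain the same way the inline blocks did.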
Ruby is an OOP language, so thinking about objects
and methods is the natural way of programming. Whenever you can, use methods (like those on the
Enumerable module), or create objects that provide the ones you need.
Ruby also has good support for functional programming, and we can take advantage of that,
particularly when doing data transformation. Mixing OOP and FP is not a sin, and Ruby has great
features to support it.
Moreover, remember that it’s okay to start with a simple solution and improve it later. That’s the
natural flow when doing TDD.
What? Oh, the color! Here it is: