Ruby TMTOWTDI, Episode 1
Monday, June 2nd, 2008This is the inaugural post in a series that we’ll be calling Ruby TMTOWTDI (There’s more than one way to do it). Usually the TMTOWTDI acronym is used as a disparaging term towards flexible languages, which offer a myriad ways of solving most given problems. This series, however, aims to use this property of Ruby as an educational tool.
The format is as follows: we’ll present a problem, then everyone will try and write the most idiomatic, concise code that solves the problem in a readable fashion. This is not meant to be a golfing competition, nor an optimization one. However, we will benchmark the solutions for a rough idea on how different styles affect performance. For every installment we’ll have a performance winner for fastest time and a ’style’ winner for the most elegant code.
This gives us at CitrusByte a better understanding of each other’s coding styles, and stimulates discussions about little tricks we may not be aware of. Best of all, it’s fun as heck to be a part of.
You can get involved too! Leave a comment with your solution if you think you can best the ones listed here. We also encourage you to propose a simple problem (less than 20 lines of code to solve) for the next instalment in the series.
Since this is the first time we’re doing it, we’ve picked a very simple problem to solve as a quick warm-up.
Problem: given a string ’str’ of the format
"a10 b20 c25 d40"convert it to a hash
{'a' => 10, 'b' => 20, 'c' => 25, 'd' => 40 }
Note that the values of this hash are integers, not strings. If the string is
"a10 a20"the output should be
{'a' => 20}
.
Leaving some spoiler space in case you want to try it on your own. Below are selected solutions from our team.
Spoiler space
Solutions
Michel
Hash[*str.scan(/(.)(\d{2})/).map { |k, v| [k, v.to_i] }.flatten]
Time taken: 9.99s
Tim
Hash[*(str.scan(/(\w)(\d+)/).map {|k, v| [k, v.to_i] }.flatten)]
Time taken: 9.40s
Tim’s comments: the splat operator has low precedence so the outermost set of
parantheses are redundant, but I felt it’s better to have them there to make it
very obvious what’s going on.
Nicolas (winner: speed award)
str.scan(/\s*(\D+)(\d+)/).inject({}) {|h,(k,v)| h[k] = v.to_i; h }
Time taken: 7.19s
Nicolas’s comments: I like inject/reduce/fold, whatever you call it. It’s
concise and usually a perfect fit for constructing enumerable data structures.
Martin (winner: style award)
str.split.inject({}) {|h,i| h[i[/^\D+/]] = i[/\d+$/].to_i; h }
Time taken: 8.75s
Ari
hash = str.split.inject({}) do |set, h| set.merge( {h[0].chr => h[1..-1].to_i} ) end
Time taken: 23.02s
Ari’s comments: I knew everyone else would be strutting their one-liners, so I went for as much clarity as possible. I feel this code not only solves the problem, but also makes it very apparent what the desired solution is.
Benchmark code
require 'benchmark' ITERATION = 400 def measure(&block) ITERATION.times { block.call } end str = '' 5000.times { str << ('a'..'z').to_a[rand(26)] + (rand(90)+10).to_s + ' ' } br = Benchmark.bmbm do |b| b.report("Running #{ITERATION} times on string of length #{str.length}") do measure { #### Code to measure goes here } end end
Conclusion
Lots of interesting styles here. We have two ways of creating hashes: with the Hash[ [key =>|, value]* method and with inject().
We’ve also seen various ways of breaking up a string: with scan(), with split(), and with regex arguments to a string’s [] method. While we cannot come to definite conclusions about performance (since the code that is being compared doesn’t differ just in one area), we can see a couple things going on:
- running Hash[] on an array as opposed to creating it directly via inject hurt Michel and Tim’s performances compared to Martin/Nicolas
- merge() seems to be significantly more costly than straight addition/replacement of hash keys. If you’re ever injecting into a hash it may be wiser (if your problem allows) to follow the
{ hash[key] = value; hash }format instead of merging.
We’ve picked Martin as the style winner for the elegance of his code which fit of the string parsing inside the inject block as opposed to the scan-based solutions which did some of it outside in the scan. The h[i[/^\D+/]] = i[/\d+$/].to_i portion was the clearest signal of intent for the contents of the hash.
Meanwhile, Nicolas is the performance winner by a mile, coming in more than 20% faster than the nearest competitor.
Congratulations to Martin (style) and Nicolas (speed) for being the winners of the inaugural CitrusByte Ruby TMTOWTDI. Stay tuned for the next episode!
EDIT
Additional comments by Nicolas:
Anyway, after thinking about it, even if I like inject a lot, there’s one big gain for the Hash[*...] as Michel posted. It
transmits intent better, in a way.
You look at that and the entire code sequence is a Hash construction, plain and simple, even if you ignore the subexpression in the middle, you know the outcome. And if you look one character into the brackets, then you know it’s constructing a hash from some transformation of an existing enumerable. Bah, just something that responds to #to_a.
Instead, using inject, the code flows (at least in my head) nicer as you don’t have to parse subexpressions, but it’s not immediately obvious what’s it about.
Hm, actually, depends on what intent is better to show. I mean, the objective is something like “to transform a string in a certain format into a more traversable structure” or “to create a hash out of a string in a certain format?”
That (not so) subtle difference might make me decide about which way is better, but then the description of the problem is not descriptive enough. Good, that means that the problem lies in the exercise’s designer. I can sleep well :P
(Probably there’s a third approach that to a degree involves a recursive lambda and someone flying over a desktop yelling “stand back! I know hash construction”, but my ninjitsu is not good enough for that yet)
