Posts Tagged ‘Ruby’

Ruby TMTOWTDI, Episode 1

Monday, June 2nd, 2008

This is the inaugural post in a series that we’ll be calling Ruby TMTOWTDI (There’s more than one way to do it). Usually the TMTOWTDI acronym is used as a disparaging term towards flexible languages, which offer a myriad ways of solving most given problems. This series, however, aims to use this property of Ruby as an educational tool.

The format is as follows: we’ll present a problem, then everyone will try and write the most idiomatic, concise code that solves the problem in a readable fashion. This is not meant to be a golfing competition, nor an optimization one. However, we will benchmark the solutions for a rough idea on how different styles affect performance. For every installment we’ll have a performance winner for fastest time and a ’style’ winner for the most elegant code.

This gives us at CitrusByte a better understanding of each other’s coding styles, and stimulates discussions about little tricks we may not be aware of. Best of all, it’s fun as heck to be a part of.

You can get involved too! Leave a comment with your solution if you think you can best the ones listed here. We also encourage you to propose a simple problem (less than 20 lines of code to solve) for the next instalment in the series.

Since this is the first time we’re doing it, we’ve picked a very simple problem to solve as a quick warm-up.

Problem: given a string ’str’ of the format

"a10 b20 c25 d40"

convert it to a hash

{'a' => 10, 'b' => 20, 'c' => 25, 'd' => 40 }

Note that the values of this hash are integers, not strings. If the string is

"a10 a20"

the output should be

{'a' => 20}

.

Leaving some spoiler space in case you want to try it on your own. Below are selected solutions from our team.


Spoiler space


Solutions

Michel

Hash[*str.scan(/(.)(\d{2})/).map { |k, v| [k, v.to_i] }.flatten]

Time taken: 9.99s

Tim

Hash[*(str.scan(/(\w)(\d+)/).map {|k, v| [k, v.to_i] }.flatten)]

Time taken: 9.40s

Tim’s comments: the splat operator has low precedence so the outermost set of
parantheses are redundant, but I felt it’s better to have them there to make it
very obvious what’s going on.

Nicolas (winner: speed award)

str.scan(/\s*(\D+)(\d+)/).inject({}) {|h,(k,v)| h[k] = v.to_i; h }

Time taken: 7.19s

Nicolas’s comments: I like inject/reduce/fold, whatever you call it. It’s
concise and usually a perfect fit for constructing enumerable data structures.

Martin (winner: style award)

str.split.inject({}) {|h,i| h[i[/^\D+/]] = i[/\d+$/].to_i; h }

Time taken: 8.75s

Ari

hash = str.split.inject({}) do |set, h|
  set.merge( {h[0].chr => h[1..-1].to_i} )
end

Time taken: 23.02s

Ari’s comments: I knew everyone else would be strutting their one-liners, so I went for as much clarity as possible. I feel this code not only solves the problem, but also makes it very apparent what the desired solution is.

Benchmark code

require 'benchmark'
 
ITERATION = 400
 
def measure(&block)
  ITERATION.times { block.call }
end
 
str = ''
5000.times { str << ('a'..'z').to_a[rand(26)] + (rand(90)+10).to_s + ' ' }
br = Benchmark.bmbm do |b|
  b.report("Running #{ITERATION} times on string of length #{str.length}") do
    measure {
      #### Code to measure goes here
    }
  end
end

Conclusion

Lots of interesting styles here. We have two ways of creating hashes: with the Hash[ [key =>|, value]* method and with inject().

We’ve also seen various ways of breaking up a string: with scan(), with split(), and with regex arguments to a string’s [] method. While we cannot come to definite conclusions about performance (since the code that is being compared doesn’t differ just in one area), we can see a couple things going on:

  • running Hash[] on an array as opposed to creating it directly via inject hurt Michel and Tim’s performances compared to Martin/Nicolas
  • merge() seems to be significantly more costly than straight addition/replacement of hash keys. If you’re ever injecting into a hash it may be wiser (if your problem allows) to follow the { hash[key] = value; hash } format instead of merging.

We’ve picked Martin as the style winner for the elegance of his code which fit of the string parsing inside the inject block as opposed to the scan-based solutions which did some of it outside in the scan. The h[i[/^\D+/]] = i[/\d+$/].to_i portion was the clearest signal of intent for the contents of the hash.

Meanwhile, Nicolas is the performance winner by a mile, coming in more than 20% faster than the nearest competitor.

Congratulations to Martin (style) and Nicolas (speed) for being the winners of the inaugural CitrusByte Ruby TMTOWTDI. Stay tuned for the next episode!

EDIT

Additional comments by Nicolas:

Anyway, after thinking about it, even if I like inject a lot, there’s one big gain for the Hash[*...] as Michel posted. It
transmits intent better, in a way.

You look at that and the entire code sequence is a Hash construction, plain and simple, even if you ignore the subexpression in the middle, you know the outcome. And if you look one character into the brackets, then you know it’s constructing a hash from some transformation of an existing enumerable. Bah, just something that responds to #to_a.

Instead, using inject, the code flows (at least in my head) nicer as you don’t have to parse subexpressions, but it’s not immediately obvious what’s it about.

Hm, actually, depends on what intent is better to show. I mean, the objective is something like “to transform a string in a certain format into a more traversable structure” or “to create a hash out of a string in a certain format?”

That (not so) subtle difference might make me decide about which way is better, but then the description of the problem is not descriptive enough. Good, that means that the problem lies in the exercise’s designer. I can sleep well :P

(Probably there’s a third approach that to a degree involves a recursive lambda and someone flying over a desktop yelling “stand back! I know hash construction”, but my ninjitsu is not good enough for that yet)

Effortlessly farm work to an EC2 instance without batting an eye

Wednesday, February 20th, 2008

Inline media uploading and processing in ruby

Today, we released a new gem to make integrating off-site processing easy. With this gem and utilizing Amazon’s EC2 cloud and S3 storage back-end, you can easily process media files in a non-blocking, threaded way and do it all inline with the rest of your application.

First download the gem and all it’s dependencies by typing:

sudo gem install processor_pool -y

This gem is built on top of Sinatra. So, if you are familiar with the Sinatra syntax, then you’ll be right at home. If not, I highly suggest you check it out.

First,

# In a file called media_processor.rb
require "rubygems"
require "processor_pool"
 
# Set the access_key_id and the secret_access_key provided by amazon
access_key_id, secret_access_key = 'accesskey', 'somethinglongandfuzzyhere'
 
# And start the processors!
ProcessorPool.start(access_key_id, secret_access_key)

Add to the end of the file

get '/' do
  "Show me the money"
end

That’s it. Start your server up with

ruby media_processor.rb

And now you are ready to go to allow a user to hit the EC2 cloud (more on this later) and be responded with the “Show me the money” text.

Replace the above content with the following

class PFile
  attr_accessor :filename, :temp_file
  @@temp_directory_base = "tmp"
  def initialize(f)
    self.filename = f[:filename]
    self.temp_file = f[:tempfile]
    @media_file = nil
    @temp_dir
  end
  def delete!
    #File.unlink(self.temp_file)
    self.temp_file.delete
  end
  def store_in_temp_directory
    @temp_dir = File.join(@@temp_directory_base, @upload.member_id, @upload.media_file_id)
    FileUtils.mkdir_p(@temp_dir)
    filename = File.basename(self.filename)
    FileUtils.mv(self.temp_file.path, File.join(@temp_dir, filename))
    @media_file = MediaFile.new(File.join(@temp_dir, filename))
  end
  def process!
    ImageConverter.new(@media_file).convert
    @media_file.delete!
    "Processed images for #{@media_file.original_file_name}"
  end
  def save_to_storage
    # DO THE S3 saving here
    MediaDirectory.new(@temp_dir).copy_to_s3(STORAGE_BUCKET, upload_prefix, true)
  end
  def upload_prefix
    "#{@upload.member_id}/#{@upload.media_file_id}"
  end
end
class ImageConverter
  def resize_to(name, image = nil)
    img = image || Magick::Image.read(self.file.file_name).first
    named_size = self.class.named_sizes[name]
    if named_size =~ /c$/
      dimensions = named_size.gsub(/c$/, '').split('x')
      img.crop_resized!(dimensions[0].to_i, dimensions[1].to_i)
    else
      img.change_geometry!(named_size) do |cols, rows, img|
        img = img.resize(cols, rows)
      end
    end
    img.write(path_to(name))
  end
end
post '/new' do
  @file = PFile.new(params[:file])
  @file.store_in_temp_directory
  @file.save_to_storage
  redirect params[:success_url] if params[:success_url]
end

If you look closely, we are just using some simple classes to handle our conversion. But ignoring that, we are sending a post to the server with file parameters. We can send the process a file and the rest is history. Notice that this all happens on a remote server. None of this is handled locally.

That’s it.

Of course, you can imagine how you could extend this to include movie types and other types of processing, but for the time being, let’s focus the images. (Although I will be showing how to do this in rails, you can use this with any back-end, PHP, Java, etc.)

Let’s add a convenience method in the application.rb

def url_for_upload_server
  if RAILS_ENV == 'production'
    # determine the upload server to try to use
    p = Processors.get_random_processor(::SERVER_POOL_BUCKET)
    "http://#{p.hostname}:4567/new"
  else
    p = Processors.get_random_processor(::SERVER_POOL_BUCKET)
    "http://localhost:4567/new"
  end
end

In your upload controller add

@upload_url = url_for_upload_server

Now, in your upload view, add

<% form_for @media, :id => "fileform", :url => upload_url, :multipart => true do %>
...
<% end %>

Lets handle this inline and non-blocking

Add a button field to the end of your form like so:

<button class="button" id="buttonUpload"  onclick="return beginFileUpload();">Upload</button>

and this to your javascript file (note, this is jQuery)

function beginFileUpload()
{
    // todo: some kind of feedback that upload has started...
    $("#uploading").show();
    $("#buttonUpload").hide();
 
    $("#validation_code").load('<%= url_for(:controller=>"media_file", :action=>"upload") %>',
          {    title: $("#title").val(),
               description: $("#description").val()
          },
          function(responseText, textStatus, XMLHttpRequest) {
              $("#validation_code").val(responseText)
              if (textStatus == 'success')
                  completeFileUpload();
              else
                  alert('Error uploading file.');
              });
    return false;
}
 
function completeFileUpload()
{
    $("#fileform").submit();
    return false;
}

With that, you have inline file uploading with inline image processing off-site.

For more information, check the docs at http://rubyforge.org/projects/processorpool/