Monday, February 20, 2012

Ruby Blocks and Procs

There are two main ways to create a method that takes and executes a block in Ruby. The first is using yield:
$ irb
>> def compute_with_yield
>> yield
>> end
=> nil
>> compute_with_yield{1+1}
=> 2
Note how the method compute_with_yield takes no parameters, and the block is executed simply by invoking yield. yield grabs the block attached to the method it is called from and executes it. Nothing in the signature of compute_with_yield indicates that a block is expected. I find that to be an issue, though I guess documentation can solve it. What bothers me is that I'd like to know how to use compute_with_yield just by looking at its signature. Documentation should be needed only for the details of what the method does and how it uses its parameters, not to know how to call the method. To me this is a defect in the Ruby spec.
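To make the point concrete, here is a minimal sketch of what happens when a caller misses the hidden requirement, plus one way to fail with a clearer message. The method name compute_with_yield_checked is my own, invented for illustration; block_given? and LocalJumpError are standard Ruby.

```ruby
def compute_with_yield
  yield
end

# Hypothetical variant: checks for the block up front and fails with an
# explicit message instead of relying on yield to raise.
def compute_with_yield_checked
  raise ArgumentError, "a block is required" unless block_given?
  yield
end

# Calling the plain version without a block raises LocalJumpError at the yield.
begin
  compute_with_yield
rescue LocalJumpError => e
  puts "#{e.class}: #{e.message}"
end

# The checked version raises ArgumentError before reaching yield.
begin
  compute_with_yield_checked
rescue ArgumentError => e
  puts "#{e.class}: #{e.message}"
end

puts compute_with_yield_checked { 1 + 1 }
```

The guard does not fix the signature problem, since the requirement still only shows up at runtime, but it at least names the mistake.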
That said, the second way is to take the block as an explicit parameter. This is useful when you want to write a method that passes the attached block on to a second method. It also makes it clear from the signature that the method expects a block:
>> def compute_with_block_call(&block)
>> block.call
>> end
=> nil
>>
?> def compute_passing_a_block_to_another_method(&block)
>> compute_with_block_call(&block)
>> end
=> nil
>> compute_passing_a_block_to_another_method{1+1}
=> 2
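The difference in the signatures can also be observed through reflection. This is a small sketch using Method#parameters, which reports only what a method declares; it is an illustration of the point about signatures, not something the examples above depend on.

```ruby
def compute_with_yield
  yield
end

def compute_with_block_call(&block)
  block.call
end

# Method#parameters lists the declared parameters as [kind, name] pairs.
yield_params = method(:compute_with_yield).parameters
block_params = method(:compute_with_block_call).parameters

p yield_params  # nothing declared, so no hint that a block is needed
p block_params  # includes a [:block, :block] entry for &block
```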
You can also write an equivalent of compute_passing_a_block_to_another_method in this way:
?> def compute_using_proc_new
>> compute_with_block_call(&Proc.new)
>> end
=> nil
>> compute_using_proc_new{1+1}
=> 2
Huh? What's going on here?

This is using a little-known (but well-documented) property of Proc.new. When invoked with no block, it acquires the block attached to the method it is called from. Or, as better stated in the documentation: "Proc.new Creates a new Proc object, bound to the current context. Proc::new may be called without a block only within a method with an attached block, in which case that block is converted to the Proc object." Note that this behavior was deprecated in Ruby 2.7 and removed in Ruby 3.0, where calling Proc.new without a block raises an ArgumentError.

So should you be using yield or block.call? It seems that passing a block as a parameter and then having the ability to either pass it to another method or to call block.call gives more options. Also it seems to make the API clearer, because it defines the expectation of a block in the signature. So why bother with yield at all?
The reason you should use yield rather than a &block parameter is that declaring the block as a parameter implicitly creates a Proc object, which is a surprisingly slow operation.
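To see the object involved, here is a minimal sketch. The helper capture_block is a made-up name for illustration; it simply returns whatever the &block parameter was bound to. Exactly when the interpreter allocates the Proc is an implementation detail that may differ between Ruby versions, so the sketch only shows what the method body sees.

```ruby
# Hypothetical helper: returns the object bound to the &block parameter.
def capture_block(&block)
  block
end

captured = capture_block { 1 + 1 }

p captured.class  # the block arrives as a Proc object
p captured.call   # it can be stored and called later
p capture_block   # with no block attached, the parameter is nil
```

With yield there is no such object in the method body: the block is simply invoked in place.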
The following benchmark demonstrates the cost:

def compute_with_yield
  yield
end

def compute_with_block_call(&block)  
  block.call
end

def compute_passing_a_block_to_another_method(&block)  
  compute_with_block_call(&block)
end

def compute_passing_a_block_and_ignoring_it(&block)
end

def compute_using_proc_new  
  compute_with_block_call(&Proc.new)
end

require 'benchmark'

n=1000000
Benchmark.bmbm do |x|  
  x.report("compute_with_yield") do    
    n.times {compute_with_yield { 1+1 }}  
  end  
  x.report("compute_with_block_call") do    
    n.times {compute_with_block_call { 1+1 }}  
  end  
  x.report("compute_passing_a_block_to_another_method") do    
    n.times {compute_passing_a_block_to_another_method { 1+1 }}  
  end  
  x.report("compute_passing_a_block_and_ignoring_it") do    
    n.times {compute_passing_a_block_and_ignoring_it { 1+1 }}  
  end  
  x.report("compute_using_proc_new") do    
    n.times {compute_using_proc_new { 1+1 }}  
  end
end

This script produces the following output:
                                                user     system      total        real
compute_with_yield                          0.520000   0.000000   0.520000 (  0.519002)
compute_with_block_call                     2.940000   0.080000   3.020000 (  3.030361)
compute_passing_a_block_to_another_method   3.040000   0.000000   3.040000 (  3.036327)
compute_passing_a_block_and_ignoring_it     1.970000   0.140000   2.110000 (  2.112733)
compute_using_proc_new                      3.180000   0.140000   3.320000 (  3.316456)
As you can see, compute_with_yield is far faster than all the other variations.

Passing a block as a parameter and using block.call to execute it takes nearly six times as long as compute_with_yield. That is a huge overhead!

Passing a block to another method that then executes it adds little further overhead. That is because the initial creation of the Proc dwarfs the time needed to make the intermediate method call.

Even just passing the block and ignoring it takes about four times as long as the version with yield, where the block is actually executed!

Using the Proc.new "trick" doesn't seem to add any substantial overhead.
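For reference, the ratios behind these observations can be worked out from the "real" column reported above. This is just arithmetic on those measured numbers, taken from that one run on my machine; other Ruby versions and hardware will give different figures.

```ruby
# Elapsed ("real") seconds copied from the benchmark output above.
real_seconds = {
  "compute_with_yield"                        => 0.519002,
  "compute_with_block_call"                   => 3.030361,
  "compute_passing_a_block_to_another_method" => 3.036327,
  "compute_passing_a_block_and_ignoring_it"   => 2.112733,
  "compute_using_proc_new"                    => 3.316456
}

baseline = real_seconds["compute_with_yield"]

# Express each timing as a multiple of the yield version.
ratios = real_seconds.transform_values { |seconds| seconds / baseline }

ratios.each do |name, ratio|
  puts format("%-45s %.1fx", name, ratio)
end
```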
