Is there a performance difference between `select` and `select!` when called on a Ruby hash?

Date: 2016-08-26 08:05:00

Tags: ruby performance sorting hash

hash = { 'mark' => 1, 'jane' => 1, 'peter' => 35 }.select {|k,v| v > 1}
#=> { 'peter' => 35 }

What if I have millions of keys?

Is there a difference between `hash = hash.select` and `hash.select!`?
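Besides speed, the two calls differ in what they modify and what they return. Here is a minimal sketch (the variable names are just for illustration). `select` leaves the receiver untouched and returns a new Hash. `select!` filters the receiver in place and, according to the Ruby documentation, returns `nil` when no entries were removed. That means `hash = hash.select! { ... }` can unexpectedly set `hash` to `nil`.

```ruby
original = { 'mark' => 1, 'jane' => 1, 'peter' => 35 }

# select builds and returns a new Hash; `original` is left untouched
filtered = original.select { |_k, v| v > 1 }
p filtered.equal?(original)   # false: a different object
p original.size               # 3: nothing was removed

# select! removes the rejected entries from `original` itself
result = original.select! { |_k, v| v > 1 }
p result.equal?(original)     # true: the receiver is returned when something changed

# Second call: nothing left to remove, so select! returns nil, not the hash
p original.select! { |_k, v| v > 1 }   # nil

# keep_if also filters in place but always returns the receiver
p original.keep_if { |_k, v| v > 1 }.equal?(original)   # true
```

If you want in-place filtering and a return value you can always chain or reassign, `keep_if` is the safer choice.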

3 answers:

Answer 0 (score: 7)

`select!` will perform better (I'll show the source for MRI, but it should be the same for the other implementations).

The reason is that `select` needs to create a whole new Hash object and then, for each entry in the hash, copy the entry into it if the block succeeds.

`select!`, on the other hand, goes through each key and removes the value in place if the block doesn't succeed, so no new object needs to be created.
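The copy-versus-delete strategies described above can be modelled in plain Ruby. This is a simplified sketch, not MRI's actual C implementation. The method names `copying_select` and `in_place_select` are hypothetical. The copying version allocates a new Hash and inserts every accepted entry. The in-place version only deletes rejected entries from the existing Hash (here via `delete_if`), so no second Hash is created.

```ruby
# Hypothetical model of Hash#select: allocate a new Hash and copy accepted entries.
def copying_select(hash)
  result = {}                                  # the brand-new Hash object
  hash.each do |key, value|
    result[key] = value if yield(key, value)   # copy only when the block returns truthy
  end
  result
end

# Hypothetical model of Hash#select!: delete rejected entries from the receiver itself.
def in_place_select(hash)
  size_before = hash.size
  hash.delete_if { |key, value| !yield(key, value) }  # in-place deletion, no new Hash
  # Mirror select!'s documented contract: nil if nothing was removed, else the receiver
  hash.size == size_before ? nil : hash
end

scores = { 'mark' => 1, 'jane' => 1, 'peter' => 35 }
copy = copying_select(scores) { |_k, v| v > 1 }   # scores is still intact here
same = in_place_select(scores) { |_k, v| v > 1 }  # scores is now filtered
```

With millions of keys, the copying model pays for one insertion per kept entry. The in-place model pays for one deletion per rejected entry. Which one is cheaper therefore also depends on how many entries the block keeps.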

Answer 1 (score: 6)

You can always run some benchmarks:

require 'benchmark'

# Creates a big hash in the format: { 0 => 0, 1 => 1, ..., 99_999 => 99_999 }
big_hash = 100_000.times.each_with_object({}) { |i, hash| hash[i] = i }
Benchmark.bm do |bm|
  bm.report('select') { big_hash.select { |k, v| v > 50 } }
  # select! mutates big_hash, so it has to run after the non-destructive select
  bm.report('select!') { big_hash.select! { |k, v| v > 50 } }
end

         user       system     total     real
select   0.080000   0.000000   0.080000  (  0.088048)
select!  0.020000   0.000000   0.020000  (  0.021324)
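Keep one caveat in mind when reading these numbers. With `v > 50`, `select!` only has to delete 51 of the 100,000 entries, while `select` has to copy the other 99,949 into a new Hash. A fairer comparison might time both methods on fresh copies of the same hash and at several selectivities. The sketch below relies on a hypothetical `compare_select` helper. It reports timings only, and the results will vary by Ruby version and machine.

```ruby
require 'benchmark'

# Hypothetical helper: time select vs select! on identical, freshly built hashes.
def compare_select(size, threshold)
  source     = (0...size).each_with_object({}) { |i, h| h[i] = i }
  for_select = source.dup   # copies are made outside the timed blocks
  for_bang   = source.dup
  {
    select:      Benchmark.realtime { for_select.select { |_k, v| v > threshold } },
    select_bang: Benchmark.realtime { for_bang.select! { |_k, v| v > threshold } },
    kept:        for_bang.size   # evaluated after select! has run
  }
end

# Keep almost everything, about half, and almost nothing
[50, 50_000, 99_950].each do |threshold|
  r = compare_select(100_000, threshold)
  printf("threshold=%-6d kept=%-6d select=%.4fs select!=%.4fs\n",
         threshold, r[:kept], r[:select], r[:select_bang])
end
```

For more stable numbers, run each case several times and take the minimum (or use a dedicated gem such as benchmark-ips).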

Answer 2 (score: 3)

Absolutely. `select!` works in place, which can mean fewer GC sweeps and lower memory consumption. As a proof of concept:

Here is ./wrapper.rb:

require 'json'
require 'benchmark'

def measure(&block)
  no_gc = ARGV[0] == '--no-gc'

  no_gc ? GC.disable : GC.start

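  # ps reports RSS in kilobytes; with "/ 1024" commented out, the "MB" value printed below is really KB
  memory_before = `ps -o rss= -p #{Process.pid}`.to_f #/ 1024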
  gc_stat_before = GC.stat
  time = Benchmark.realtime do
    yield
  end
  puts ObjectSpace.count_objects
  if !no_gc
    puts "  Sweeping"
    GC.start(full_mark: true, immediate_sweep: true, immediate_mark: false)
  end
  puts ObjectSpace.count_objects
  gc_stat_after = GC.stat
  memory_after = `ps -o rss= -p #{Process.pid}`.to_f # / 1024
  puts({
    RUBY_VERSION => {
      gc: no_gc ? 'disabled': 'enabled',
      time: time.round(2),
      gc_count: gc_stat_after[:count] - gc_stat_before[:count],
      memory: "%d MB" % (memory_after - memory_before)
    }
  }.to_json)
  puts "---------\n"
end

Here is ./so_question.rb:

require_relative './wrapper'
data = Array.new(100) { ["x","y"].sample * 1024 * 1024 }

measure do
  data.select! { |x| x.start_with?("x") }
end
measure do
  data = data.select { |x| x.start_with?("x") }
end

Run it:

ruby so_question.rb --no-gc

Results:

{:TOTAL=>30160, :FREE=>21134, :T_OBJECT=>160, :T_CLASS=>557,
:T_MODULE=>38, :T_FLOAT=>7, :T_STRING=>5884, :T_REGEXP=>75,
:T_ARRAY=>710, :T_HASH=>35, :T_STRUCT=>2, :T_BIGNUM=>2, :T_FILE=>3,
:T_DATA=>896, :T_COMPLEX=>1, :T_NODE=>618, :T_ICLASS=>38}   
{:TOTAL=>30160, :FREE=>21067, :T_OBJECT=>160, :T_CLASS=>557,
:T_MODULE=>38, :T_FLOAT=>7, :T_STRING=>5947, :T_REGEXP=>75,
:T_ARRAY=>710, :T_HASH=>38, :T_STRUCT=>2, :T_BIGNUM=>2, :T_FILE=>3,
:T_DATA=>897, :T_COMPLEX=>1, :T_NODE=>618, :T_ICLASS=>38}   
{"2.2.2":{"gc":"disabled","time":0.0,"gc_count":0,"memory":"20 MB"}}   

{:TOTAL=>30160, :FREE=>20922, :T_OBJECT=>162, :T_CLASS=>557, :T_MODULE=>38, :T_FLOAT=>7, :T_STRING=>6072, :T_REGEXP=>75,
:T_ARRAY=>717, :T_HASH=>45, :T_STRUCT=>2, :T_BIGNUM=>2, :T_FILE=>3,
:T_DATA=>901, :T_COMPLEX=>1, :T_NODE=>618, :T_ICLASS=>38}   
{:TOTAL=>30160, :FREE=>20885, :T_OBJECT=>162, :T_CLASS=>557,
:T_MODULE=>38, :T_FLOAT=>7, :T_STRING=>6108, :T_REGEXP=>75,
:T_ARRAY=>717, :T_HASH=>46, :T_STRUCT=>2, :T_BIGNUM=>2, :T_FILE=>3,
:T_DATA=>901, :T_COMPLEX=>1, :T_NODE=>618, :T_ICLASS=>38}    
{"2.2.2":{"gc":"disabled","time":0.0,"gc_count":0,"memory":"0 MB"}}   

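Note the memory difference. Also, I built this example with an Array rather than a Hash, but both behave the same way: each class provides a #select that returns a new collection and a #select! that filters the receiver in place.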
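To check the memory side without relying on RSS, you can count object allocations with `GC.stat(:total_allocated_objects)` in current MRI versions. The `allocations` helper below is hypothetical and only a rough sketch. It counts Ruby objects, not the size of their internal buffers, and exact counts may differ between Ruby versions. It also shows that Array and Hash behave alike here: `select` allocates at least one new collection, while `select!` works on the existing one.

```ruby
# Hypothetical helper: number of Ruby objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)  # returns an Integer, no Hash is allocated
  yield
  GC.stat(:total_allocated_objects) - before
end

array = (1..10_000).to_a
hash  = array.each_with_object({}) { |i, h| h[i] = i }

# Run the non-destructive versions first, since select! mutates the receiver
array_select      = allocations { array.select { |x| x.even? } }
array_select_bang = allocations { array.select! { |x| x.even? } }
hash_select       = allocations { hash.select { |_k, v| v.even? } }
hash_select_bang  = allocations { hash.select! { |_k, v| v.even? } }

puts "Array#select: #{array_select} objects, Array#select!: #{array_select_bang}"
puts "Hash#select:  #{hash_select} objects, Hash#select!:  #{hash_select_bang}"
```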