hash = { 'mark' => 1, 'jane' => 1, 'peter' => 35 }.select {|k,v| v > 1}
#=> { 'peter' => 35 }
What if I have millions of keys?
Is there a difference between hash = hash.select and hash.select!?
Answer 0 (score: 7)
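select! will perform better (judging by the MRI source; other implementations should behave the same).
The reason is that select has to create a whole new Hash object and, for each entry in the hash, copy that entry into it if the block succeeds.
select!, on the other hand, walks over each key and removes the entry in place if the block doesn't succeed, without creating a new object.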
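To make the two strategies concrete, here is a minimal pure-Ruby sketch. The helper names filter_copy and filter_in_place are hypothetical illustrations of the copy-versus-delete idea described above; they are not MRI's actual C implementation.

```ruby
# Hypothetical helper mimicking the copy strategy of Hash#select:
# allocate a new Hash and copy every entry for which the block is truthy.
def filter_copy(hash)
  result = {}
  hash.each { |k, v| result[k] = v if yield(k, v) }
  result
end

# Hypothetical helper mimicking the in-place strategy of Hash#select!:
# delete every entry for which the block is falsy; no new Hash is created.
def filter_in_place(hash)
  hash.delete_if { |k, v| !yield(k, v) }
end

h = { 'mark' => 1, 'jane' => 1, 'peter' => 35 }

copy = filter_copy(h) { |_k, v| v > 1 }
p copy == { 'peter' => 35 }   # true
p h.size                      # 3, the original is untouched

same = filter_in_place(h) { |_k, v| v > 1 }
p same.equal?(h)              # true, the receiver itself was modified
p h == { 'peter' => 35 }      # true
```

The copy version pays for a second table sized to the kept entries, while the in-place version only pays for the deletions, which is why the gap grows when most entries survive the filter.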
Answer 1 (score: 6)
You can always run a benchmark:
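require 'benchmark'

# Creates a big hash in the format: { 0 => 0, 1 => 1, ..., 99_999 => 99_999 }
big_hash = 100_000.times.each_with_object({}) { |i, h| h[i] = i }

# select! mutates its receiver, so give it its own copy (made outside the
# timed block) so that both reports start from identical data.
in_place_copy = big_hash.dup

Benchmark.bm do |bm|
  bm.report('select')  { big_hash.select { |k, v| v > 50 } }
  bm.report('select!') { in_place_copy.select! { |k, v| v > 50 } }
end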
user system total real
select 0.080000 0.000000 0.080000 ( 0.088048)
select! 0.020000 0.000000 0.020000 ( 0.021324)
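One caveat when swapping hash = hash.select for hash.select! as in the benchmark: the two also differ in what they return. Per the Ruby core docs, Hash#select! returns the receiver if it removed anything and nil otherwise, so assigning its result can replace the hash with nil. Hash#keep_if filters in place but always returns the receiver. A small sketch:

```ruby
h = { 'mark' => 1, 'jane' => 1, 'peter' => 35 }

# select always returns a new Hash and leaves the receiver alone.
filtered = h.select { |_k, v| v > 1 }

# select! mutates the receiver. It returns the receiver when something was
# removed, but nil when nothing was removed, so `h = h.select! { ... }`
# could silently turn h into nil.
result_changed   = h.select! { |_k, v| v > 1 }  # removes 'mark' and 'jane'
result_unchanged = h.select! { |_k, v| v > 1 }  # nothing left to remove

# keep_if is the in-place variant that always returns the receiver.
kept = h.keep_if { |_k, v| v > 1 }

p filtered == { 'peter' => 35 }  # true
p result_changed.equal?(h)       # true
p result_unchanged.nil?          # true
p kept.equal?(h)                 # true
```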
Answer 2 (score: 3)
Absolutely. select! works in place, which can mean fewer GC sweeps and lower memory consumption. As a proof of concept:
Here is ./wrapper.rb:
require 'json'
require 'benchmark'

def measure(&block)
  no_gc = ARGV[0] == '--no-gc'
  no_gc ? GC.disable : GC.start
  # `ps -o rss=` reports resident set size in kilobytes (on Linux), so with
  # the `/ 1024` commented out the "MB" figure printed below is really KB.
  memory_before = `ps -o rss= -p #{Process.pid}`.to_f #/ 1024
  gc_stat_before = GC.stat
  time = Benchmark.realtime do
    yield
  end
  puts ObjectSpace.count_objects
  if !no_gc
    puts " Sweeping"
    GC.start(full_mark: true, immediate_sweep: true, immediate_mark: false)
  end
  puts ObjectSpace.count_objects
  gc_stat_after = GC.stat
  memory_after = `ps -o rss= -p #{Process.pid}`.to_f # / 1024
  puts({
    RUBY_VERSION => {
      gc: no_gc ? 'disabled' : 'enabled',
      time: time.round(2),
      gc_count: gc_stat_after[:count] - gc_stat_before[:count],
      memory: "%d MB" % (memory_after - memory_before)
    }
  }.to_json)
  puts "---------\n"
end
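The wrapper infers memory use from the process RSS reported by ps. As a rough in-process cross-check, ObjectSpace.memsize_of from the objspace extension (MRI-specific and documented as approximate) can report the size of the extra Hash that select allocates. A minimal sketch, assuming Ruby 2.6+ for to_h with a block; the variable names are illustrative only:

```ruby
require 'objspace'

big_hash = (0...100_000).to_h { |i| [i, i] }

# select allocates a second Hash holding the kept entries;
# memsize_of gives an approximate byte size for that new Hash.
copy = big_hash.select { |_k, v| v > 50 }
puts "extra Hash from select: ~#{ObjectSpace.memsize_of(copy)} bytes"

# select! filters the receiver itself, so no second Hash is created.
before_id = big_hash.object_id
big_hash.select! { |_k, v| v > 50 }
puts "select! kept the same object: #{big_hash.object_id == before_id}"
```

The exact byte count varies by Ruby version and platform, so treat it as an order-of-magnitude figure rather than a precise measurement.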
Here is ./so_question.rb:
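require_relative './wrapper'

# 100 strings of 1 MB each, each one randomly all "x" or all "y"
data = Array.new(100) { ["x", "y"].sample * 1024 * 1024 }

measure do
  data.select! { |x| x.start_with?("x") }
end

# Note: data was already filtered in place by select! above, so this
# select runs over the smaller, already-filtered array.
measure do
  data = data.select { |x| x.start_with?("x") }
end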
Run it:
ruby so_question.rb --no-gc
Results:
{:TOTAL=>30160, :FREE=>21134, :T_OBJECT=>160, :T_CLASS=>557,
:T_MODULE=>38, :T_FLOAT=>7, :T_STRING=>5884, :T_REGEXP=>75,
:T_ARRAY=>710, :T_HASH=>35, :T_STRUCT=>2, :T_BIGNUM=>2, :T_FILE=>3,
:T_DATA=>896, :T_COMPLEX=>1, :T_NODE=>618, :T_ICLASS=>38}
{:TOTAL=>30160, :FREE=>21067, :T_OBJECT=>160, :T_CLASS=>557,
:T_MODULE=>38, :T_FLOAT=>7, :T_STRING=>5947, :T_REGEXP=>75,
:T_ARRAY=>710, :T_HASH=>38, :T_STRUCT=>2, :T_BIGNUM=>2, :T_FILE=>3,
:T_DATA=>897, :T_COMPLEX=>1, :T_NODE=>618, :T_ICLASS=>38}
{"2.2.2":{"gc":"disabled","time":0.0,"gc_count":0,"memory":"20 MB"}}
{:TOTAL=>30160, :FREE=>20922, :T_OBJECT=>162, :T_CLASS=>557,
:T_MODULE=>38, :T_FLOAT=>7, :T_STRING=>6072, :T_REGEXP=>75,
:T_ARRAY=>717, :T_HASH=>45, :T_STRUCT=>2, :T_BIGNUM=>2, :T_FILE=>3,
:T_DATA=>901, :T_COMPLEX=>1, :T_NODE=>618, :T_ICLASS=>38}
{:TOTAL=>30160, :FREE=>20885, :T_OBJECT=>162, :T_CLASS=>557,
:T_MODULE=>38, :T_FLOAT=>7, :T_STRING=>6108, :T_REGEXP=>75,
:T_ARRAY=>717, :T_HASH=>46, :T_STRUCT=>2, :T_BIGNUM=>2, :T_FILE=>3,
:T_DATA=>901, :T_COMPLEX=>1, :T_NODE=>618, :T_ICLASS=>38}
{"2.2.2":{"gc":"disabled","time":0.0,"gc_count":0,"memory":"0 MB"}}
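Note the memory difference. Also, I built this example with an Array rather than a Hash, but the two behave the same way here, since both Array and Hash define select (which returns a new collection) and select! (which filters the receiver in place).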
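The shared behavior comes from Array and Hash each defining their own select and select!, not from Enumerable: the generic Enumerable#select always returns an Array and Enumerable provides no select!. A short sketch; the Bag class is a hypothetical example of a collection that only includes Enumerable:

```ruby
nums = { a: 1, b: 2, c: 3 }
list = [1, 2, 3]

# Hash#select returns a Hash; Array#select returns an Array.
p nums.select { |_k, v| v.odd? }.class  # Hash
p list.select(&:odd?).class             # Array

# Both classes implement select! as an in-place filter of the receiver.
nums.select! { |_k, v| v.odd? }
list.select!(&:odd?)
p nums == { a: 1, c: 3 }  # true
p list == [1, 3]          # true

# Hypothetical class that only includes Enumerable.
class Bag
  include Enumerable

  def initialize(*items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
    self
  end
end

bag = Bag.new(1, 2, 3)
p bag.select(&:odd?).class   # Array, from the generic Enumerable#select
p bag.respond_to?(:select!)  # false, Enumerable has no select!
```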