What is the fastest way to sort a Hash?

Date: 2015-06-29 19:03:02

Tags: ruby

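People often ask what the best way to sort a hash is, but they don't ask the necessary follow-up question of what the fastest way is, which is what really determines the best way.

What is the fastest way to sort a Hash in Ruby, regardless of the Ruby version?

I'm looking for additional answers that cover the corner cases, or that reveal problems with the more general and/or fastest ways of doing it.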

3 answers:

Answer 0 (score: 6)

What is the fastest way to sort a Hash?

require 'fruity'

HASH = Hash[('a'..'z').to_a.shuffle.map{ |k| [k, 1] }]

def sort_hash1(h)
  h.sort.to_h
end

def sort_hash2(h)
  Hash[h.sort]
end

def sort_hash3(h)
  Hash[h.sort_by{ |k, v| k }]
end

def sort_keys(h)
  keys = h.keys.sort
  Hash[keys.zip(h.values_at(*keys))]
end

puts "Running on Ruby v#{ RUBY_VERSION }"
puts

compare do
  do_sort_hash1 { sort_hash1(HASH) } if [].respond_to?(:to_h)
  do_sort_hash2 { sort_hash2(HASH) }
  do_sort_hash3 { sort_hash3(HASH) }
  do_sort_keys { sort_keys(HASH) }
end

Running the above code on a Mac OS laptop results in the following output:

# >> Running on Ruby v2.2.2
# >> 
# >> Running each test 256 times. Test will take about 1 second.
# >> do_sort_keys is faster than do_sort_hash3 by 39.99999999999999% ± 10.0%
# >> do_sort_hash3 is faster than do_sort_hash1 by 1.9x ± 0.1
# >> do_sort_hash1 is similar to do_sort_hash2

# >> Running on Ruby v1.9.3
# >> 
# >> Running each test 256 times. Test will take about 1 second.
# >> do_sort_keys is faster than do_sort_hash3 by 19.999999999999996% ± 10.0%
# >> do_sort_hash3 is faster than do_sort_hash2 by 4x ± 0.1

Doubling the size of the hash:

HASH = Hash[[*('a'..'z'), *('A'..'Z')].shuffle.map{ |k| [k, 1] }]

Results:

# >> Running on Ruby v2.2.2
# >> 
# >> Running each test 128 times. Test will take about 1 second.
# >> do_sort_keys is faster than do_sort_hash3 by 50.0% ± 10.0%
# >> do_sort_hash3 is faster than do_sort_hash1 by 2.2x ± 0.1
# >> do_sort_hash1 is similar to do_sort_hash2

# >> Running on Ruby v1.9.3
# >> 
# >> Running each test 128 times. Test will take about 1 second.
# >> do_sort_keys is faster than do_sort_hash3 by 30.000000000000004% ± 10.0%
# >> do_sort_hash3 is faster than do_sort_hash2 by 4x ± 0.1

The values will vary with the hardware, but the relative results should not change.
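
Before trusting the timings, it can help to confirm that the four approaches really return the same ordered hash. The following is a minimal sketch of such a check, not part of the original benchmark; the local variable names are made up for illustration, and it assumes a Ruby with `Array#to_h` (2.1 or later). Entries are compared with `to_a` because `Hash#==` ignores insertion order.

```ruby
# Same input shape as the benchmark: 26 shuffled single-letter keys.
hash = Hash[('a'..'z').to_a.shuffle.map { |k| [k, 1] }]

via_sort    = hash.sort.to_h                            # sort_hash1
via_hash    = Hash[hash.sort]                           # sort_hash2
via_sort_by = Hash[hash.sort_by { |k, _v| k }]          # sort_hash3
keys        = hash.keys.sort
via_keys    = Hash[keys.zip(hash.values_at(*keys))]     # sort_keys

results = [via_sort, via_hash, via_sort_by, via_keys]

# Hash#== ignores order, so compare the ordered [key, value] pairs instead.
same_order = results.all? { |r| r.to_a == via_sort.to_a }
puts "All four approaches agree: #{same_order}"
```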

Fruity was chosen over the built-in Benchmark class for simplicity.
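
For readers who would rather not install a gem, a rough equivalent using the standard library's `Benchmark.realtime` might look like the sketch below. The iteration count and the labels are arbitrary choices made for this illustration, and unlike Fruity it does not estimate the margin of error, so small differences should be read with caution.

```ruby
require 'benchmark'

hash = Hash[('a'..'z').to_a.shuffle.map { |k| [k, 1] }]
iterations = 10_000 # arbitrary; raise it for steadier numbers

# Each lambda mirrors one of the methods in the Fruity benchmark.
candidates = {
  'sort.to_h'     => -> { hash.sort.to_h },
  'Hash[sort]'    => -> { Hash[hash.sort] },
  'Hash[sort_by]' => -> { Hash[hash.sort_by { |k, _v| k }] },
  'sort_keys'     => lambda do
    keys = hash.keys.sort
    Hash[keys.zip(hash.values_at(*keys))]
  end
}

# Time each candidate over the same number of iterations.
timings = candidates.map do |label, code|
  [label, Benchmark.realtime { iterations.times { code.call } }]
end

# Print fastest first.
timings.sort_by { |_label, seconds| seconds }.each do |label, seconds|
  puts format('%-14s %.4fs', label, seconds)
end
```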

This was prompted by "Sort hash by key, return hash in Ruby".

Answer 1 (score: 0)

Here are some interesting things to consider:

require 'fruity'

puts "Running Ruby v#{ RUBY_VERSION }"
# >> Running Ruby v2.2.2

This looks at the differences when using integers as keys:

HASH = Hash[[*(1..100)].shuffle.map{ |k| [k, 1] }]
compare do
  _sort1 { HASH.sort.to_h }
  _sort2 { HASH.sort{ |a, b| a[0] <=> b[0] }.to_h }
  _sort3 { HASH.sort{ |a, b| a.first <=> b.first }.to_h }
  _sort_by { HASH.sort_by{ |k,v| k }.to_h }
end
# >> Running each test 64 times. Test will take about 1 second.
# >> _sort_by is faster than _sort2 by 70.0% ± 1.0%
# >> _sort2 is faster than _sort3 by 19.999999999999996% ± 1.0%
# >> _sort3 is faster than _sort1 by 19.999999999999996% ± 1.0%

This looks at the differences when using single-character strings as keys:

HASH = Hash[[*('a'..'Z')].shuffle.map{ |k| [k, 1] }]
compare do
  _sort1 { HASH.sort.to_h }
  _sort2 { HASH.sort{ |a, b| a[0] <=> b[0] }.to_h }
  _sort3 { HASH.sort{ |a, b| a.first <=> b.first }.to_h }
  _sort_by { HASH.sort_by{ |k,v| k }.to_h }
end
# >> Running each test 16384 times. Test will take about 1 second.
# >> _sort1 is similar to _sort3
# >> _sort3 is similar to _sort2
# >> _sort2 is faster than _sort_by by 1.9x ± 0.1
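
One caveat about the string-key run: in Ruby, `('a'..'Z')` appears to be an empty range, because `'a'` sorts after `'Z'` in ASCII. If so, the hash in that benchmark has no entries and the timings mostly reflect call overhead, which would also be consistent with the unusually high count of 16384 runs. The sketch below checks the range and builds a mixed-case key set explicitly, the same way the first answer does; the variable names are illustrative only.

```ruby
# The range as written in the benchmark.
empty_range_keys = ('a'..'Z').to_a

# An explicit union of both alphabets, as used in the first answer.
mixed_case_keys = [*('a'..'z'), *('A'..'Z')]

puts "('a'..'Z') yields #{empty_range_keys.size} keys"
puts "explicit union yields #{mixed_case_keys.size} keys"

# A hash built from the explicit key set actually has something to sort.
hash = Hash[mixed_case_keys.shuffle.map { |k| [k, 1] }]
sorted = hash.sort.to_h
puts "sorted hash has #{sorted.size} entries"
```

Rerunning the comparison with a non-empty key set would be needed before drawing conclusions about `sort` versus `sort_by` for string keys.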

Answer 2 (score: 0)

Here is how sort_by compares when accessing a more complex object:

require 'fruity'

RUBY_VERSION # => "2.2.2"

class Foo
  attr_reader :key
  def initialize(k)
    @key = k
  end
  def <=>(b)
    self.key <=> b.key
  end
end

HASH = Hash[[*(1..100)].shuffle.map{ |k| [Foo.new(k), 1] }]
compare do
  _sort1 { HASH.sort.to_h }
  _sort_by { HASH.sort_by{ |k,v| k.key }.to_h }
end
# >> Running each test 32 times. Test will take about 1 second.
# >> _sort_by is faster than _sort1 by 2.7x ± 0.1
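
A plausible explanation for this gap is that `sort_by` extracts each key once and then compares the extracted values, whereas `sort` calls the object's own `<=>` for every comparison. The sketch below counts both; `CountingKey` is a hypothetical stand-in for `Foo` written for this illustration, and the exact number of comparisons made by `sort` depends on the shuffle and on the Ruby implementation.

```ruby
# Hypothetical stand-in for Foo that counts calls to its own <=>.
class CountingKey
  attr_reader :key

  @comparisons = 0
  class << self
    attr_accessor :comparisons
  end

  def initialize(key)
    @key = key
  end

  def <=>(other)
    self.class.comparisons += 1
    key <=> other.key
  end
end

hash = Hash[(1..100).to_a.shuffle.map { |k| [CountingKey.new(k), 1] }]

# Plain sort: every comparison goes through CountingKey#<=>.
CountingKey.comparisons = 0
sorted = hash.sort.to_h
sort_comparisons = CountingKey.comparisons

# sort_by: the block runs once per entry, then plain integers are compared.
CountingKey.comparisons = 0
extractions = 0
sorted_by = hash.sort_by { |k, _v| extractions += 1; k.key }.to_h
sort_by_comparisons = CountingKey.comparisons

puts "sort:    #{sort_comparisons} calls to CountingKey#<=>"
puts "sort_by: #{extractions} key extractions, #{sort_by_comparisons} calls to CountingKey#<=>"
```

The same mechanism cuts the other way for cheap keys: building the intermediate key list costs more than it saves when the comparison itself is already fast.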
