Ruby: performance of looking up a hash in an array of hashes

Time: 2015-05-13 03:07:46

Tags: ruby-on-rails arrays ruby performance hash

I am currently facing the following problem. For example, I have this array of hashes:

data = [
  {:id => 1,:start_date => "2015-01-02",:end_date => "2015-01-05"},
  {:id => 2,:start_date => "2015-01-06",:end_date => "2015-01-07"},
  {:id => 3,:start_date => "2015-01-10",:end_date => "2015-01-20"}
]

I want to find the one hash whose start_date to end_date range contains "2015-01-04".

According to the documentation I found, there are 3 ways to do this:

1) Using select

finding_hash = data.select {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}

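finding_hash will be an array of the matching hashes. In my case I can guarantee that only one hash will ever match the condition, so I still have to call finding_hash.first to get the hash I want.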

2) Using find

finding_hash = data.find{|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}

This way, finding_hash is directly the hash I need.
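
To make the difference in return values concrete, here is a small sketch using the sample data from the question. The names target, matches, match, and missing are illustrative only and do not come from the original post.

```ruby
data = [
  {:id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05"},
  {:id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07"},
  {:id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20"}
]
target = "2015-01-04"

# select always returns an Array, even when only one element matches.
matches = data.select { |h| h[:start_date] <= target && h[:end_date] >= target }

# find returns the first matching element itself.
match = data.find { |h| h[:start_date] <= target && h[:end_date] >= target }

# find returns nil when nothing matches ("2015-01-08" falls in a gap).
missing = data.find { |h| h[:start_date] <= "2015-01-08" && h[:end_date] >= "2015-01-08" }
```

So with select an extra .first is needed, while with find a nil check is enough.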

3) A traditional loop

data.each do |t|
  if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
    return t
    break
  end
end

So which one is the fastest? I really need the performance, because my data is very large!

Thank you, and sorry for my bad English!

4 answers:

Answer 0 (score: 2)

You can measure this yourself with Benchmark.

For example:

require 'benchmark'

n = 1000000

data = [
  {:id => 1,:start_date => "2015-01-02",:end_date => "2015-01-05"},
  {:id => 2,:start_date => "2015-01-06",:end_date => "2015-01-07"},
  {:id => 3,:start_date => "2015-01-10",:end_date => "2015-01-20"}
]


Benchmark.bm do |x|
  x.report do
    n.times do
      data.select {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
    end
  end

  x.report do
    n.times do
      data.find {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
    end
  end

  x.report do
    n.times do
      finding_hash = {}
      data.each do |t|
        if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
          finding_hash = t
          break
        end
      end
    end
  end
end

Output:

       user     system      total        real
   1.490000   0.020000   1.510000 (  1.533589)
   1.070000   0.010000   1.080000 (  1.096578)
   1.000000   0.010000   1.010000 (  1.011021)

The results depend on the value of n and on the size of the data.
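
Since the three-element sample says little about a large data set, one option is to benchmark against generated data. The following is only a sketch: build_ranges is a hypothetical helper, and the sizes (10,000 ranges, 100 iterations) are arbitrary assumptions chosen to keep the run short.

```ruby
require 'benchmark'
require 'date'

# Hypothetical helper: builds `count` consecutive, non-overlapping 3-day ranges.
def build_ranges(count, first_day = Date.new(2015, 1, 1))
  (0...count).map do |i|
    start = first_day + i * 3
    # Date#to_s yields "YYYY-MM-DD", the same string format as in the question.
    {:id => i + 1, :start_date => start.to_s, :end_date => (start + 2).to_s}
  end
end

big_data = build_ranges(10_000)

# A date inside the last range is the worst case for a linear scan.
late_date = big_data.last[:start_date]
in_range = lambda { |h| h[:start_date] <= late_date && h[:end_date] >= late_date }

Benchmark.bm(8) do |x|
  x.report('select') { 100.times { big_data.select(&in_range) } }
  x.report('find')   { 100.times { big_data.find(&in_range) } }
end
```

Moving late_date towards the start of the array should favour find, because it stops at the first match while select keeps scanning.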

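Answer 1 (score: 2)

The approaches you tried rely on general Enumerable-style iteration, but a method implemented natively on Array can be faster. Try find_index. Even with the separate lookup needed to fetch the hash afterwards, it is still about 20% faster than the next fastest approach: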

index = data.find_index {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
x = data[index]
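
One caveat: find_index returns nil when no element matches, and data[nil] raises a TypeError. A minimal sketch of a guarded version follows; find_by_date is a hypothetical helper name, not part of the original answer.

```ruby
# Hypothetical helper wrapping the find_index approach with a nil guard.
def find_by_date(data, date)
  index = data.find_index { |h| h[:start_date] <= date && h[:end_date] >= date }
  # Short-circuit: returns nil instead of calling data[nil] when nothing matched.
  index && data[index]
end

data = [
  {:id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05"},
  {:id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07"},
  {:id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20"}
]

hit  = find_by_date(data, "2015-01-04")  # the hash with :id => 1
miss = find_by_date(data, "2015-01-08")  # nil, no range contains this date
```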

My benchmark:

require 'benchmark'

n = 1_000_000

data = [
  {:id => 1,:start_date => "2015-01-02",:end_date => "2015-01-05"},
  {:id => 2,:start_date => "2015-01-06",:end_date => "2015-01-07"},
  {:id => 3,:start_date => "2015-01-10",:end_date => "2015-01-20"}
]

Benchmark.bm do |x|
  x.report 'Enumerable#select' do
    n.times do
      data.select do |h|
        h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"
      end
    end
  end

  x.report 'Enumerable#detect' do
    n.times do
      data.detect do |h|
        h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"
      end
    end
  end

  x.report 'Enumerable#each  ' do
    n.times do
      finding_hash = {}
      data.each do |t|
        if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
          finding_hash = t
          break t
        end
      end
    end
  end

  x.report 'Array#find_index ' do
    n.times do
      index = data.find_index {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
      # Named `found` so the Benchmark block parameter x is not overwritten.
      found = data[index]
    end
  end
end

The results:

Enumerable#select  1.000000   0.010000   1.010000 (  1.002282)
Enumerable#detect  0.790000   0.000000   0.790000 (  0.797319)
Enumerable#each    0.620000   0.000000   0.620000 (  0.627272)
Array#find_index   0.520000   0.000000   0.520000 (  0.515691)

Answer 2 (score: 1)

v3 is the fastest:

def v1
  @data.select {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
end

def v2
  @data.find{|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
end

def v3
  @data.each do |t|
    if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
      return t
      break
    end
  end
end

select is always the slowest because it has to traverse the whole array. I'm not sure why find is slower than v3. It may be down to overhead.
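
The traversal difference can be made visible by counting how often each block is called. This is a small illustrative sketch on the sample data; the counter names select_calls and find_calls are assumptions made for the example.

```ruby
data = [
  {:id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05"},
  {:id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07"},
  {:id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20"}
]
target = "2015-01-04"

select_calls = 0
find_calls = 0

# select evaluates the block for every element.
data.select do |h|
  select_calls += 1
  h[:start_date] <= target && h[:end_date] >= target
end

# find stops as soon as the block returns a truthy value.
data.find do |h|
  find_calls += 1
  h[:start_date] <= target && h[:end_date] >= target
end

puts "select: #{select_calls} block calls, find: #{find_calls} block calls"
# prints "select: 3 block calls, find: 1 block calls"
```

Here the first element matches, so find and the manual loop do the same amount of comparison work, and any remaining gap between them would come from call overhead.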

However, find and v3 may well perform the same on your data. The results below are not necessarily valid for your data.

t = Time.now; 10000.times{ v1 }; Time.now - t
=> 0.014131

t = Time.now; 10000.times{ v2 }; Time.now - t
=> 0.013138

t = Time.now; 10000.times{ v3 }; Time.now - t
=> 0.008799

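Running this on the sample data is not the same as running it on your real data.

If the real data is too large, you can run it on a subset of the data to get a better answer.

By the way, you can rewrite v3 as: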

data.each do |t|
  break t if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
end

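FWIW, operating on an array like this will be clumsy and slow. You may want to store the data in a database and run a query instead. For a large data set, that could be at least 2 orders of magnitude faster.

Answer 3 (score: 1)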

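All of these variants have O(n) complexity. If the ranges do not overlap, you can use bsearch on the array, which has O(log n) complexity. You should sort the ranges first.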

sorted = data.sort_by { |x| x[:start_date] }
sorted.bsearch { |x| ..check if range of `x` includes value.. }
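
One possible way to fill in the block, as a sketch only: it assumes the ranges never overlap, that dates are "YYYY-MM-DD" strings, and it uses bsearch in find-minimum mode. find_range is a hypothetical helper name.

```ruby
data = [
  {:id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20"},
  {:id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05"},
  {:id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07"}
]

# Hypothetical helper; `sorted` must be ordered by :start_date and must not
# contain overlapping ranges, otherwise the binary search is not valid.
def find_range(sorted, date)
  # Find-minimum mode: the block is false for all leading elements and true
  # for the rest, so bsearch returns the first range ending on or after date.
  candidate = sorted.bsearch { |x| x[:end_date] >= date }
  # That candidate only contains the date if it has already started.
  candidate if candidate && candidate[:start_date] <= date
end

# Sort once, then reuse the sorted array for every lookup.
sorted = data.sort_by { |x| x[:start_date] }

find_range(sorted, "2015-01-04")  # the hash with :id => 1
find_range(sorted, "2015-01-08")  # nil, the date falls in a gap
```

The sort itself costs O(n log n), so this pays off when the same array is searched many times.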