I'm currently facing this problem. For example, I have this array of hashes:
data = [
{:id => 1,:start_date => "2015-01-02",:end_date => "2015-01-05"},
{:id => 2,:start_date => "2015-01-06",:end_date => "2015-01-07"},
{:id => 3,:start_date => "2015-01-10",:end_date => "2015-01-20"}
]
I want to find the one hash whose start_date to end_date range contains "2015-01-04".
According to the documentation I found, there are 3 ways to do this:
1) Using select
finding_hash = data.select {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
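finding_hash will be an array of the matching hashes.
But I can guarantee that only one hash will ever match the condition, so after this select I still have to call finding_hash.first to get the hash I want.
2) Using find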
finding_hash = data.find{|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
This way, finding_hash is directly the hash I need.
3) A traditional loop
data.each do |t|
if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
return t
break
end
end
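So which one is the fastest? I really need the performance, because my data is very large!
Thank you, and sorry for my bad English!
Answer 0: (score: 2)
You can test it with benchmark, for example: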
require 'benchmark'
n = 1000000
data = [
{:id => 1,:start_date => "2015-01-02",:end_date => "2015-01-05"},
{:id => 2,:start_date => "2015-01-06",:end_date => "2015-01-07"},
{:id => 3,:start_date => "2015-01-10",:end_date => "2015-01-20"}
]
Benchmark.bm do |x|
x.report { n.times do
data.select {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
end
}
x.report { n.times do
data.find{|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
end
}
x.report {
n.times do
finding_hash = {}
data.each do |t|
if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
finding_hash = t
break
end
end
end
}
end
Output:
user system total real
1.490000 0.020000 1.510000 ( 1.533589)
1.070000 0.010000 1.080000 ( 1.096578)
1.000000 0.010000 1.010000 ( 1.011021)
The results depend on the value of n and on the size of the data.
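To see how the ranking shifts with the data size, a sketch along these lines could be used. It builds larger synthetic arrays of non-overlapping ranges and times select against find. The build_data helper, the sizes, and the choice of a target in the last element are all assumptions made for illustration. No timings are shown, since they vary by machine and Ruby version.

```ruby
require 'benchmark'
require 'date'

# Hypothetical helper: builds `size` consecutive, non-overlapping 3-day ranges.
def build_data(size)
  first_day = Date.new(2015, 1, 1)
  Array.new(size) do |i|
    start_date = first_day + i * 3
    {:id => i + 1, :start_date => start_date.to_s, :end_date => (start_date + 2).to_s}
  end
end

[10, 1_000, 100_000].each do |size|
  data = build_data(size)
  target = data.last[:start_date] # worst case for find: the match is the last element
  runs = 1_000_000 / size         # keep the total work roughly constant across sizes

  select_time = Benchmark.realtime do
    runs.times { data.select { |h| h[:start_date] <= target && h[:end_date] >= target } }
  end
  find_time = Benchmark.realtime do
    runs.times { data.find { |h| h[:start_date] <= target && h[:end_date] >= target } }
  end
  puts format("size=%-7d select=%.4fs find=%.4fs", size, select_time, find_time)
end
```

Moving the target to the first element instead should favor find much more strongly, because it can stop after a single comparison.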
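Answer 1: (score: 2)
All the methods you tried are Enumerable methods, but native Array methods are faster. Try find_index. Even though the hash then has to be fetched with a separate call, it is still about 20% faster than the next-fastest option: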
index = data.find_index {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
x = data[index]
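One caveat with this pattern: find_index returns nil when no element matches, and data[nil] raises a TypeError. A minimal sketch of a guarded lookup, where the find_covering method name is made up for this example:

```ruby
# Hypothetical helper: returns the hash whose range contains `date`, or nil.
def find_covering(data, date)
  index = data.find_index { |h| h[:start_date] <= date && h[:end_date] >= date }
  index && data[index] # skip the array lookup when nothing matched
end

data = [
  {:id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05"},
  {:id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07"},
  {:id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20"}
]

p find_covering(data, "2015-01-04") # the hash with :id => 1
p find_covering(data, "2015-01-08") # nil, because no range contains this date
```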
My benchmark:
require 'benchmark'
n = 1_000_000
data = [
{:id => 1,:start_date => "2015-01-02",:end_date => "2015-01-05"},
{:id => 2,:start_date => "2015-01-06",:end_date => "2015-01-07"},
{:id => 3,:start_date => "2015-01-10",:end_date => "2015-01-20"}
]
Benchmark.bm do |x|
x.report 'Enumerable#select' do
n.times do
data.select do |h|
h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"
end
end
end
x.report 'Enumerable#detect' do
n.times do
data.detect do |h|
h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"
end
end
end
x.report 'Enumerable#each ' do
n.times do
finding_hash = {}
data.each do |t|
if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
finding_hash = t
break t
end
end
end
end
x.report 'Array#find_index ' do
n.times do
index = data.find_index {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
found = data[index] # not named x, which would reassign the Benchmark block parameter
end
end
end
The results:
Enumerable#select 1.000000 0.010000 1.010000 ( 1.002282)
Enumerable#detect 0.790000 0.000000 0.790000 ( 0.797319)
Enumerable#each 0.620000 0.000000 0.620000 ( 0.627272)
Array#find_index 0.520000 0.000000 0.520000 ( 0.515691)
Answer 2: (score: 1)
v3 is the fastest:
def v1
@data.select {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
end
def v2
@data.find{|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
end
def v3
@data.each do |t|
if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
return t
break
end
end
end
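select is always the slowest, because it has to traverse the entire array. I'm not sure why find is slower than v3. It may have to do with overhead.
However, find and v3 may perform the same on your data. The results below are not necessarily valid for your data.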
t = Time.now; 10000.times{ v1 }; Time.now - t
=> 0.014131
t = Time.now; 10000.times{ v2 }; Time.now - t
=> 0.013138
t = Time.now; 10000.times{ v3 }; Time.now - t
=> 0.008799
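Running this on sample data is not the same as running it on real data.
If the real data is too large, you can run it on a subset of the data to get a more representative answer.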
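The point about select traversing the whole array can be checked by counting how often the block runs. In this sketch the calls counter and the match lambda are names invented for illustration, and the data is the sample from the question, where the first element is the match.

```ruby
data = [
  {:id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05"},
  {:id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07"},
  {:id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20"}
]
target = "2015-01-04"
calls = 0

# Hypothetical predicate that also counts how many elements were inspected.
match = lambda do |h|
  calls += 1
  h[:start_date] <= target && h[:end_date] >= target
end

data.select(&match)
select_calls = calls # 3: select inspects every element

calls = 0
data.find(&match)
find_calls = calls # 1: find stops at the first match

puts "select: #{select_calls} calls, find: #{find_calls} calls"
```

With the match at the front, find does a third of the work here. When the match is the last element, both inspect every element, which is one reason timings on sample data may not carry over to real data.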
By the way, you can rewrite v3 as:
data.each do |t|
break t if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
end
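FWIW, operating on an array will be very clumsy and slow. You may want to store the data in a database and run a query instead. For large datasets, that could be at least 2 orders of magnitude faster.
Answer 3: (score: 1)
All of these variants have O(n) complexity.
If the ranges do not overlap, you can use bsearch on the array, which has O(log n) complexity. You need to sort the ranges first.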
sorted = data.sort_by { |x| x[:start_date] }
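# With non-overlapping ranges sorted by start, only the first range ending on or after the date can contain it.
hit = sorted.bsearch { |x| x[:end_date] >= "2015-01-04" }
hit = nil unless hit && hit[:start_date] <= "2015-01-04"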
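A sketch of how the bsearch lookup could be written out in find-any mode, assuming the ranges never overlap and the dates are ISO 8601 strings (so string order equals date order). In this mode the block returns 0 for a match, a negative number when the wanted value lies before the current element, and a positive number when it lies after. The bsearch_covering name is made up for this example.

```ruby
# Hypothetical helper; `sorted` must be ordered by :start_date with no overlapping ranges.
def bsearch_covering(sorted, date)
  sorted.bsearch do |x|
    if date < x[:start_date]
      -1 # date lies before this range: continue in the left half
    elsif date > x[:end_date]
      1  # date lies after this range: continue in the right half
    else
      0  # this range contains the date
    end
  end
end

data = [
  {:id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05"},
  {:id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07"},
  {:id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20"}
]
sorted = data.sort_by { |x| x[:start_date] } # sort once, then reuse for every lookup

p bsearch_covering(sorted, "2015-01-04") # the hash with :id => 1
p bsearch_covering(sorted, "2015-01-08") # nil, since the date falls in a gap
```

Sorting costs O(n log n), so this only pays off when the array is sorted once and then queried many times.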