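I'm working with some large datasets and trying to improve performance. I need to determine whether an object is contained in an array. I was considering using either index or include?, so I benchmarked both.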
require 'benchmark'

a = (1..1_000_000).to_a
num = 100_000
reps = 100

Benchmark.bmbm do |bm|
  bm.report('include?') do
    reps.times { a.include? num }
  end
  bm.report('index') do
    reps.times { a.index num }
  end
end
Surprisingly (to me), index was much faster.
user system total real
include? 0.330000 0.000000 0.330000 ( 0.334328)
index 0.040000 0.000000 0.040000 ( 0.039812)
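Since index provides more information than include?, I would have expected it to be slightly slower, but that is not what happened. Why is it faster?
(I know index comes directly from the Array class, while include? is inherited from Enumerable. Might that explain it?)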
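To make "more information" concrete, here is a minimal illustration of what each method returns. The small array and the variable names are made up for the example and are not part of the benchmark above.

```ruby
# Illustrative data only; not the array used in the benchmark.
values = [10, 20, 30]

found    = values.include?(20) # true: only reports whether the element is present
position = values.index(20)    # 1: reports where the first match is
missing  = values.index(99)    # nil: no match, which is falsy in a condition

puts found.inspect
puts position.inspect
puts missing.inspect
```

Because index returns nil when nothing matches and an Integer otherwise, it can stand in for include? in a condition, which is why the two are comparable here.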
Answer 0 (score: 4)
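Looking at the Ruby MRI source, it seems that index uses the optimized rb_equal_opt, while include? uses rb_equal. This can be seen in rb_ary_includes and rb_ary_index. Here is the commit that made the change. It is not clear to me why this was done in index and not in include?.
You might also find the discussion of this feature interesting.
Answer 1 (score: 1)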
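If performance is your goal, you should use Array#bsearch, which locates the element with a binary search. Note that binary search requires the array to be sorted, as it is in the question's benchmark.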
https://ruby-doc.org/core-2.7.0/Array.html#method-i-bsearch
a.bsearch { |x| num <=> x }
It beats both index and include?:
Rehearsal --------------------------------------------
include? 0.108172 0.000805 0.108977 ( 0.112928)
index 0.122730 0.000502 0.123232 ( 0.126323)
bsearch 0.000254 0.000027 0.000281 ( 0.000354)
----------------------------------- total: 0.232490sec
user system total real
include? 0.106727 0.000036 0.106763 ( 0.108495)
index 0.107732 0.000330 0.108062 ( 0.110272)
bsearch 0.000201 0.000008 0.000209 ( 0.000206)
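The script that produced these numbers is not shown, so the following is only a sketch of how such a comparison could be set up, reusing the array, num, and reps from the question. The helper sorted_include? is a hypothetical name introduced for this example, not part of Ruby's core API, and it assumes the array is already sorted.

```ruby
require 'benchmark'

# Sorted input: bsearch only gives correct results on a sorted array.
a = (1..1_000_000).to_a
num = 100_000
reps = 100

# Hypothetical helper: membership test via binary search.
# Assumes `sorted` is in ascending order and holds no nil elements.
def sorted_include?(sorted, target)
  # Find-any mode: the block returns 0 on a match, a positive number when
  # the target lies to the right of x, and a negative number when it lies
  # to the left.
  !sorted.bsearch { |x| target <=> x }.nil?
end

Benchmark.bmbm do |bm|
  bm.report('include?') { reps.times { a.include?(num) } }
  bm.report('index')    { reps.times { a.index(num) } }
  bm.report('bsearch')  { reps.times { sorted_include?(a, num) } }
end
```

On unsorted data the array would have to be sorted first, and the cost of that sort should be counted if the lookup happens only a few times.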
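Answer 2 (score: 0)
I ran the same kind of benchmark. It looks like include? is faster than index, although not very consistently. Here are my results for two different cases.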
user system total real
index 0.065803 0.000652 0.066455 ( 0.067181)
include? 0.065551 0.000590 0.066141 ( 0.066894)
user system total real
index 0.000034 0.000005 0.000039 ( 0.000037)
include? 0.000017 0.000001 0.000018 ( 0.000017)
Code:
require 'benchmark'

# parse ranks and return number of reports to using index
def solution_using_index(ranks)
  return 0 if ranks.nil? || ranks.empty? || ranks.length <= 1
  return ((ranks[0] - ranks[1] == 1) || (ranks[1] - ranks[0] == 1) ? 1 : 0) if ranks.length == 2
  return 0 if ranks.max > 1000000000 || ranks.min < 0
  grouped_ranks = ranks.group_by(&:itself)
  report_to, rank_keys = 0, grouped_ranks.keys
  rank_keys.each { |rank| report_to += grouped_ranks[rank].length if rank_keys.index(rank + 1) }
  report_to
end

# parse ranks and return number of reports to using include
def solution_using_include(ranks)
  return 0 if ranks.nil? || ranks.empty? || ranks.length <= 1
  return ((ranks[0] - ranks[1] == 1) || (ranks[1] - ranks[0] == 1) ? 1 : 0) if ranks.length == 2
  return 0 if ranks.max > 1000000000 || ranks.min < 0
  grouped_ranks = ranks.group_by(&:itself)
  report_to, rank_keys = 0, grouped_ranks.keys
  rank_keys.each { |rank| report_to += grouped_ranks[rank].length if rank_keys.include?(rank + 1) }
  report_to
end

test_data = [[3, 4, 3, 0, 2, 2, 3, 0, 0], [4, 4, 3, 3, 1, 0], [4, 2, 0]]

Benchmark.bmbm do |bm|
  bm.report('index') do
    test_data.each do |ranks|
      reports_to = solution_using_index(ranks)
    end
  end
  bm.report('include?') do
    test_data.each do |ranks|
      reports_to = solution_using_include(ranks)
    end
  end
end