Question

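I was wondering: what is the fastest way in Ruby to test whether an array contains all the elements of another array? So I built this small benchmark script. I'd love to hear your thoughts on the methods compared. Do you know of any other, perhaps better, ways?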

require 'benchmark'
require 'set'

a = ('a'..'z').to_a.shuffle
b = ["b","d","f"]

Benchmark.bm do |x|
  # Set#subset? (b.to_set rather than Set[b], which would hold the array b as a single element)
  x.report do
    10000.times do
      b.to_set.subset?(a.to_set)
    end
  end
  # Array intersection
  x.report do
    10000.times do
      (a & b).count == b.size
    end
  end
  # Counting matches with inject
  x.report do
    10000.times do
      (a.inject(0) { |s, i| s + (b.include?(i) ? 1 : 0) } == b.size)
    end
  end
  # Array difference
  x.report do
    10000.times do
      (b - a).empty?
    end
  end
  # all? with include?
  x.report do
    10000.times do
      b.all? { |o| a.include? o }
    end
  end
end

Results:

     user     system      total        real
 0.380000   0.010000   0.390000 (  0.404371)
 0.050000   0.010000   0.060000 (  0.075062)
 0.140000   0.000000   0.140000 (  0.140420)
 0.130000   0.000000   0.130000 (  0.136385)
 0.030000   0.000000   0.030000 (  0.034405)
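Timings are only comparable if every candidate returns the same answer, so a quick sanity check can be worth running first. The following is a minimal sketch, not part of the benchmark itself: the helper name `inclusion_checks` and the negative test case are made up for illustration. It uses `b.to_set`, since `Set[b]` would build a set whose only element is the array `b`.

```ruby
require 'set'

# Hypothetical helper: the five candidate checks as lambdas taking (a, b),
# each answering "does a contain every element of b?"
def inclusion_checks
  {
    subset:     ->(a, b) { b.to_set.subset?(a.to_set) },
    intersect:  ->(a, b) { (a & b).size == b.size },
    inject:     ->(a, b) { a.inject(0) { |s, i| s + (b.include?(i) ? 1 : 0) } == b.size },
    difference: ->(a, b) { (b - a).empty? },
    all:        ->(a, b) { b.all? { |o| a.include?(o) } }
  }
end

a = ('a'..'z').to_a.shuffle
present = %w[b d f]   # every element is in a
missing = %w[b d 1]   # "1" is not in a

inclusion_checks.each do |name, check|
  raise "#{name} wrongly rejected #{present.inspect}" unless check.call(a, present)
  raise "#{name} wrongly accepted #{missing.inspect}" if check.call(a, missing)
end
puts "all candidates agree"
```

The check assumes neither array contains duplicates, as in the benchmark data.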

Answer 1

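First, be very careful with microbenchmarks. I recommend using my gem fruity; see its documentation for the reasons why.

Second, do you want to compare the creation of the arrays plus the comparison, or just the comparison?

Third, your data is too small for you to see what is going on. For example, your b variable contains 3 elements. If you compare an O(n^2) algorithm against an O(n) one with such a small n (3), the difference won't be apparent.

You might want to start with the following: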

require 'fruity'
require 'set'

a = ('a'..'z').to_a.shuffle
b = %w[b d f]
a_set = a.to_set
b_set = b.to_set

compare do
  subset        { b_set.subset?(a_set) }
  intersect     { (a & b).size == b.size }
  subtract      { (b - a).empty? }
  array_include { b.all?{|o| a.include? o} }
  set_include   { b.all?{|o| a_set.include? o} }
end

This gives:

Running each test 2048 times. Test will take about 2 seconds.
set_include is faster than subset by 1.9x ± 0.1
subset is faster than intersect by 60% ± 10.0%
intersect is faster than array_include by 40% ± 1.0%
array_include is faster than subtract by 1.9x ± 0.1
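To see how much of the cost is set construction rather than the comparison itself, the two can be timed side by side. This is a rough sketch using the standard library's Benchmark instead of fruity; the labels and the iteration count `n` are arbitrary choices, and the caveats about microbenchmarks still apply.

```ruby
require 'benchmark'
require 'set'

a = ('a'..'z').to_a.shuffle
b = %w[b d f]
a_set = a.to_set
b_set = b.to_set
n = 10_000  # arbitrary iteration count

Benchmark.bm(22) do |x|
  # Builds both sets on every iteration, then compares
  x.report('build sets + subset?')  { n.times { b.to_set.subset?(a.to_set) } }
  # Compares sets that were built once, outside the timed block
  x.report('prebuilt sets subset?') { n.times { b_set.subset?(a_set) } }
end
```

Which variant is the fair one depends on whether your real code already has the data as sets.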

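Note that Array#& and Array#- basically convert their argument to a Set internally. all? with include? on plain arrays should be the worst solution, since it is O(n^2); this becomes obvious if you increase the size of b.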
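The effect of growing b can be sketched like this. The helper `sample_data` and the sizes are made up for illustration, and the timings come from a single `Benchmark.realtime` call each, so treat them as a rough indication only.

```ruby
require 'benchmark'
require 'set'

# Hypothetical helper: a shuffled array of n integers, plus a random half of it
def sample_data(n)
  a = (1..n).to_a.shuffle
  b = a.sample(n / 2)
  [a, b]
end

[100, 1_000, 5_000].each do |n|
  a, b = sample_data(n)
  a_set = a.to_set
  # Array#include? scans a linearly for every element of b
  array_time = Benchmark.realtime { b.all? { |o| a.include?(o) } }
  # Set#include? is a hash lookup
  set_time = Benchmark.realtime { b.all? { |o| a_set.include?(o) } }
  puts format('n=%-6d array_include=%.5fs set_include=%.5fs', n, array_time, set_time)
end
```

The array version's time should grow much faster than the set version's as n increases.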

The general answer is: unless you are sure you need to optimize, use whatever is clearest.

Answer 2

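It depends on the size of your data. For the small dataset you have, b.all? { |o| a.include? o } is faster almost every time.

However, if you try larger arrays, e.g. arrays of 1000 elements, (a & b).size == b.size is much faster.

I also tried the reverse version, (a | b).size == a.size, which performs more or less the same.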
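One caveat with the size-based checks: Array#& and Array#| both drop duplicates from their result, so the checks are only reliable when the arrays hold unique elements. A small sketch (the method names are made up for illustration):

```ruby
# Hypothetical helpers wrapping the three set-style checks
def included_by_intersection?(a, b)
  (a & b).size == b.size   # breaks if b contains duplicates
end

def included_by_union?(a, b)
  (a | b).size == a.size   # breaks if a contains duplicates
end

def included_by_difference?(a, b)
  (b - a).empty?           # unaffected by duplicates
end

a = %w[a b c d]
b_dup = %w[b b d]          # every element is in a, but "b" appears twice

puts included_by_intersection?(a, b_dup)  # false
puts included_by_union?(a, b_dup)         # true
puts included_by_difference?(a, b_dup)    # true
```

If duplicates are possible, the difference-based check (or calling uniq first) is the safer choice.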

Here are the (annotated) results, with a having 10000 elements and b having 5000 elements:

    user     system      total        real
0.010000   0.000000   0.010000 (  0.004445) # subset
0.000000   0.000000   0.000000 (  0.003073) # & (intersection)
1.620000   0.000000   1.620000 (  1.625472) # inject
0.000000   0.000000   0.000000 (  0.004485) # difference
0.530000   0.000000   0.530000 (  0.529042) # include
0.010000   0.000000   0.010000 (  0.004416) # | (union)

Fastest way to test array inclusion in Ruby