Array#delete_at or Array#slice!? And how to look up the implementation

Time: 2014-06-19 12:33:21

Tags: ruby arrays

I am cleaning up large data files (1MM+ comma-separated rows). A sample row might look like this:

@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry,R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"

Certain columns must be removed from it, after which the row should look like this:

@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,R,SKUInfo,05-MAR-14 05:50:24,SourceID,TransactionalID"

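Question 1: If I convert a row of data to an Array, which method is preferred for removing elements: Array#delete_at or Array#slice!? I would like to know which is the more idiomatic choice. Performance is a consideration, and I am on a Windows machine.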

def remove_bad_columns
  ary = @row.split(",")
  ary.delete_at(10)
  ary.delete_at(5)
  @row = ary.join(",")
end

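Question 2: I would like to know whether one of these methods is implemented using the other. How can I see how these methods are built in Ruby? (For example, how each is implemented using for.)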

2 answers:

Answer 0: (score: 1)

I suggest you use Array#values_at rather than delete_at or slice!:

def remove_vals(str, *indices)
  ary = str.split(",")
  v = (0...ary.size).to_a - indices
  ary.values_at(*v).join(",")
end

@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry," +
      "R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"

@row = remove_vals(@row, 5, 10)
  #=> "123456789,11122,CustomerName,2014-01-31,2014-02-01,R,SKUInfo," +
  #   "05-MAR-14 05:50:24,SourceID,TransactionalID"

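Array#values_at has one advantage over the other two methods: you do not have to worry about the order in which the elements are removed.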
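To illustrate the ordering concern, here is a minimal sketch using a hypothetical 12-element array of letters (not the data from the question). Deleting the lower index first shifts every later element one position to the left, so a following delete_at(10) removes the element that was originally at index 11.

```ruby
# Hypothetical data: 'f' sits at index 5 and 'k' at index 10.
original = %w[a b c d e f g h i j k l]

# Ascending order: after removing index 5, 'k' moves to index 9,
# so delete_at(10) removes 'l' instead of 'k'.
ascending = original.dup
ascending.delete_at(5)
ascending.delete_at(10)

# Descending order (as in the question): removing index 10 first
# leaves the elements below it where they were.
descending = original.dup
descending.delete_at(10)
descending.delete_at(5)

# values_at with the complement of the unwanted indices does not
# depend on the order in which the indices are listed.
keep = (0...original.size).to_a - [5, 10]
selected = original.values_at(*keep)

p ascending              # 'k' is still present, 'l' is gone
p descending == selected # both approaches agree
```

With delete_at or slice!, the indices therefore need to be processed from highest to lowest, which is what the question's code already does.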

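The efficiency of this approach does not differ significantly from that of the other two. If @spickermann wants to add it to his benchmark, he can use this: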

def values_at
  ary = array.split(",")
  v = (0...ary.size).to_a - [5,10]
  @row = ary.values_at(*v).join(",")
end

Answer 1: (score: 0)

There is not much difference in performance. I prefer delete_at because it reads better.

require 'benchmark'

def array
  "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry,R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
end 

def delete_at
  ary = array.dup.split(",")
  ary.delete_at(10)
  ary.delete_at(5)
  @row = ary.join(",")
end

def slice!
  ary = array.dup.split(",")
  ary.slice!(10)
  ary.slice!(5)
  @row = ary.join(",")
end

n = 1_000_000
Benchmark.bmbm(15) do |x|
  x.report("delete_at :")   { n.times do; delete_at; end }
  x.report("slice!    :")   { n.times do; slice!   ; end }
end

# Rehearsal ---------------------------------------------------
# delete_at :       4.560000   0.000000   4.560000 (  4.566496)
# slice!    :       4.580000   0.010000   4.590000 (  4.576767)
# ------------------------------------------ total: 9.150000sec
# 
#                       user     system      total        real
# delete_at :       4.500000   0.000000   4.500000 (  4.505638)
# slice!    :       4.600000   0.000000   4.600000 (  4.613447)
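As a complement to the timings, the following sketch checks that, for a single integer index, the two calls behave the same way on the question's sample row: each returns the removed element and leaves the same array behind. The variable names are illustrative only.

```ruby
row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry," \
      "R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"

with_delete_at = row.split(",")
with_slice     = row.split(",")

# Both methods return the element they removed; highest index goes first.
removed_a = [with_delete_at.delete_at(10), with_delete_at.delete_at(5)]
removed_b = [with_slice.slice!(10), with_slice.slice!(5)]

p removed_a == removed_b        # same elements were removed
p with_delete_at == with_slice  # same array remains
```

Given equivalent results and near-identical timings, the choice between the two for a single index comes down to readability.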