I am cleaning up a large data file (1MM+ comma-separated rows). A sample row might look like this:
@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry,R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
Certain columns must be removed from it, after which the row should look like this:
@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,R,SKUInfo,05-MAR-14 05:50:24,SourceID,TransactionalID"
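Question 1: If I convert a row of data to an Array, which method is preferred for removing elements: Array#delete_at or Array#slice!? I would like to know which is the more idiomatic choice. Performance is a consideration, and I am on a Windows machine.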
def remove_bad_columns
  ary = @row.split(",")
  ary.delete_at(10)
  ary.delete_at(5)
  @row = ary.join(",")
end
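Question 2: I would like to know whether one of these methods is implemented in terms of the other. How can I see how these methods are built in Ruby? (For example, how each is implemented using for.)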
Answer 0 (score: 1)
I suggest you use Array#values_at rather than delete_at or slice!:
def remove_vals(str, *indices)
  ary = str.split(",")
  v = (0...ary.size).to_a - indices
  ary.values_at(*v).join(",")
end
@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry," +
"R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
@row = remove_vals(@row, 5, 10)
#=> "123456789,11122,CustomerName,2014-01-31,2014-02-01,R,SKUInfo," +
# "05-MAR-14 05:50:24,SourceID,TransactionalID"
Array#values_at has one advantage over the other two methods: you do not have to worry about the order in which the elements are removed.
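To make the ordering concern concrete, here is a minimal sketch. It assumes a made-up 12-field row of single letters (not the original data) so the shifted indices are easy to follow; the variable names are illustrative only.

```ruby
# Illustrative 12-field row: index 5 is "f", index 10 is "k".
row = "a,b,c,d,e,f,g,h,i,j,k,l"

# Ascending order: deleting index 5 first shifts every later element left,
# so index 10 now refers to what was originally index 11.
wrong = row.split(",")
wrong.delete_at(5)
wrong.delete_at(10)

# Descending order removes the intended columns.
right = row.split(",")
[5, 10].sort.reverse_each { |i| right.delete_at(i) }

# values_at with the complement of the indices does not depend on order.
ary = row.split(",")
kept = ary.values_at(*((0...ary.size).to_a - [10, 5]))

puts wrong.join(",")
puts right.join(",")
puts kept.join(",")
```

With delete_at or slice!, the indices therefore have to be processed from highest to lowest, whereas the values_at version accepts them in any order.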
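The efficiency of this approach is not significantly different from that of the other two. If @spickermann wants to add it to his benchmark, he can use this: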
# Relies on the `array` helper defined in @spickermann's benchmark.
def values_at
  ary = array.split(",")
  v = (0...ary.size).to_a - [5, 10]
  @row = ary.values_at(*v).join(",")
end
Answer 1 (score: 0)
There is not much difference in performance. I prefer delete_at because it reads better.
require 'benchmark'
# Returns the sample row used by every benchmarked method.
def array
  "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry,R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
end
def delete_at
  ary = array.dup.split(",")
  ary.delete_at(10)
  ary.delete_at(5)
  @row = ary.join(",")
end
def slice!
  ary = array.dup.split(",")
  ary.slice!(10)
  ary.slice!(5)
  @row = ary.join(",")
end
n = 1_000_000
Benchmark.bmbm(15) do |x|
  x.report("delete_at :") { n.times { delete_at } }
  x.report("slice! :") { n.times { slice! } }
end
# Rehearsal ---------------------------------------------------
# delete_at : 4.560000 0.000000 4.560000 ( 4.566496)
# slice! : 4.580000 0.010000 4.590000 ( 4.576767)
# ------------------------------------------ total: 9.150000sec
#
# user system total real
# delete_at : 4.500000 0.000000 4.500000 ( 4.505638)
# slice! : 4.600000 0.000000 4.600000 ( 4.613447)
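Regarding Question 2, one way to check where a method comes from is to ask the method object itself, via Method#owner and Method#source_location. The sketch below assumes CRuby; describe_method and the Demo class are hypothetical names used only for illustration. Methods implemented in C typically report a nil source_location, in which case the implementation lives in the interpreter's C source (array.c for Array) rather than in a Ruby file.

```ruby
# describe_method is a hypothetical helper for illustration only.
# It reports which class or module defines a method and, if the method
# is written in Ruby, the file and line where it is defined.
def describe_method(obj, name)
  m = obj.method(name)
  { owner: m.owner, location: m.source_location }
end

# A method defined in Ruby, for comparison with the built-in ones.
class Demo
  def hello; end
end

puts describe_method([], :delete_at).inspect
puts describe_method([], :slice!).inspect
puts describe_method(Demo.new, :hello).inspect
```

If the location is nil for both Array methods, reading the corresponding functions in the Ruby source tree is the way to see whether one is built on the other.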