I can easily remove duplicates from an array with .uniq, but how can I do it without using the .uniq method?
Answer 0 (score: 5)
a = [1, 1, 1, 2, 4, 3, 4, 3, 2, 5, 5, 6]
class Array
def my_uniq
self | []
end
end
a.my_uniq
#=> [1, 2, 4, 3, 5, 6]
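This uses the method Array#|, documented as "Set Union", which "returns a new array by joining ary with other_ary, excluding any duplicates and preserving the order from the original array."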
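A quick illustration of those union semantics, using arbitrary sample arrays chosen only for this sketch. Elements appear in the order they are first met, scanning the receiver first and then the argument, and each value appears once. Passing an empty array as the argument therefore just strips the receiver's own duplicates, which is the trick my_uniq relies on.

```ruby
left  = [3, 1, 3]
right = [2, 1, 4]

# Union: duplicates dropped, first-occurrence order kept
p(left | right)   # prints [3, 1, 2, 4]

# With an empty argument, only the receiver's own duplicates are removed
p(left | [])      # prints [3, 1]
```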
Here is a benchmark of the various answers, along with Array#uniq.
require 'fruity'
require 'set'
def doit(n, m)
arr = n.times.to_a
arr = m.times.map { arr.sample }
compare do
uniq { arr.uniq }
Schwern { uniq = []; arr.sort.each { |e| uniq.push(e) if e != uniq[-1] }; uniq }
Sharma {b = []; arr.each{ |aa| b << aa unless b.include?(aa) }; b }
Mihael { arr.to_set.to_a }
sawa { arr.group_by(&:itself).keys }
Cary { arr | [] }
end
end
doit(1_000, 500)
# Schwern is faster than uniq by 19.999999999999996% ± 10.0% (results differ)
# uniq is similar to Cary
# Cary is faster than Mihael by 10.000000000000009% ± 10.0%
# Mihael is similar to sawa
# sawa is faster than Sharma by 5x ± 0.1
doit(100_000, 50_000)
# Schwern is faster than uniq by 50.0% ± 10.0% (results differ)
# uniq is similar to Cary
# Cary is similar to Mihael
# Mihael is faster than sawa by 10.000000000000009% ± 10.0%
# sawa is faster than Sharma by 310x ± 10.0
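"Schwern" and "uniq" return arrays that contain the same elements but not in the same order (hence "results differ").
Here is an additional benchmark that @Schwern requested.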
def doit1(n)
arr = n.times.map { rand(n/10) }
compare do
uniq { arr.uniq }
Schwern { uniq = []; arr.sort.each { |e| uniq.push(e) if e != uniq[-1] }; uniq }
Sharma {b = []; arr.each{ |aa| b << aa unless b.include?(aa) }; b }
Mihael { arr.to_set.to_a }
sawa { arr.group_by(&:itself).keys }
Cary { arr | [] }
end
end
doit1(1_000)
# Cary is similar to uniq
# uniq is faster than sawa by 3x ± 1.0
# sawa is similar to Schwern (results differ)
# Schwern is similar to Mihael (results differ)
# Mihael is faster than Sharma by 2x ± 0.1
doit1(50_000)
# Cary is similar to uniq
# uniq is faster than Schwern by 2x ± 1.0 (results differ)
# Schwern is similar to Mihael (results differ)
# Mihael is similar to sawa
# sawa is faster than Sharma by 62x ± 10.0
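Answer 1 (score: 4)
The code for most Ruby methods can be found in the ruby-doc.org API documentation. If you hover over a method's documentation, a "click to toggle source" button appears. The code is in C, but it is easy to follow.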
if (RARRAY_LEN(ary) <= 1)
return rb_ary_dup(ary);
if (rb_block_given_p()) {
hash = ary_make_hash_by(ary);
uniq = rb_hash_values(hash);
}
else {
hash = ary_make_hash(ary);
uniq = rb_hash_values(hash);
}
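If there is at most one element, return a copy of the array. Otherwise, turn the elements into hash keys and turn the hash's values back into an array. By a documented quirk of Ruby hashes, "Hashes enumerate their values in the order that the corresponding keys were inserted", so this technique preserves the original order of the elements in the Array. In other languages it might not.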
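The same idea can be written in plain Ruby. This is only a sketch that mirrors the shape of the C excerpt above, not Ruby's actual implementation, and `hash_uniq` is a hypothetical helper name. Each element becomes a hash key, only its first occurrence is stored, and the values come back out in insertion order.

```ruby
# Hypothetical helper that mimics the logic of the C excerpt without calling #uniq.
def hash_uniq(array)
  # Zero or one element: nothing can be duplicated, so return a copy.
  return array.dup if array.length <= 1

  seen = {}
  array.each do |e|
    # Store only the first occurrence of each value.
    seen[e] = e unless seen.key?(e)
  end

  # Hashes enumerate in insertion order, so first-seen order is preserved.
  seen.values
end

p hash_uniq([1, 1, 1, 2, 4, 3, 4, 3, 2, 5, 5, 6])   # prints [1, 2, 4, 3, 5, 6]
```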
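Alternatively, use a Set. Sets never contain duplicates. Loading set adds the method to_set to all Enumerable objects, which includes Arrays. However, a Set is usually implemented as a Hash, so you can just as well do the same thing yourself. If you want a unique array and you don't need the elements in order, you should create a Set instead and use that.
unique = array.to_set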
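A short sketch of the Set route, using the sample array from the question. `to_set` comes from loading `set`. If you do want an Array back in the original order, `Set#add?` returns `nil` when the element is already present, which makes it a convenient "seen before?" test inside `select`; the variable names here are illustrative only.

```ruby
require 'set'

a = [1, 1, 1, 2, 4, 3, 4, 3, 2, 5, 5, 6]

# Keep the Set itself if you only need membership and uniqueness.
unique = a.to_set
p unique.include?(4)   # prints true
p unique.size          # prints 6

# If you need an Array, Set#add? returns nil for elements already seen,
# so select keeps only each element's first occurrence.
seen = Set.new
firsts = a.select { |e| seen.add?(e) }
p firsts               # prints [1, 2, 4, 3, 5, 6]
```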
Or, sort the Array and loop through it, pushing each element onto a new Array. If the last element of the new Array matches the current element, skip it.
array = [2, 3, 4, 5, 1, 2, 4, 5]
uniq = []
# This copies the whole array, duplicates included, wasting
# memory. And sort is O(n log n).
array.sort.each do |e|
  # After sorting, duplicates are adjacent: skip e if it equals the last element kept.
  uniq.push(e) if e != uniq[-1]
end
puts uniq.inspect
# prints [1, 2, 3, 4, 5]
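This approach should be avoided because it is slower and uses more memory than the others. Sorting is what makes it slow: sorting is O(n log n), which means that as the array gets bigger, the time spent sorting grows faster than the array does. It also requires you to copy the whole array, duplicates included, unless you are willing to modify the original data by sorting it in place with sort!.
The other methods are O(n) in speed and O(n) in memory, which means they scale linearly as the array gets bigger. And they don't have to make a copy that includes the duplicates, so they can use less memory.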
Answer 2 (score: 3)
You can use #to_set. Read more about it here.
Answer 3 (score: 2)
Answer 4 (score: 1)
You can also try this; see the example below.
a = [1, 1, 1, 2, 4, 3, 4, 3, 2, 5, 5, 6]
b = []
# b.include? scans b on every iteration, so this gets slow on large arrays.
a.each { |aa| b << aa unless b.include?(aa) }
# Checking b gives the following result:
b #=> [1, 2, 4, 3, 5, 6]
Or you can also try:
a = [1, 1, 1, 2, 4, 3, 4, 3, 2, 5, 5, 6]
b = a & a
# OR
b = a | a
# both return the following result
#=> [1, 2, 4, 3, 5, 6]