How to count duplicate elements in a Ruby array

Time: 2009-02-20 14:17:21

Tags: ruby arrays

I have a sorted array:

[
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

I would like to get something like this, though it does not have to be a hash:

[
  {:error => 'FATAL <error title="Request timed out.">', :count => 2},
  {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1}
]

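13 answers:

Answer 0 (score: 122)

The following code prints what you asked for. I'll let you decide how to actually use it to generate the hash you are looking for: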

# sample array
a=["aa","bb","cc","bb","bb","cc"]

# make the hash default to 0 so that += will work correctly
b = Hash.new(0)

# iterate over the array, counting duplicate entries
a.each do |v|
  b[v] += 1
end

b.each do |k, v|
  puts "#{k} appears #{v} times"
end

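Note: I just noticed you said the array is already sorted. The code above does not require sorting. Taking advantage of that property might produce faster code.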
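
As a rough sketch of how the sorted property could be used (this is not part of the original answer): in a sorted array equal elements are adjacent, so Enumerable#chunk can count each run in a single pass and build the structure from the question directly. The variable names `errors` and `counts` are illustrative, and whether this is actually faster would need to be measured.

```ruby
# Assumes the input is already sorted, so equal elements sit next to each other.
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# chunk groups consecutive elements for which the block returns the same value;
# each group is yielded as [value, array_of_consecutive_elements].
counts = errors.chunk { |e| e }.map do |error, run|
  { :error => error, :count => run.size }
end

counts.each { |entry| puts "#{entry[:count]}: #{entry[:error]}" }
```

On an unsorted array this would report the same element more than once, one entry per run, so the hash-based approach above remains the safer default.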

Answer 1 (score: 67)

You can do this very succinctly (in one line) using inject:

a = ['FATAL <error title="Request timed out.">',
      'FATAL <error title="Request timed out.">',
      'FATAL <error title="There is insufficient ...">']

b = a.inject(Hash.new(0)) {|h,i| h[i] += 1; h }

b.to_a.each {|error,count| puts "#{count}: #{error}" }

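will produce (hash ordering was not guaranteed before Ruby 1.9, so the lines may appear in a different order):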

1: FATAL <error title="There is insufficient ...">
2: FATAL <error title="Request timed out.">

Answer 2 (score: 29)

If you have an array like this:

words = ["aa","bb","cc","bb","bb","cc"]

and you need to count the duplicate elements, a one-line solution is:

result = words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }
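
For comparison with the inject version in the previous answer, here is a small sketch of the difference between the two (the names `with_inject` and `with_object` are illustrative): inject passes the accumulator first and uses the block's return value as the next accumulator, while each_with_object passes the element first and ignores the block's return value.

```ruby
words = ["aa", "bb", "cc", "bb", "bb", "cc"]

# inject: the block receives (memo, element) and must return the memo,
# which is why the hash is repeated at the end of the block.
with_inject = words.inject(Hash.new(0)) { |counts, word| counts[word] += 1; counts }

# each_with_object: the block receives (element, memo); its return value
# is ignored and the same memo object is handed back at the end.
with_object = words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }

puts with_inject == with_object
```

Both calls build the same hash of counts; each_with_object simply removes the need for the trailing `; counts`.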

Answer 3 (score: 15)

A different approach from the answers above, using Enumerable#group_by.

[1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k,v| [k, v.count] }.to_h
# {1=>1, 2=>2, 3=>3, 4=>1}

Breaking it down into its separate method calls:

a = [1, 2, 2, 3, 3, 3, 4]
a = a.group_by(&:itself) # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
a = a.map { |k,v| [k, v.count] } # [[1, 1], [2, 2], [3, 3], [4, 1]]
a = a.to_h # {1=>1, 2=>2, 3=>3, 4=>1}

Enumerable#group_by was added in Ruby 1.8.7.

Answer 4 (score: 12)

How about the following:

things = [1, 2, 2, 3, 3, 3, 4]
things.uniq.map{|t| [t,things.count(t)]}.to_h

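It feels cleaner and more descriptive of what we are actually trying to do.

I suspect it will also perform better on large collections than the approaches that iterate over every value in the collection.

Benchmark: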

a = (1...1000000).map { rand(100)}
                       user     system      total        real
inject                 7.670000   0.010000   7.680000 (  7.985289)
array count            0.040000   0.000000   0.040000 (  0.036650)
each_with_object       0.210000   0.000000   0.210000 (  0.214731)
group_by               0.220000   0.000000   0.220000 (  0.218581)

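So it is much faster here. Note that this input has only 100 distinct values: things.count(t) rescans the whole array once per distinct value, so the advantage shrinks as the number of distinct values grows.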
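
The script behind those numbers is not shown, so the following is only a sketch of how such a comparison could be set up with the standard Benchmark module. The input size is reduced to keep the run short, the report labels mirror the table above, and the `results` hash is a hypothetical helper used to check that all four approaches agree. Timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Assumed input: 100,000 random values drawn from 100 distinct integers.
a = Array.new(100_000) { rand(100) }

results = {}

Benchmark.bm(18) do |x|
  x.report('inject') do
    results[:inject] = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }
  end
  x.report('array count') do
    results[:count] = a.uniq.map { |t| [t, a.count(t)] }.to_h
  end
  x.report('each_with_object') do
    results[:each_with_object] = a.each_with_object(Hash.new(0)) { |i, h| h[i] += 1 }
  end
  x.report('group_by') do
    results[:group_by] = a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h
  end
end

# Sanity check: every approach should produce the same counts.
puts results.values.all? { |h| h == results[:inject] }
```

Changing `rand(100)` to a much larger range is a quick way to see how the `array count` variant behaves when there are many distinct values.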

Answer 5 (score: 8)

Personally, I would do it this way:

# myprogram.rb
a = ['FATAL <error title="Request timed out.">',
'FATAL <error title="Request timed out.">',
'FATAL <error title="There is insufficient system memory to run this query.">']
puts a

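Then run the program and pipe its output to uniq -c (uniq only merges adjacent identical lines, which is fine here because the array is sorted):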

ruby myprogram.rb | uniq -c

Output:

 2 FATAL <error title="Request timed out.">
 1 FATAL <error title="There is insufficient system memory to run this query.">

Answer 6 (score: 5)

In Ruby >= 2.2 you can use itself (Hash#transform_values additionally requires Ruby >= 2.4):

array.group_by(&:itself).transform_values(&:count)

In more detail:

array = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
];

array.group_by(&:itself).transform_values(&:count)
 => { "FATAL <error title=\"Request timed out.\">"=>2,
      "FATAL <error title=\"There is insufficient system memory to run this query.\">"=>1 }

Answer 7 (score: 3)

a = [1,1,1,2,2,3]
a.uniq.inject([]){|r, i| r << { :error => i, :count => a.select{ |b| b == i }.size } }
=> [{:count=>3, :error=>1}, {:count=>2, :error=>2}, {:count=>1, :error=>3}]

Answer 8 (score: 2)

Ruby >= 2.7 has Enumerable#tally.

For example:

["a", "b", "c", "b"].tally 
# => {"a"=>1, "b"=>2, "c"=>1}
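
As a short sketch of how the tally result could be put to use on the errors from the question (requires Ruby >= 2.7; the names `by_frequency`, `top_error` and `top_count` are illustrative), the counts can be ordered by frequency or reduced to the most frequent entry:

```ruby
# Requires Ruby >= 2.7 for Enumerable#tally. The input does not need to be sorted.
errors = [
  'FATAL <error title="There is insufficient system memory to run this query.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">'
]

counts = errors.tally

# Order the [error, count] pairs by count, highest first.
by_frequency = counts.sort_by { |_error, count| -count }

# Pick the single most frequent error together with its count.
top_error, top_count = counts.max_by { |_error, count| count }

by_frequency.each { |error, count| puts "#{count}: #{error}" }
```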

Answer 9 (score: 1)

If you want to use this often, I suggest doing the following:

# lib/core_extensions/array/duplicates_counter
module CoreExtensions
  module Array
    module DuplicatesCounter
      def count_duplicates
        self.each_with_object(Hash.new(0)) { |element, counter| counter[element] += 1 }.sort_by{|k,v| -v}.to_h
      end
    end
  end
end

Load it:

Array.include CoreExtensions::Array::DuplicatesCounter

Then simply use it:

the_ar = %w(a a a a a a a  chao chao chao hola hola mundo hola chao cachacho hola)
the_ar.count_duplicates
{
           "a" => 7,
        "chao" => 4,
        "hola" => 4,
       "mundo" => 1,
    "cachacho" => 1
}

Answer 10 (score: 0)

A simple implementation:

(errors_hash = {}).default = 0
array_of_errors.each { |error| errors_hash[error] += 1 }

Answer 11 (score: 0)

Here is a sample array:

a=["aa","bb","cc","bb","bb","cc"]

  1. Select all the unique keys.
  2. For each key, accumulate its occurrences into a hash to get something like this: {'bb' => ['bb', 'bb']}

res = a.uniq.inject({}) {|accu, uni| accu.merge({ uni => a.select{|i| i == uni } })}
{"aa"=>["aa"], "bb"=>["bb", "bb", "bb"], "cc"=>["cc", "cc"]}

Now you can do the following:

res['aa'].size

Answer 12 (score: -3)

def find_most_occurred_item(arr)
    return 'Array has unique elements already' if arr.uniq == arr
    # count how many times each element occurs
    m = arr.inject(Hash.new(0)) { |h,v| h[v] += 1; h }
    # highest count among all elements
    max_count = m.values.max
    # report every element that reaches the highest count
    m.select { |_, v| v == max_count }
     .map { |k, v| "#{k} appears #{v} times" }
     .join("\n")
end

puts find_most_occurred_item([1, 2, 3,4,4,4,3,3])