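I have two files: (1) the first file contains the users of the system, which I read into an array; (2) the second file contains statistics about those users.
My task is to get a count per user, for example: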
{"user1" => 1, "user2" => 0, "user3" => 4}
Here is how I solved it:
# Result wanted
# Given the names and stats array generate the results array
# result = {'user1' => 3, 'user2' => 2, 'user3' => 0, 'user4' => 1}
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
hash = Hash[names.map {|v| [v, 0]}] # to make sure every name gets a value
stats.each do |item| # basic loop to count the records
hash[item] += 1 if hash.has_key?(item)
end
puts hash
# terminal outcome
# $ ruby example.rb
# {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}
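I'm just curious whether there is a better way to do the counting than a loop, especially since Ruby comes with some magic powers and I come from a C background.
Answer 0 (score: 2)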
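Basically, apart from a few minor issues, your code is as fast as anything you can run for this.
If you have an unwanted entry marking the end of the array: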
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
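then I think you should pop it off before running the loop. Left in place it can produce an odd entry in the result, and its presence forces you to use a conditional test inside the loop, which slows your code down.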
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
stats.pop # => "xxx"
stats # => ["user1", "user1", "user1", "user2", "user4", "user2"]
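To illustrate why the trailing entry forces the guard, here is a small sketch. It assumes the same names and stats arrays as above: without the has_key? test, a pre-seeded hash has no value for 'xxx', so nil + 1 raises NoMethodError, while a Hash.new(0) default avoids the error but records 'xxx' as an extra key. Once the entry is removed, the unguarded loop is safe.

```ruby
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

# Pre-seeded hash with no default: seeded['xxx'] is nil, so nil + 1 raises.
seeded = {}
names.each { |k| seeded[k] = 0 }
begin
  stats.each { |item| seeded[item] += 1 }
rescue NoMethodError => e
  puts "unguarded loop failed: #{e.class}"
end

# Hash with a default of 0: no error, but 'xxx' ends up as an odd extra key.
defaulted = Hash.new(0)
stats.each { |item| defaulted[item] += 1 }
p defaulted

# Dropping the last element (same effect as stats.pop, without mutating stats)
# makes the guard unnecessary.
clean = stats[0..-2]
counts = {}
names.each { |k| counts[k] = 0 }
clean.each { |item| counts[item] += 1 }
p counts
```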
There are built-in methods that reduce the code to a single chained call, but they are slower than the loop:
stats.group_by{ |e| e } # => {"user1"=>["user1", "user1", "user1"], "user2"=>["user2", "user2"], "user4"=>["user4"]}
From there it's easy to map the resulting hash into a summary:
stats.group_by{ |e| e }.map{ |k, v| [k, v.size] } # => [["user1", 3], ["user2", 2], ["user4", 1]]
And then back into a hash:
stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }.to_h # => {"user1"=>3, "user2"=>2, "user4"=>1}
Or:
Hash[stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }] # => {"user1"=>3, "user2"=>2, "user4"=>1}
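On Ruby 2.7 or later (an assumption about your Ruby version), Enumerable#tally does the group_by/map/to_h chain in a single call. It only counts values that actually occur, so a name with no records (user3) has to be filled in separately. This sketch does that with Array#to_h and a block (Ruby 2.6+), which also drops stray entries such as 'xxx'. It is a different built-in from the ones above and has not been benchmarked here against the loop.

```ruby
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

# One pass over stats: tally returns a hash of element => occurrence count.
tallied = stats.tally

# Build the result from names, so unknown entries ('xxx') are ignored
# and names that never occur get 0.
result = names.to_h { |name| [name, tallied.fetch(name, 0)] }
p result
```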
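Using the built-in methods works, and it can be useful when processing very large lists, because very little redundant looping is done.
Looping over the data as you did is also very fast when written correctly, and it is often faster than the built-in methods. Here are some benchmarks showing other ways of doing these jobs: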
require 'fruity' # => true
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2']
Hash[names.map {|v| [v, 0]}] # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
Hash[names.zip([0] * names.size )] # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
names.zip([0] * names.size ).to_h # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
hash = {}; names.each{ |k| hash[k] = 0 }; hash # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
compare do
map_hash { Hash[names.map {|v| [v, 0]}] }
zip_hash { Hash[names.zip([0] * names.size )] }
to_h_hash { names.zip([0] * names.size ).to_h }
hash_braces { hash = {}; names.each{ |k| hash[k] = 0 }; hash }
end
# >> Running each test 2048 times. Test will take about 1 second.
# >> hash_braces is faster than map_hash by 50.0% ± 10.0%
# >> map_hash is faster than to_h_hash by 19.999999999999996% ± 10.0%
# >> to_h_hash is faster than zip_hash by 10.000000000000009% ± 10.0%
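If the fruity gem isn't installed, a rough comparison can be made with Ruby's standard-library Benchmark module instead. This sketch first checks that the four initializers build the same hash and then times each one; the iteration count N is an arbitrary assumption, and the timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

names = ['user1', 'user2', 'user3', 'user4']

# The four initializers from the comparison above, wrapped as lambdas.
initializers = {
  map_hash:    -> { Hash[names.map { |v| [v, 0] }] },
  zip_hash:    -> { Hash[names.zip([0] * names.size)] },
  to_h_hash:   -> { names.zip([0] * names.size).to_h },
  hash_braces: -> { h = {}; names.each { |k| h[k] = 0 }; h }
}

# Sanity check: every approach must build the same hash before timing it.
expected = { 'user1' => 0, 'user2' => 0, 'user3' => 0, 'user4' => 0 }
initializers.each do |label, init|
  raise "#{label} differs" unless init.call == expected
end

N = 100_000 # arbitrary iteration count for this sketch
Benchmark.bm(12) do |x|
  initializers.each do |label, init|
    x.report(label.to_s) { N.times { init.call } }
  end
end
```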
Here is how the conditional inside the loop affects the code:
require 'fruity' # => true
NAMES = ['user1', 'user2', 'user3', 'user4']
STATS = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
STATS2 = STATS[0 .. -2]
def build_hash
h = {}
NAMES.each{ |k| h[k] = 0 }
h
end
compare do
your_way {
hash = build_hash()
STATS.each do |item| # basic loop to count the records
hash[item] += 1 if hash.has_key?(item)
end
hash
}
my_way {
hash = build_hash()
STATS2.each { |e| hash[e] += 1 }
hash
}
end
# >> Running each test 512 times. Test will take about 1 second.
# >> my_way is faster than your_way by 27.0% ± 1.0%
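Although several answers suggest using count, that code slows down considerably as your lists grow, because each count call walks the entire stats array once per name. Walking the stats array a single time, as you do, is always linear, so stick with one of the iterative solutions.
Answer 1 (score: 1)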
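To make the linear-versus-repeated-scan point concrete, this sketch counts how many array elements each approach inspects. The inner loop is written out by hand as the equivalent of stats.count(name) only so its steps can be counted; the step counters are illustrative instrumentation, not part of either answer. With 4 names and 6 stats entries, the count-based version inspects names.size × stats.size elements, while the single pass inspects each element once.

```ruby
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2'] # 'xxx' already removed

# Count-based: the equivalent of stats.count(name) walks all of stats per name.
count_steps = 0
by_count = {}
names.each do |name|
  n = 0
  stats.each do |e|
    count_steps += 1
    n += 1 if e == name
  end
  by_count[name] = n
end

# Single pass: seed every name with 0, then touch each stats element once.
single_steps = 0
by_loop = {}
names.each { |k| by_loop[k] = 0 }
stats.each do |e|
  single_steps += 1
  by_loop[e] += 1
end

puts "count-based: #{count_steps} steps, single pass: #{single_steps} steps"
# prints "count-based: 24 steps, single pass: 6 steps"
```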
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
stats.each_with_object(Hash.new(0)) { |user,hash|
hash[user] += 1 if names.include?(user) }
#=> {"user1"=>3, "user2"=>2, "user4"=>1}
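Two things are worth noting about this answer. First, names.include? is itself a linear scan of names for every element of stats. Second, because Hash.new(0) starts empty, user3 does not appear as a key in the printed result, although looking it up still returns 0 through the default. This sketch keeps the each_with_object style but swaps in a Set from Ruby's standard library for the membership test.

```ruby
require 'set'

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

# Set#include? is a hash lookup, unlike Array#include?, which scans names each time.
known = names.to_set

counts = stats.each_with_object(Hash.new(0)) do |user, hash|
  hash[user] += 1 if known.include?(user)
end

p counts               # user3 is not a key: it never occurs in stats
p counts['user3']      # 0, supplied by the Hash.new(0) default
p counts.key?('user3') # false
```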
Answer 2 (score: 1)
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
hash = Hash.new
names.each { |name| hash[name] = stats.count(name) }
puts hash
Answer 3 (score: 1)
You can use map and Hash[].
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
hash = Hash[names.map { |name| [name, stats.count(name)] }]
puts hash