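I'm trying to read a file that contains a list of information; it is a .dtf file. The information is laid out as one paragraph per point. Example: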
ID : 001
category : 2
length : 18.33
ID : 002
category : 1
length : 19.75
ID : 003
category : 1
length : 18.8
ID : 004
category : 3
length : 17.9
ID : 005
category : 3
length : 16.9
ID : 006
category : 2
length : 17.9
ID : 007
category : 3
length : 21.5
ID : 008
category : 1
length : 20.7
ID : 009
category : 1
length : 16.5
ID : 010
category : 1
length : 23
ID : 011
category : 2
length : 18.73
ID : 012
category : 3
length : 17.9
ID : 013
category : 3
length : 23.4
ID : 014
category : 3
length : 17.9
ID : 015
category : 3
length : 20.93
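And so on.
I need to group the points by category and get the total length of each group. Can anyone help?
I managed to group by category, but failed to get the total length.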
a = IO.readlines("point.txt")
b = Hash.new(0)
a.each do |v|
b[v] +=1
end
b.each do |k, v|
puts "#{k} occurs #{v}"
end
b = Hash.new(0)
Expected output:
Category 1 : 5 points
Total length : 98.75
Category 2 : 3 points
Total length : 54.96
Category 3 : 7 points
Total length : 136.43
Answer 0: (score: 2)
Suppose you read the file into a string str using IO::read (str = File.read('point.txt')) and obtained the following.
str = <<-END.gsub(/(?<=\n) +(?=\n)/, '')
ID : 001
category : 2
length : 6.30


ID : 002
category : 1
length : 17.9

ID : 003
category : 2
length : 11.2
END
#=> "ID : 001\ncategory : 2\nlength : 6.30\n\n\nID : 002\ncategory : 1\nlength : 17.9\n\nID : 003\ncategory : 2\nlength : 11.2\n"
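.gsub(/(?<=\n) +(?=\n)/, '') is there only to remove any spaces sitting between consecutive newlines (that is, on otherwise blank lines), so that the next step works correctly.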
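To illustrate why that cleanup matters, here is a minimal sketch. The string raw is a made-up fragment (not the asker's file) in which the "blank" separator lines contain stray spaces:

# Hypothetical input: the separator lines hold one or two spaces.
raw = "length : 6.30\n  \n \nID : 002\n"

# Without the cleanup there is no run of two or more newlines to split on,
# so the whole string comes back as a single piece.
uncleaned_pieces = raw.split(/\n{2,}/)

# Remove runs of spaces that sit directly between two newlines.
cleaned = raw.gsub(/(?<=\n) +(?=\n)/, '')
# cleaned is now "length : 6.30\n\n\nID : 002\n"

pieces = cleaned.split(/\n{2,}/)
# pieces is ["length : 6.30", "ID : 002\n"]

If the separator lines might also contain tabs, replacing the space in the pattern with [ \t] would be one way to cover that case.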
a = str.split(/\n{2,}/)
#=> ["ID : 001\ncategory : 2\nlength : 6.30",
# "ID : 002\ncategory : 1\nlength : 17.9",
# "ID : 003\ncategory : 2\nlength : 11.2\n"]
The line above splits the string into groups, one per point. Next:
b = a.map(&:lines)
#=> [["ID : 001\n", "category : 2\n", "length : 6.30"],
# ["ID : 002\n", "category : 1\n", "length : 17.9"],
# ["ID : 003\n", "category : 2\n", "length : 11.2\n"]]
This splits each group into its lines. Now convert each group into a hash:
c = b.map do |d|
d.each_with_object({}) do |s,h|
key, val = s.strip.split(/ *: */)
h[key] = val.include?('.') ? val.to_f : val.to_i
end
end
#=> [{"ID"=>1, "category"=>2, "length"=>6.3},
# {"ID"=>2, "category"=>1, "length"=>17.9},
# {"ID"=>3, "category"=>2, "length"=>11.2}]
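We can now count the points per category and total their "length" values in two ways. The first uses the form of Hash#update (a.k.a. merge!) that takes a block (with block variables k, o and n below) to determine the value of any key that is present in both hashes being merged.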
d = c.each_with_object({}) do |g,h|
h.update(g["category"]=>{ "nbr"=>1, "length"=>g["length"] }) do |k,o,n|
{ "nbr"=>o["nbr"]+n["nbr"], "length"=>o["length"]+n["length"] }
end
end
#=> {2=>{"nbr"=>2, "length"=>17.5},
# 1=>{"nbr"=>1, "length"=>17.9}}
I assume this hash d provides all the information you need. If not, you can modify the calculation accordingly.
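As a smaller illustration of how the block form of Hash#update behaves, here is a sketch with hand-written values. The hash totals is only an illustrative name, and the numbers come from the three-point sample above:

totals = { 2 => { "nbr" => 1, "length" => 6.3 } }

# Key 2 already exists, so the block is called with the key, the old
# value (o) and the new value (n); whatever it returns is stored.
totals.update(2 => { "nbr" => 1, "length" => 11.2 }) do |_k, o, n|
  { "nbr" => o["nbr"] + n["nbr"], "length" => o["length"] + n["length"] }
end

# Key 1 is new, so the block is not called and the value is stored as-is.
totals.update(1 => { "nbr" => 1, "length" => 17.9 }) do |_k, _o, _n|
  raise "the block is only used for keys present in both hashes"
end

# totals now holds two points for category 2 and one for category 1.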
Another common way to extract the desired information from the array c is to use Enumerable#group_by:
c.group_by { |h| h["category"] }.transform_values do |arr|
{ "nbr"=>arr.size, "length"=>arr.sum { |h| h["length"] } }
end
#=> {2=>{"nbr"=>2, "length"=>17.5},
# 1=>{"nbr"=>1, "length"=>17.9}}
Note that:
c.group_by { |h| h["category"] }
#=> {2=>[{"ID"=>1, "category"=>2, "length"=>6.3},
# {"ID"=>3, "category"=>2, "length"=>11.2}],
# 1=>[{"ID"=>2, "category"=>1, "length"=>17.9}]}
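Neither variant prints anything yet. As a rough sketch of getting from such a hash to the output format requested in the question, assuming the hash has the shape shown above (summary and report are illustrative names):

# Sample result in the shape produced above (three-point example).
summary = { 2 => { "nbr" => 2, "length" => 17.5 },
            1 => { "nbr" => 1, "length" => 17.9 } }

# Sort by category number, then build two output lines per category.
report = summary.sort_by { |category, _stats| category }.flat_map do |category, stats|
  ["Category #{category} : #{stats['nbr']} points",
   "Total length : #{stats['length'].round(2)}"]
end

puts report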
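Answer 1: (score: 1)
Your problem is that you are really just counting unique lines without any actual processing. You need to parse the file line by line, extract the key-value pair from each line, and somehow associate the points with their categories; only then will your calculations make sense.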
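To make that diagnosis concrete, here is a small sketch. The lines array is made-up sample input rather than the asker's file; it shows that tallying raw lines only counts identical strings and never adds up any lengths:

# Hypothetical sample: raw lines of two points, as IO.readlines would return them.
lines = ["ID : 001\n", "category : 2\n", "length : 18.33\n",
         "ID : 002\n", "category : 2\n", "length : 19.75\n"]

# Same tally as in the question: every distinct line becomes its own key.
line_counts = Hash.new(0)
lines.each { |line| line_counts[line] += 1 }

line_counts.each { |line, count| puts "#{line.chomp} occurs #{count}" }
# "category : 2" is counted twice, but the two "length" lines are simply
# two more distinct keys; their numeric values are never parsed or summed.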
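In the simplest case, if the data structure is robust enough and length always follows category, the parsing can be as trivial as:
require 'stringio'
text = StringIO.new(<<~DATA)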
ID : 001
category : 2
length : 6.30
ID : 002
category : 1
length : 17.9
ID : 003
category : 2
length : 3.70
DATA
categories = Hash.new { |h,k| h[k] = {count: 0, length: 0} }
current_cat = nil
text.each_line do |line|
next if line.strip.empty?
key, value = line.split(":").map(&:strip)
case key
when "category"
current_cat = value
categories[current_cat][:count] += 1
when "length"
categories[current_cat][:length] += Float(value)
end
end
puts categories.inspect # => {"2"=>{:count=>2, :length=>10.0}, "1"=>{:count=>1, :length=>17.9}}
(Just replace the StringIO with reading from a file to map this to your use case.)
Answer 2: (score: 1)
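As a sketch of that substitution, the same loop can run over File.foreach. The Tempfile below is only a stand-in so the example is self-contained; in practice you would pass the path of your own data file:

require 'tempfile'

# Stand-in for the real data file (hypothetical content).
file = Tempfile.new(['point', '.txt'])
file.write("ID : 001\ncategory : 2\nlength : 6.30\n\n" \
           "ID : 002\ncategory : 1\nlength : 17.9\n")
file.close

categories = Hash.new { |h, k| h[k] = { count: 0, length: 0.0 } }
current_cat = nil

# File.foreach yields one line at a time, just like StringIO#each_line.
File.foreach(file.path) do |line|
  next if line.strip.empty?
  key, value = line.split(":").map(&:strip)
  case key
  when "category"
    current_cat = value
    categories[current_cat][:count] += 1
  when "length"
    categories[current_cat][:length] += Float(value)
  end
end

file.unlink # remove the temporary stand-in file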
If every "point entry" starts with ID, you can use slice_before to split the data accordingly, e.g.:
IO.foreach('point.txt').slice_before(/^ID/).each do |lines|
# ...
end
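To see what slice_before yields before any mapping is applied, here is a minimal sketch on an in-memory array instead of the file. The sample lines are illustrative, and the blank separator line is an assumption about the file layout:

lines = ["ID : 001\n", "category : 2\n", "length : 18.33\n", "\n",
         "ID : 002\n", "category : 1\n", "length : 19.75\n"]

# A new chunk starts at every line that matches /^ID/.
chunks = lines.slice_before(/^ID/).to_a
# chunks is:
# [["ID : 001\n", "category : 2\n", "length : 18.33\n", "\n"],
#  ["ID : 002\n", "category : 1\n", "length : 19.75\n"]]

The blank line simply stays at the end of the first chunk, which is harmless as long as the later parsing ignores lines it does not recognize.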
The result can then be mapped to more manageable objects, such as hashes:
points = IO.foreach('point.txt').slice_before(/^ID/).map do |lines|
lines.each_with_object({}) do |line, h|
case line
when /^ID : (.*)/
h[:id] = $1
when /^category : (.*)/
h[:category] = $1.to_i
when /^length : (.*)/
h[:length] = $1.to_f
end
end
end
#=> [
# {:id=>"001", :category=>2, :length=>18.33},
# {:id=>"002", :category=>1, :length=>19.75},
# # ...
# ]
We can now group the points by category:
grouped_points = points.group_by { |h| h[:category] }
And print the results:
grouped_points.each do |category, points|
puts "Category #{category} : #{points.length} points"
puts "Total length : #{ points.sum { |p| p[:length] }.round(2) }"
puts
end
Output:
Category 2 : 3 points
Total length : 54.96
Category 1 : 5 points
Total length : 98.75
Category 3 : 7 points
Total length : 136.43
You may want to sort grouped_points so that the categories are printed in order.
Answer 3: (score: 0)
Much the same recipe as the other answers show.
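One possible way to do that sorting, sketched on a few hand-written points (the data and the name sorted_points are illustrative only):

# Hypothetical points, already parsed into hashes as above.
points = [{ id: "001", category: 2, length: 18.33 },
          { id: "002", category: 1, length: 19.75 },
          { id: "004", category: 3, length: 17.9 }]

grouped_points = points.group_by { |h| h[:category] }
# Keys come out in first-seen order: 2, 1, 3.

# sort_by on a hash returns [key, value] pairs; to_h rebuilds a hash
# whose iteration order now follows the category number.
sorted_points = grouped_points.sort_by { |category, _| category }.to_h
# Keys are now in the order 1, 2, 3.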
After reading the file, a contains:
#=> ["ID : 001\n", "category : 2\n", "length : 18.33\n", "\n", "ID : 002\n", "category : 1\n", "length : 19.75\n", "\n", "ID : 003\n", "category : 1\n", "length : 18.8\n", "\n", "ID : 004\n", "category : 3\n", "length : 17.9\n", "\n", "ID : 005\n", "category : 3\n", "length : 16.9\n", "\n", "ID : 006\n", "category : 2\n", "length : 17.9\n", "\n", "ID : 007\n", "category : 3\n", "length : 21.5\n", "\n", "ID : 008\n", "category : 1\n", "length : 20.7\n", "\n", "ID : 009\n", "category : 1\n", "length : 16.5\n", "\n", "ID : 010\n", "category : 1\n", "length : 23\n", "\n", "ID : 011\n", "category : 2\n", "length : 18.73\n", "\n", "ID : 012\n", "category : 3\n", "length : 17.9\n", "\n", "ID : 013\n", "category : 3\n", "length : 23.4\n", "\n", "ID : 014\n", "category : 3\n", "length : 17.9\n", "\n", "ID : 015\n", "category : 3\n", "length : 20.93"]
You then need to convert this mess into something more comfortable to work with; an array of hashes is a good fit, so:
res = a.map{ |e| e.chomp.gsub(/\s+/, "").split(':') }.reject(&:empty?).each_slice(3).map(&:to_h)
#=> [{"ID"=>"001", "category"=>"2", "length"=>"18.33"}, {"ID"=>"002", "category"=>"1", "length"=>"19.75"}, {"ID"=>"003", "category"=>"1", "length"=>"18.8"}, ...
It is probably better to have the length values as floats:
res.each { |h| h['length'] = h['length'].to_f }
Finally, group by "category" and transform the resulting hash values:
res.group_by { |h| h['category']}.transform_values { |v| [v.size, v.sum { |h| h['length'] }] }
#=> {"2"=>[3, 54.959999999999994], "1"=>[5, 98.75], "3"=>[7, 136.43]}
Or, all in a single expression:
a.map{ |e| e.chomp.gsub(/\s+/, "").split(':') }.reject(&:empty?).each_slice(3).map(&:to_h).tap { |res| res.each { |h| h['length'] = h['length'].to_f } }.group_by { |h| h['category']}.transform_values { |v| [v.size, v.sum { |h| h['length'] }] }
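The 54.959999999999994 shown for category 2 is ordinary binary floating-point noise. A minimal sketch of two ways to tidy it up for display, using the three category-2 lengths from the question (the variable names are illustrative):

lengths = [18.33, 17.9, 18.73] # category-2 lengths from the question

total = lengths.sum # may carry a tiny floating-point error

rounded = total.round(2)          # a Float rounded to two decimals
formatted = format('%.2f', total) # a String with exactly two decimals

puts "Total length : #{rounded}"

round is convenient when the number is used further; format is the safer choice when a fixed number of decimals must always be printed.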