Parsing and counting structured text to build statistics

Time: 2012-07-07 12:29:05

Tags: ruby parsing

I have a specific problem that I can't find a solution to. I read data from a text file in the following format:

    date1 reason1
    date1 reason1
    date1 reason2
    date1 reason3
    date2 reason4
    date2 reason1
    date2 reason2
    date2 reason2
    date2 reason1
    date2 reason3
    date3 reason4
    date3 reason4
    date3 reason1

I want to build statistics about the data. For example, I want to count how often each distinct "reason" occurs for each date, like this:

    date1 reason1 -> 2        
    date1 reason2 -> 1
    date1 reason3 -> 1
    date2 reason1 -> 2
    date2 reason4 -> 1
    date2 reason2 -> 2
    date2 reason3 -> 1

...and so on. How can I parse the data and build the desired result? I guess a hash would be used, but I can't figure out how to approach the problem.

3 answers:

Answer 0 (score: 2)

Here is a very straightforward approach:

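    # A default value of 0 lets us increment keys that have not been seen yet.
    h = Hash.new(0)
    File.foreach("foo.txt") do |line|
      h[line.chomp] += 1   # each full "date reason" line is its own key
    end
    h
    #=> {"date1 reason1"=>2,
    #    "date1 reason2"=>1,
    #    "date1 reason3"=>1,
    #    "date2 reason4"=>1,
    #    "date2 reason1"=>2,
    #    "date2 reason2"=>2,
    #    "date2 reason3"=>1,
    #    "date3 reason4"=>2,
    #    "date3 reason1"=>1}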

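If the counts should be grouped per date, as the question describes, a nested hash is one option. The following is only a minimal sketch: the helper name `count_reasons_by_date` is hypothetical, and it assumes the first whitespace-separated token on each line is the date and the rest of the line is the reason.

```ruby
# Hypothetical helper: counts reasons per date from any enumerable of lines.
# Assumes "<date> <reason>" per line; lines without both parts are skipped.
def count_reasons_by_date(lines)
  # The outer hash creates an inner counting hash (default 0) for each new date.
  counts = Hash.new { |hash, date| hash[date] = Hash.new(0) }
  lines.each do |line|
    date, reason = line.strip.split(" ", 2)
    next if date.nil? || reason.nil?
    counts[date][reason] += 1
  end
  counts
end

# A shortened sample in the question's format.
sample = <<~DATA
  date1 reason1
  date1 reason1
  date1 reason2
  date2 reason4
DATA

by_date = count_reasons_by_date(sample.each_line)
by_date.each do |date, reasons|
  reasons.each { |reason, count| puts "#{date} #{reason} -> #{count}" }
end
```

To read from a file instead, `File.foreach("foo.txt")` could be passed in place of `sample.each_line`, since both yield one line at a time.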
Answer 1 (score: 1)

    str = "date1 reason1
    date1 reason1
    date1 reason2
    date1 reason3
    date2 reason4
    date2 reason1
    date2 reason2
    date2 reason2
    date2 reason1
    date2 reason3
    date3 reason4
    date3 reason4
    date3 reason1"

    # Count identical lines; unseen keys start at 0.
    line_counts = Hash.new(0)

    str.each_line do |line|
      line_counts[line.chomp] += 1
    end

    # Sort the entries so the output is ordered by date, then by reason.
    line_counts.sort.each do |line, count|
      puts "#{line} -> #{count}"
    end

Output:

    date1 reason1 -> 2
    date1 reason2 -> 1
    date1 reason3 -> 1
    date2 reason1 -> 2
    date2 reason2 -> 2
    date2 reason3 -> 1
    date2 reason4 -> 1
    date3 reason1 -> 1
    date3 reason4 -> 2
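
On Ruby 2.7 or later, the same counting can also be written with `Enumerable#tally`, which builds the line-to-count hash in one call. This is a sketch using a shortened sample in the question's format, not part of the original answer; the variable names are illustrative.

```ruby
# A shortened sample in the question's format.
text = <<~DATA
  date1 reason1
  date1 reason1
  date1 reason2
  date1 reason3
  date2 reason4
DATA

# Strip the trailing newline from each line, then count identical lines.
# Enumerable#tally requires Ruby 2.7 or later.
counts = text.each_line.map(&:chomp).tally

# Sort so the output is ordered by date, then by reason.
counts.sort.each do |line, count|
  puts "#{line} -> #{count}"
end
```

On older Ruby versions `tally` is not available, so the `Hash.new(0)` loop remains the portable choice.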

Answer 2 (score: 0)

    # each_with_object passes the counting hash to every iteration and returns it.
    result = File.foreach("foo.txt").each_with_object(Hash.new(0)) do |line, h|
      h[line.chomp] += 1
    end
    #=> {"date1 reason1"=>2,
    #    "date1 reason2"=>1,
    #    "date1 reason3"=>1,
    #    "date2 reason4"=>1,
    #    "date2 reason1"=>2,
    #    "date2 reason2"=>2,
    #    "date2 reason3"=>1,
    #    "date3 reason4"=>2,
    #    "date3 reason1"=>1}