How to parse a text file in Ruby on Rails

Time: 2013-04-20 09:56:17

Tags: ruby-on-rails ruby

Here is the text file:

Old count: 56
S id: 1
M id: 1 
New count: 2
Old count: 56
S id: 1
M id: 2
New count: 20
Old count: 56
S id: 1
M id: 2
New count: 32
-----------------------------
Old count: 2
S id: 2
M id: 1
New count: 4
--------------------------------
.
.
.
.

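I used the delimiter "---------------" to separate the block for each id.

How can I parse the values so that the New count values on the lines between the "-----" delimiters are added up, like this: 2+20+32 = 54

As an array of hashes, the first block would be: count << {'new count' => 54}, and so on for the remaining blocks.

I tried something like this: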

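# Read the whole file into one string
@data = ""
File.open("out2", "r") do |f|
  f.each_line do |line|
    @data += line
  end
end

# Split into records on the delimiter, then into rows
s_rec = @data.split("------")
s_rec.each do |rec|
  row_s = rec.split(/\n/)

  row_s.each do |row|
    if row.include?("New count")
      rv = row.split(":")
      @db = rv[1]   # only keeps the last value seen; nothing is summed yet
    end
  end
end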

2 answers:

Answer 0: (score: 1)

I'm not sure what output format you are after, but given the following:

text = <<__
Old count: 56
S id: 1
M id: 1 
New count: 2
Old count: 56
S id: 1
M id: 2
New count: 20
Old count: 56
S id: 1
M id: 2
New count: 32
-----------------------------
Old count: 2
S id: 2
M id: 1
New count: 4
--------------------------------
.
.
.
.
__

this:

text
.split(/^-{5,}/)
.map{|s| s.scan(/\bNew count: (\d+)/).map{|match| match.first.to_i}.inject(:+)}

gives:

[
  54,
  4,
  nil
]
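
The trailing nil comes from the last chunk (the "." lines), which contains no "New count" lines. As a minimal sketch of getting from here to the array of hashes the question asks for, the variant below drops the empty chunks and wraps each sum in a hash. The sample text is abbreviated, and the variable name count is borrowed from the question; treat this as one possible approach rather than the answer's own code.

```ruby
# Abbreviated sample in the question's format; only "New count" lines matter here.
text = <<TEXT
S id: 1
New count: 2
S id: 1
New count: 20
S id: 1
New count: 32
-----------------------------
S id: 2
New count: 4
--------------------------------
.
.
TEXT

count = text
  .split(/^-{5,}/)                                                # one chunk per delimited block
  .map { |block| block.scan(/^New count: (\d+)/).flatten.map(&:to_i) }
  .reject(&:empty?)                                               # skip chunks with no counts
  .map { |values| { 'new count' => values.inject(:+) } }

p count
# count == [{ 'new count' => 54 }, { 'new count' => 4 }]
```

Rejecting empty chunks before summing avoids the nil entry, because inject(:+) is never called on an empty array.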

In response to the comment: it is still not clear what you want, because what you wrote is not a valid Ruby object, but this:

text
.scan(/^S id: (\d+).+?^New count: (\d+)/m)
.inject(Hash.new(0)){|h, (k, v)| h[k.to_i] += v.to_i; h}
.map{|k, v| {"S id" => k, "new count" => v}}

gives:

[
  {
    "S id"      => 1,
    "new count" => 54
  },
  {
    "S id"      => 2,
    "new count" => 4
  }
]
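
In the question the data comes from a file named out2 rather than a string literal. The sketch below applies the same scan-and-sum idea to a file, assuming the file is small enough to read into memory at once. The helper name totals_by_s_id is hypothetical, and a temporary file with abbreviated data stands in for out2 so the example is self-contained.

```ruby
require 'tempfile'

# Hypothetical helper: sums the "New count" values per "S id" for the file at +path+.
def totals_by_s_id(path)
  File.read(path)
      .scan(/^S id: (\d+).+?^New count: (\d+)/m)                  # [[s_id, new_count], ...]
      .each_with_object(Hash.new(0)) { |(id, n), h| h[id.to_i] += n.to_i }
      .map { |id, total| { 'S id' => id, 'new count' => total } }
end

# Abbreviated sample data standing in for the contents of "out2".
sample = "Old count: 56\nS id: 1\nM id: 1\nNew count: 2\n" \
         "Old count: 56\nS id: 1\nM id: 2\nNew count: 20\n" \
         "-----\n" \
         "Old count: 2\nS id: 2\nM id: 1\nNew count: 4\n" \
         "-----\n"

result = Tempfile.create('out2') do |f|
  f.write(sample)
  f.flush
  totals_by_s_id(f.path)
end

p result
# result == [{ 'S id' => 1, 'new count' => 22 }, { 'S id' => 2, 'new count' => 4 }]
```

Because the totals are keyed on the S id itself, this version does not depend on the delimiter lines at all.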

Answer 1: (score: 0)

I'd start with:

data = 'Old count: 56
S id: 1
M id: 1 
New count: 2
Old count: 56
S id: 1
M id: 2
New count: 20
Old count: 56
S id: 1
M id: 2
New count: 32
-----------------------------
Old count: 2
S id: 2
M id: 1
New count: 4
--------------------------------
'

ary = data.split("\n").slice_before(/^---/).map{ |a| a.select{ |s| s['New count:'] }.map{ |s| s[/\d+/].to_i }.inject(:+) }.compact

That gives me an array:

[
    [0] 54,
    [1] 4,
]

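compact is needed because the trailing "----" block delimiter leaves one last slice with no "New count:" lines once slice_before has worked its magic. select returns an empty array for that slice, and inject(:+) on an empty array returns nil.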
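
A small illustration of why the nil appears, and of one alternative to compact. Passing an initial value to inject turns the empty case into 0 instead of nil; whether a 0 entry or a dropped entry is preferable depends on what the caller expects, so this is only a sketch of the trade-off.

```ruby
# inject without an initial value returns nil for an empty array...
nil_total = [].inject(:+)

# ...while an explicit initial value of 0 gives 0 instead.
zero_total = [].inject(0, :+)

# Non-empty arrays sum the same way in both forms.
block_total = [2, 20, 32].inject(:+)

p nil_total    # nil
p zero_total   # 0
p block_total  # 54

# With compact, the delimiter-only slice disappears from the result.
sums = [[2, 20, 32], [4], []].map { |values| values.inject(:+) }.compact
p sums         # [54, 4]
```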

From there it is easy to build a hash:

Hash[ ary.map.with_index(1) { |v, i| ["S #{ i }", "new count #{ v }" ] } ]

Which looks like:

{
    "S 1" => "new count 54",
    "S 2" => "new count 4"
}

Breaking it down, the code up through slice_before returns:

[
    [0] [
        [ 0] "Old count: 56",
        [ 1] "S id: 1",
        [ 2] "M id: 1 ",
        [ 3] "New count: 2",
        [ 4] "Old count: 56",
        [ 5] "S id: 1",
        [ 6] "M id: 2",
        [ 7] "New count: 20",
        [ 8] "Old count: 56",
        [ 9] "S id: 1",
        [10] "M id: 2",
        [11] "New count: 32"
    ],
    [1] [
        [0] "-----------------------------",
        [1] "Old count: 2",
        [2] "S id: 2",
        [3] "M id: 1",
        [4] "New count: 4"
    ],
    [2] [
        [0] "--------------------------------"
    ]
]

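From there it is straightforward to select the wanted lines in each sub-array, extract the values, and sum them with inject.

Once that is done, it is just a matter of using map with with_index to build the strings for the name/value pairs, then letting Hash turn them into a hash.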
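
The steps above can be sketched one at a time with intermediate variables, which makes each stage easy to inspect. The data here is abbreviated to the "New count" lines and delimiters, and the variable names are illustrative only.

```ruby
# Abbreviated data: two blocks, each followed by a delimiter line.
data = "New count: 2\nNew count: 20\nNew count: 32\n" \
       "-----\n" \
       "New count: 4\n" \
       "-----\n"

# 1. Start a new slice at every delimiter line.
slices = data.split("\n").slice_before(/^---/).to_a

# 2. Keep only the "New count:" lines in each slice.
count_lines = slices.map { |slice| slice.select { |s| s['New count:'] } }

# 3. Extract the number from each line.
values = count_lines.map { |lines| lines.map { |s| s[/\d+/].to_i } }

# 4. Sum each slice; the delimiter-only slice yields nil, which compact removes.
sums = values.map { |v| v.inject(:+) }.compact

# 5. Pair each sum with a 1-based label and let Hash build the result.
pairs  = sums.map.with_index(1) { |v, i| ["S #{i}", "new count #{v}"] }
result = Hash[pairs]

p sums    # [54, 4]
p result
```

Note that the "S 1", "S 2" labels come from the position of each block, not from the "S id" lines, so they only match the real ids when the blocks appear in id order starting at 1.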