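How to find an expression in a text file, process all lines up to the next occurrence of the expression, and repeat until the end of the file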

Time: 2016-09-19 14:51:13

Tags: ruby parsing text

I have a text file:

Some comment on the 1st line of the file.

processing date:         31.8.2016
amount:                  -1.23
currency:                EUR
balance:                 1234.56
payer reference:         /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 1
additional info:         Amount: 1.23 EUR 29.08.2016 Place: 123456789XY



processing date:         30.8.2016
amount:                  -2.23
currency:                EUR
balance:                 12345.56
payer reference:         /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info:         Amount: 2.23 EUR 28.08.2016 Place: 123456789XY



processing date:         29.8.2016
amount:                  -3.23
currency:                EUR
balance:                 123456.56
payer reference:         /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info:         Amount: 2.23 EUR 27.08.2016 Place: 123456789XY

I need to process the file so that I can store the values on the right-hand side (31.8.2016, -1.23, EUR, 1234.56, etc.) in a MySQL database.

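I've only used find or find_all, which return the first line or all lines containing a particular string, but that isn't enough: I somehow need to identify each block that starts with "processing date:" and ends with "additional info:", process the values in it, then move on to the next block, and so on until the end of the file.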

Any hints on how to achieve this?

1 answer:

Answer 0 (score: 1)

I'd start with this:

File.foreach('data.txt', "\n\n") do |li|
  next unless li[/^processing/]
  puts "'#{li.strip}'"
end

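If "data.txt" contains your content, foreach will read the file and pass the text to li in paragraphs (chunks ending in "\n\n") rather than lines. Once you have those, you can manipulate them however you need. This is fast and efficient, and because foreach reads the file incrementally it doesn't have the scalability problems of readlines or any read-based I/O, which pull the whole file into memory at once.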
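Note that the sample file separates records with three blank lines, not one, so the literal "\n\n" separator also yields chunks that are nothing but "\n\n"; the next unless guard quietly drops them. If you'd rather not rely on that, passing an empty string as the separator switches foreach to paragraph mode, where any run of blank lines counts as a single break. A minimal sketch comparing the two, assuming a cut-down copy of the sample data written to a temporary file (the `sample` string is illustrative only):

```ruby
require 'tempfile'

# Illustrative data only: a comment line, then two records separated by
# three blank lines, mirroring the layout of the file in the question.
sample = "Some comment on the 1st line of the file.\n\n" \
         "processing date:         31.8.2016\n" \
         "amount:                  -1.23\n\n\n\n" \
         "processing date:         30.8.2016\n" \
         "amount:                  -2.23\n"

literal, paragraphs = Tempfile.create('statement') do |f|
  f.write(sample)
  f.flush
  [
    # Literal separator: the four-newline gap also produces a bare "\n\n" chunk.
    File.foreach(f.path, "\n\n").to_a,
    # Paragraph mode (empty separator): a run of blank lines is one break.
    File.foreach(f.path, '').to_a
  ]
end

puts literal.size    # 4: comment, record, bare "\n\n", record
puts paragraphs.size # 3: comment, record, record
```

Either way you still need to filter out the comment chunk; paragraph mode just removes the empty ones.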

Here's the output:

'processing date:         31.8.2016
amount:                  -1.23
currency:                EUR
balance:                 1234.56
payer reference:         /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 1
additional info:         Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'
'processing date:         30.8.2016
amount:                  -2.23
currency:                EUR
balance:                 12345.56
payer reference:         /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info:         Amount: 2.23 EUR 28.08.2016 Place: 123456789XY'
'processing date:         29.8.2016
amount:                  -3.23
currency:                EUR
balance:                 123456.56
payer reference:         /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info:         Amount: 2.23 EUR 27.08.2016 Place: 123456789XY'

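You can see from the wrapping ' characters that the file is read in blocks, or paragraphs, delimited by "\n\n", and that each block is then stripped to remove the trailing whitespace. The next unless li[/^processing/] line skips any chunk that has no line beginning with "processing", such as the comment at the top of the file.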
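Because String#each_line accepts the same separator argument, the read, strip and filter steps can also be tried on an in-memory string (handy in irb) and chained into an array of clean blocks. A small sketch, with `text` as an illustrative stand-in for the file contents:

```ruby
# Stand-in for the file contents (illustrative data only).
text = "Some comment on the 1st line of the file.\n\n" \
       "processing date:         31.8.2016\n" \
       "amount:                  -1.23\n\n\n\n" \
       "processing date:         30.8.2016\n" \
       "amount:                  -2.23\n"

# Split on the same separator foreach uses, strip the trailing newlines,
# and keep only chunks that start a record.
blocks = text.each_line("\n\n")
             .map(&:strip)
             .select { |b| b.start_with?('processing') }

blocks.each { |b| puts "'#{b}'" }
```

The bare "\n\n" chunk strips down to an empty string, so the select drops it along with the comment.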

See the foreach documentation for details.

Then split(':', 2) is your friend:

'processing date:         31.8.2016'.split(':', 2) # => ["processing date", "         31.8.2016"]
'amount:                  -1.23'.split(':', 2) # => ["amount", "                  -1.23"]
'currency:                EUR'.split(':', 2) # => ["currency", "                EUR"]
'balance:                 1234.56'.split(':', 2) # => ["balance", "                 1234.56"]
'payer reference:         /VS123456/SS0011223344/KS1212'.split(':', 2) # => ["payer reference", "         /VS123456/SS0011223344/KS1212"]
'type of the transaction: Some type of the transaction 1'.split(':', 2) # => ["type of the transaction", " Some type of the transaction 1"]
'additional info:         Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'.split(':', 2) # => ["additional info", "         Amount: 1.23 EUR 29.08.2016 Place: 123456789XY"]
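The limit of 2 matters because some values contain colons themselves: "additional info" holds "Amount:" and "Place:". A quick comparison using that line from the question:

```ruby
line = 'additional info:         Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'

# No limit: every colon splits, so the value is chopped into pieces.
p line.split(':')
# => ["additional info", "         Amount", " 1.23 EUR 29.08.2016 Place", " 123456789XY"]

# Limit of 2: only the first colon splits, giving the key and the whole value.
key, value = line.split(':', 2).map(&:strip)
p key   # => "additional info"
p value # => "Amount: 1.23 EUR 29.08.2016 Place: 123456789XY"
```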

So you can do this:

text = 'processing date:         31.8.2016
amount:                  -1.23
currency:                EUR
balance:                 1234.56
payer reference:         /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 1
additional info:         Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'

text.lines.map{ |li| li.split(':', 2).map(&:strip) }.to_h
# => {"processing date"=>"31.8.2016", "amount"=>"-1.23", "currency"=>"EUR", "balance"=>"1234.56", "payer reference"=>"/VS123456/SS0011223344/KS1212", "type of the transaction"=>"Some type of the transaction 1", "additional info"=>"Amount: 1.23 EUR 29.08.2016 Place: 123456789XY"}
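One caveat: Array#to_h raises ArgumentError unless every element is a two-element pair, so a block containing a line without a colon (a stray note, say) would make the one-liner above fail. A defensive variant, as a sketch; `parse_block` is a hypothetical helper name, not part of any library:

```ruby
# Hypothetical helper: turn one record block into a Hash, skipping any
# line that isn't a "key: value" pair instead of letting to_h raise.
def parse_block(block)
  block.lines
       .map { |li| li.split(':', 2).map(&:strip) }
       .select { |pair| pair.size == 2 }
       .to_h
end

# Keeps "processing date" and "amount", silently drops "stray note".
p parse_block("processing date:         31.8.2016\nstray note\namount:                  -1.23")
```

Whether to skip such lines or fail loudly depends on how much you trust the input; skipping hides malformed records.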

There are many ways to go on and parse the information into more useful data, but this should get you started.
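As one possible next step toward the MySQL insert, here is a sketch that reads the file in paragraph mode, turns each record into a Hash with symbol keys, and converts the date and amounts into Date and BigDecimal so they line up with DATE and DECIMAL columns. The method names, the key normalization and the assumption that every record has processing date, amount and balance lines in day.month.year / decimal form are illustrative assumptions, not something the question specifies; the actual insert is left to whichever MySQL client you use.

```ruby
require 'bigdecimal'
require 'date'

# Hypothetical helper: one stripped record block -> Hash with symbol keys
# ("payer reference" -> :payer_reference) and typed values.
# Assumes processing date, amount and balance are always present.
def parse_record(block)
  raw = block.each_line
             .map { |li| li.split(':', 2).map(&:strip) }
             .select { |pair| pair.size == 2 }
             .to_h { |key, value| [key.tr(' ', '_').to_sym, value] }

  # Assumes dates are always written day.month.year, e.g. 31.8.2016.
  raw.merge(
    processing_date: Date.strptime(raw[:processing_date], '%d.%m.%Y'),
    amount:          BigDecimal(raw[:amount]),
    balance:         BigDecimal(raw[:balance])
  )
end

# Hypothetical helper: read the whole file and return one Hash per record.
def parse_statement(path)
  File.foreach(path, '')
      .map(&:strip)
      .select { |block| block.start_with?('processing date:') }
      .map { |block| parse_record(block) }
end
```

For the sample file, parse_statement('data.txt') should return three hashes, the first with :processing_date set to 2016-08-31 and :amount as a BigDecimal of -1.23.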