Using Ruby to find a word or phrase in a text file, capture it, skip a line, then read lines until a blank line (repeat)

Time: 2012-05-15 01:48:19

Tags: ruby parsing text

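This is a variation of an earlier post that was answered with a regular expression, and I would like to see whether this one can be solved with a regular expression too. Here is a sample of the text: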

  MATCH ME 1234

3940393  $100.00   FORTY THOUSAND THIEVES
3455     $ 00.10   ONLY 1% OF THE THIEVES

GOBBLEY GOOK: 344959904       3948820   333333333

MATCH ME

3940321  $110.00   FORTY THOUSAND RICHER PEOPLE
3        $ 00.11   ONLY 1% OF THE RICHER PEOPLE

The output I want is:

MATCH ME,1234,3940393,$100.00,FORTY THOUSAND THIEVES
MATCH ME,1234,3455,$00.10,ONLY 1% OF THE THIEVES
MATCH ME,,3940321,$110.00,FORTY THOUSAND RICHER PEOPLE
MATCH ME,,3,$00.11,ONLY 1% OF THE RICHER PEOPLE

My code below only gets me part of the way there. It finds MATCH ME, but it only returns:

MATCH ME,1234,3940393 ,$100.00,FORTY THOUSAND THIEVES
MATCH ME,1234,3940393 ,$100.00,FORTY THOUSAND THIEVES
MATCH ME,not here,3940321 ,$110.00,FORTY THOUSAND RICHER PEOPLE

I'm sure my nested-if approach is the wrong way to do this, but I need help finding an alternative:

def is_numeric?(object)
  true if Float(object) rescue false
end

def is_match_me_line?(object)
  true if object == "MATCH ME" rescue false
end

def load_file
  raw_records = []
  infile = File.open("match_me.txt", "r")
  while line = infile.gets
    possible_match_me = line[0,18]
    match_me_words = line[4,8]

    if is_match_me_line?(match_me_words)
      possible_match_me_number_present = possible_match_me[13,4]
      if is_numeric?(possible_match_me_number_present)
        fis_match_me_number = possible_match_me_number_present
      else
        fis_match_me_number = "not here"
      end

      line = infile.gets
      line = infile.gets

      account = line[0,8]
      amount = line[9,7]
      description = line[19,40]
      record = [match_me_words, fis_match_me_number, account, amount, description]
      raw_records << record
      puts raw_records.map { |record| record * ',' }
    end
  end
end
load_file

As suggested, I am trying a regular-expression solution, but this code does not give me the result I need:

File.open("text_2.txt", "r").each_line do |data|
  data.scan(/(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m).each do |m, n, lines|
    lines.each_line do |line|
      puts [m, n, *line.unpack('A9A10A*')].map(&:strip).join(',')
    end
  end
end
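
One likely reason the attempt above prints nothing: each_line hands the block a single line at a time, so a chunk never contains the two consecutive newlines the pattern requires. The sketch below compares scanning line by line with scanning the whole text. The names BLOCK, text, per_line_hits, and whole_hits are made up for illustration, and the text is a shortened stand-in for the file contents.

```ruby
# Pattern from the question: a header, a blank line, then the record lines.
BLOCK = /(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m

# Shortened stand-in for the file contents (assumption: same layout as the sample).
text = "MATCH ME 1234\n\n3940393  $100.00   FORTY THOUSAND THIEVES\n"

# Scanning each line separately: no single line contains "\n\n",
# so the pattern cannot match.
per_line_hits = text.each_line.flat_map { |chunk| chunk.scan(BLOCK) }

# Scanning the whole string lets the pattern span the blank line.
whole_hits = text.scan(BLOCK)

puts per_line_hits.size
puts whole_hits.size
```

If that is the cause, reading the file into one string first (for example with File.read) and scanning that string would be the smallest change to try.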

1 Answer:

Answer 0 (score: 2)

Here's mine:

data.scan(/(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m).each do |m, n, lines|
  lines.each_line do |line|
    puts [m, n, *line.unpack('A9A10A*')].map(&:strip).join(',')
  end  
end

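That regex is ugly, but it is still better than looking at 30 lines. (?:(?!\n\n).)* matches characters one at a time for as long as the next two characters are not both newlines, so it stops just before the next blank line. The (?:) makes the group non-capturing, so the '.' is not captured on its own.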
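
For anyone who wants to try the answer end to end, here is a minimal sketch that applies the same regex to the sample text held in a single string. The names SAMPLE, PATTERN, and match_me_rows are hypothetical, and it assumes the columns are 9 and 10 characters wide, as the 'A9A10A*' template implies. With a real file, the string would come from something like File.read("text_2.txt").

```ruby
# Sample input from the question, kept in one string so the regex can see
# the blank lines between blocks.
SAMPLE = <<~TEXT
  MATCH ME 1234

  3940393  $100.00   FORTY THOUSAND THIEVES
  3455     $ 00.10   ONLY 1% OF THE THIEVES

  GOBBLEY GOOK: 344959904       3948820   333333333

  MATCH ME

  3940321  $110.00   FORTY THOUSAND RICHER PEOPLE
  3        $ 00.11   ONLY 1% OF THE RICHER PEOPLE
TEXT

# Header, optional trailing text, a blank line, then everything up to the
# next blank line (the tempered dot from the answer).
PATTERN = /(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m

# Returns one comma-separated string per record line under each header.
def match_me_rows(data)
  data.scan(PATTERN).flat_map do |label, number, lines|
    lines.each_line.map do |line|
      # 'A9A10A*' slices two fixed-width columns and takes the rest as the
      # description; strip removes the leftover padding and newline.
      [label, number, *line.unpack('A9A10A*')].map(&:strip).join(',')
    end
  end
end

puts match_me_rows(SAMPLE)
```

Note that strip only trims the ends of each field, so an amount written as "$ 00.10" keeps its inner space. Producing "$00.10" as in the desired output would need an extra step, such as removing spaces from that one field.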