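Using Ruby to find a word or phrase in a text file, capture it, skip a line, then read lines until a blank line (repeat)
This is a variation on an earlier post that was answered with a regular expression, and I want to see whether this one can be solved with a regex as well. Here is a sample of the text: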
MATCH ME 1234
3940393 $100.00 FORTY THOUSAND THIEVES
3455 $ 00.10 ONLY 1% OF THE THIEVES
GOBBLEY GOOK: 344959904 3948820 333333333
MATCH ME
3940321 $110.00 FORTY THOUSAND RICHER PEOPLE
3 $ 00.11 ONLY 1% OF THE RICHER PEOPLE
The output I want is:
MATCH ME,1234,3940393,$100.00,FORTY THOUSAND THIEVES
MATCH ME,1234,3455,$00.10,ONLY 1% OF THE THIEVES
MATCH ME,,3940321,$110.00,FORTY THOUSAND RICHER PEOPLE
MATCH ME,,3,$00.11,ONLY 1% OF THE RICHER PEOPLE
My code below only gets me part of the way there. It finds MATCH ME, but it only returns:
MATCH ME,1234,3940393 ,$100.00,FORTY THOUSAND THIEVES
MATCH ME,1234,3940393 ,$100.00,FORTY THOUSAND THIEVES
MATCH ME,not here,3940321 ,$110.00,FORTY THOUSAND RICHER PEOPLE
I'm sure my nested-if approach is the wrong way to go, but I need help finding an alternative:
def is_numeric?(object)
  true if Float(object) rescue false
end

def is_match_me_line?(object)
  true if object == "MATCH ME" rescue false
end

def load_file
  raw_records = []
  infile = File.open("match_me.txt", "r")
  while line = infile.gets
    possible_match_me = line[0,18]
    match_me_words = line[4,8]
    if is_match_me_line?(match_me_words)
      possible_match_me_number_present = possible_match_me[13,4]
      if is_numeric?(possible_match_me_number_present)
        fis_match_me_number = possible_match_me_number_present
      else
        fis_match_me_number = "not here"
      end
      line=infile.gets
      line=infile.gets
      account = line[0,8]
      amount = line[9,7]
      description = line[19,40]
      record = [match_me_words, fis_match_me_number, account, amount,description]
      raw_records << record
      puts raw_records.map {|record| record*','}
    end
  end
end

load_file
As suggested, I'm trying the regex solution, but this code doesn't give me the result I need:
File.open("text_2.txt", "r").each_line do |data|
  data.scan(/(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m).each do |m, n, lines|
    lines.each_line do |line|
      puts [m, n, *line.unpack('A9A10A*')].map(&:strip).join(',')
    end
  end
end
Answer 0 (score: 2)
Here's mine:
data.scan(/(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m).each do |m, n, lines|
  lines.each_line do |line|
    puts [m, n, *line.unpack('A9A10A*')].map(&:strip).join(',')
  end
end
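That regex is ugly, but it still beats reading 30 lines of code. (?:(?!\n\n).)* consumes one character at a time for as long as the next two characters are not both newlines, so the capture stops just before the first blank line; the /m flag is what lets . match a single newline along the way. The (?:...) group is non-capturing, so each individual '.' is not captured on its own.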
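For anyone who wants to try this, here is a minimal runnable sketch of the same regex applied to an in-memory string. The sample text is an assumption: the blank lines and the 9- and 10-character column widths are guessed from the \n\n and 'A9A10A*' in the code above, and fixed_width_row, sample_text, block_pattern and extract_records are hypothetical helper names. The key point is that data must be the whole text (for example from File.read), not one line at a time.

```ruby
# Hypothetical helper: pads account to 9 and amount to 10 characters,
# matching the 'A9A10A*' unpack template (an assumed file layout).
def fixed_width_row(account, amount, description)
  format('%-9s%-10s%s', account, amount, description)
end

# Assumed sample input: blank lines separate each header from its rows.
def sample_text
  [
    'MATCH ME 1234',
    '',
    fixed_width_row('3940393', '$100.00', 'FORTY THOUSAND THIEVES'),
    fixed_width_row('3455', '$ 00.10', 'ONLY 1% OF THE THIEVES'),
    '',
    'GOBBLEY GOOK: 344959904 3948820 333333333',
    '',
    'MATCH ME',
    '',
    fixed_width_row('3940321', '$110.00', 'FORTY THOUSAND RICHER PEOPLE'),
    fixed_width_row('3', '$ 00.11', 'ONLY 1% OF THE RICHER PEOPLE')
  ].join("\n")
end

# The regex from the answer: header, optional number, blank line,
# then everything up to (but not including) the next blank line.
def block_pattern
  /(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m
end

# Returns one comma-separated string per data row.
def extract_records(data)
  data.scan(block_pattern).flat_map do |marker, number, lines|
    lines.each_line.map do |line|
      [marker, number, *line.unpack('A9A10A*')].map(&:strip).join(',')
    end
  end
end

puts extract_records(sample_text)

# Scanning one line at a time finds nothing, because a single line
# never contains two consecutive newlines.
puts sample_text.each_line.flat_map { |l| l.scan(block_pattern) }.empty?
```

With a real file, something like extract_records(File.read("text_2.txt")) should behave the same way, provided the file really uses this layout. Note that the amount keeps its internal space ("$ 00.10"); removing it to match the desired output would need an extra step such as delete(' ') on that field.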