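I'm editing my earlier post because I've made some progress, but I'm now a bit stuck:
A sample of the text file is below. I can now read the file, parse it to extract the data I need, and write an output file. However, the output puts the data on separate lines, and I need each record in the output file (name, expiry date, last_used, address1, address2, city, state, zip) on a single comma-separated line.
Here is the code so far: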
# Returns true if the value can be parsed as a number, false otherwise.
def is_numeric?(object)
  true if Float(object) rescue false
end

def load_file
  raw_records = []
  infile = File.open("testfile.txt", "r")
  #counter =1
  while line = infile.gets
    # A record starts on a line whose first 16 characters are numeric.
    possible_account_number = line[0, 16]
    if is_numeric?(possible_account_number)
      account_number = possible_account_number[5, 11]
      name = line[21, 27].strip.gsub(/,/, "")
      expire_date = line[108, 8].strip
      last_used = line[117, 8].strip
      line = infile.gets # skip the line after the account line
      line = infile.gets
      address1 = line.strip.gsub(/,/, "") # needed for some random commas
      line = infile.gets
      address2 = line.strip.gsub(/,/, "")
      line = infile.gets
      city = line[21, 20].strip.gsub(/,/, "")
      state = line[42, 2]
      zip = line[45, 5]
      record = [name, expire_date, last_used, address1, address2, city, state, zip]
      raw_records << record
      #counter = counter + 1
    end
  end
  infile.close

  # Join each record into one comma-separated line, then print and save.
  csv_lines = raw_records.map { |record| record.join(',') }
  puts csv_lines
  File.open('test_w.txt', 'w') do |f2|
    f2.puts csv_lines
  end
end

#the_string.gsub(/\,/,"")
load_file
Here is the raw data:
11111 ABC MOVINGABC, INC 1234567891 LISTINGS 02-06-12 MONDAY 2112-001-001 PAGE 1 1234 CUSTOMIA ROAD SUITE 12345 LIST MANAGEMENT NOSAOLOS NV 12345 STATEMENTS TISSUE STATEMENTS NAME 1 ABC TISSUES TISSUE ROAD LOC TISSUES PAGE ABC TISSUE STATEMENTS NAME 2 ADDRESS LINE 1 ADDRESS LINE 2 CITY ST ZIP TITLE TISSUE NUMBER: 123456789 1234567890000030 MARILYN SMITH 12345678911 05-30-12 01-28-12 1234 ST MARYS BLVD. SUITE B NOSAOLOS MI 12345 1234567890000048 MARILYN ACTIVITA 12345678911 05-30-12 09-04-11 1234 ST MARYS BOULEVARD STE. B NOSAOLOS OH 12345 1234567890000055 ANDREW WAYMENT 12345678911 05-30-12 01-12-12 123 S. DESCRIBE ST. NOSAOLOS OH 12345
Here is the finished text, with Jason's help (thanks):
MARILYN SMITH,5-30-12 ,1-28-12,1234 ST MARYS BLVD.,SUITE B,NOSAOLOS,MI,12345
MARILYN ACTIVITA,5-30-12 ,9-04-11,1234 ST MARYS BOULEVARD,STE. B,NOSAOLOS,OH,12345
ANDREW WAYMENT,5-30-12 ,1-12-12,123 S. DESCRIBE ST.,,NOSAOLOS,OH,12345
I also wanted to save it to a file, so I used this:
File.open('test_w.txt', 'w') do |f2|
  f2.puts raw_records.map {|record| record*','}
end
Andrew
Answer 0 (score: 0)
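Without the input file it's hard to give you any code as an example, but judging from the image of the file it looks fairly predictable, so a state tracker with a bit of RegExp magic should do the trick.
The file looks tab-delimited, so you can split each line on tabs: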
File.open('filename', 'r') do |file|
  lines = file.inject([]){|memo, line| memo.push line.split(/\t/)}
  # Now you have an array of arrays that you can parse with a state tracker
end
Your state tracker just keeps track of what you last read, e.g. number, name, or date_issued, and then fills in the correct value.
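To make the idea concrete, here is a minimal sketch of such a state tracker. It is not based on the real file: the layout is an assumption (a tab-delimited account line holding account number, name, a reference number, expiry date, and last-used date, followed by a second-name line, two address lines, and a "city, state, zip" line), and `parse_records` and `sample` are names made up for this illustration.

```ruby
# Sketch of a state tracker. The layout is ASSUMED, not taken from the real file:
#   account line : account_number \t name \t reference \t expiry \t last_used
#   next 4 lines : name 2, address 1, address 2, "city \t state \t zip"
def parse_records(lines)
  records = []
  current = nil # record currently being filled in
  pending = []  # kinds of line still expected for the current record

  lines.each do |line|
    fields = line.split("\t").map(&:strip)

    # A 16-digit account number in the first field starts a new record.
    if fields.first.to_s.match?(/\A\d{16}\z/)
      current = { name: fields[1], expiry: fields[3], last_used: fields[4] }
      pending = [:name2, :address1, :address2, :city_line]
      next
    end

    next if current.nil? # headers or other noise outside a record

    # The state decides how the current line is interpreted.
    case pending.shift
    when :name2    then nil # second name line is ignored in this sketch
    when :address1 then current[:address1] = line.strip
    when :address2 then current[:address2] = line.strip
    when :city_line
      current[:city], current[:state], current[:zip] = fields.reject(&:empty?)
      records << current.values_at(:name, :expiry, :last_used,
                                   :address1, :address2, :city, :state, :zip)
      current = nil
    end
  end

  records
end

# Hypothetical sample input in the assumed layout.
sample = [
  "NAME 1\tADDRESS\n",
  "1234567890000030\tMARILYN SMITH\t12345678911\t05-30-12\t01-28-12\n",
  "\n",
  "1234 ST MARYS BLVD.\n",
  "SUITE B\n",
  "NOSAOLOS\tMI\t12345\n"
]

parse_records(sample).each { |record| puts record.join(",") }
```

Because each line is interpreted according to the state rather than by counting `gets` calls, header lines and page breaks between records are skipped without special handling. If the real file turns out to be fixed-width rather than tab-delimited, the same structure should still apply with the field extraction swapped for column slices.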