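I am trying to extract data from a txt file using the constant patterns "beginPattern" and "endPattern". Between them, only the values of the keys line1 and line2 act as the index; after that, the values I want to extract can appear on any line, in the form key=value;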
[BEGIN_PATTERN]
line1=abd;
line2=ZXY;
...
line43=454;
...
...
[END_PATTERN]
[BEGIN_PATTERN]
line1=abc;
line2=ZXC;
...
line72=847;
...
[END_PATTERN]
[BEGIN_PATTERN]
line1=abe;
line2=ZXV;
...
line33=135;
...
[END_PATTERN]
[BEGIN_PATTERN]
line1=abt;
line2=ZXF;
...
line54=734;
...
[END_PATTERN]
The expected result is:
abd,ZXY,aaa,454,ggg,ggs
abc,ZXC,mgf,847,jde,g3e
abe,ZXV,ytd,135,dfs,jhf
abt,ZXF,ytf,734,ytd,hge
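The input is a stream of marker-delimited blocks of key=value; pairs, so one way to read it is a generator that yields one dict per block. This is a minimal sketch, not the asker's code. It assumes the markers are exactly `[BEGIN_PATTERN]` and `[END_PATTERN]` on their own lines, as in the sample above, and `parse_blocks` is a hypothetical helper name.

```python
import re
from typing import Dict, Iterable, Iterator, Optional

# Marker lines as they appear in the sample (assumed to sit on their own lines).
BEGIN_RE = re.compile(r'\[BEGIN_PATTERN\]')
END_RE = re.compile(r'\[END_PATTERN\]')


def parse_blocks(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: yield one {key: value} dict per BEGIN/END block."""
    block: Optional[Dict[str, str]] = None  # None means "outside a block"
    for line in lines:
        text = line.strip()  # drops tabs, spaces and the line break
        if BEGIN_RE.fullmatch(text):
            block = {}
        elif END_RE.fullmatch(text):
            if block is not None:
                yield block
            block = None
        elif block is not None and '=' in text:
            # split only on the first '=' and drop the trailing ';'
            key, val = text.split('=', 1)
            block[key.strip()] = val.strip().rstrip(';')


sample = """[BEGIN_PATTERN]
line1=abd;
line2=ZXY;
...
line43=454;
[END_PATTERN]
"""
for block in parse_blocks(sample.splitlines()):
    print(block)  # {'line1': 'abd', 'line2': 'ZXY', 'line43': '454'}
```

With a real file, pass the open file object instead of `sample.splitlines()`; iterating a file reads it lazily line by line, so 300,000 blocks never have to be held in memory at once.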
I tried a Python script with re.match, but it only reads and writes the values abd,ZXY to the output file, that is, only for the first beginPattern/endPattern pair it finds:
```python
import re
START_PATTERN = '<BEGIN'
END_PATTERN = '<BEND'
with open('DB_example.txt') as file:
    match = False
    newfile = None
    for line in file:
        if re.match(START_PATTERN, line):
            match = True
            newfile = open('my_new_file.txt', 'w')
            continue
        elif re.match(END_PATTERN, line):
            match = False
            newfile.close()
            continue
        elif match:
            #remove TAB and BreakLine
            valor=line.rstrip().replace('\t','')
            #split Key and value
            (key, val) = valor.split('=')
            if re.match('line1',key):
                match = True
                #before write into file remove ";"
                newfile.write(val.replace(';',''))
                continue
            elif re.match('line2',key):
                match:False
                newfile.write(','+val.replace(';', ''))
                continue
            elif re.match('lineXX',key):
                match:False
                newfile.write(','+val.replace(';', ''))
                continue
            elif re.match('lineYY',key):
                match:False
                newfile.write(','+val.replace(';', ''))
                continue
```
It never moves on to the second, third, and later patterns. My file has at least 300,000 matches. Thanks for your help.
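A side issue in the code above, separate from the one being asked about: `re.match` anchors only at the beginning of the string, so `re.match('line1', key)` is also true for `line10` through `line19`, `line100`, and so on. A quick check of that behavior, using only the standard `re` module:

```python
import re

# re.match only anchors at the START of the string, so 'line1' works as a prefix:
print(re.match('line1', 'line1') is not None)    # True
print(re.match('line1', 'line10') is not None)   # True  (line10 is picked up too)
print(re.match('line1', 'line123') is not None)  # True

# An exact comparison (or re.fullmatch) accepts only the key itself.
print('line1' == 'line10')                       # False
print(re.fullmatch('line1', 'line10') is None)   # True
```

Comparing keys with `==` (or `key in (...)`) avoids the regex entirely, which is also cheaper when the loop runs over hundreds of thousands of lines.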
Answer (score: 1)
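You open the output file every time a begin pattern is found and close it again at the end pattern. Opening a file in 'w' mode truncates it, so each block overwrites whatever the previous block wrote.
If you want each new val to be added to the file, open the file only once, before writing anything, and close it only after all values have been written.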
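The effect of reopening in 'w' mode is easy to see in isolation. This small demonstration writes two rows to a throwaway file in a temporary directory (the file name `out.txt` is arbitrary), first reopening the file for each row and then opening it once:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.txt')
rows = ['abd,ZXY', 'abc,ZXC']

# Reopening with 'w' for every row truncates the file each time.
for row in rows:
    with open(path, 'w') as f:
        f.write(row + '\n')
with open(path) as f:
    reopened = f.read()
print(repr(reopened))     # 'abc,ZXC\n'  (only the last row survives)

# Opening once before the loop keeps every row.
with open(path, 'w') as f:
    for row in rows:
        f.write(row + '\n')
with open(path) as f:
    opened_once = f.read()
print(repr(opened_once))  # 'abd,ZXY\nabc,ZXC\n'
```

Mode 'a' (append) would also keep the rows, but it keeps data from previous runs as well, so opening once with 'w' is usually what you want here.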
```python
import re

# Patterns matching the marker lines shown in the sample data
START_PATTERN = r'\[BEGIN_PATTERN\]'
END_PATTERN = r'\[END_PATTERN\]'

# Open the output file once, before the loop; 'with' closes it after all values are written
with open('DB_example.txt') as file, open('my_new_file.txt', 'w') as newfile:
    match = False
    for line in file:
        if re.match(START_PATTERN, line):
            match = True
            continue
        elif re.match(END_PATTERN, line):
            match = False
            # end of the block: terminate its row so each block gets its own line
            newfile.write('\n')
            continue
        elif match:
            # remove TAB and line break
            valor = line.rstrip().replace('\t', '')
            # skip lines that are not key=value (e.g. blank lines)
            if '=' not in valor:
                continue
            # split key and value on the first '='; before writing, remove ";"
            key, val = valor.split('=', 1)
            val = val.replace(';', '')
            # compare keys exactly: re.match('line1', key) would also match 'line10'
            if key == 'line1':
                newfile.write(val)
            elif key in ('line2', 'lineXX', 'lineYY'):
                newfile.write(',' + val)
```
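A variation on the same idea, assuming each block has already been collected into a dict of key/value pairs: write one row per block with the standard `csv` module, with the columns fixed by a list of keys. `write_rows` and `WANTED_KEYS` are hypothetical names, and `lineXX`/`lineYY` are the placeholders from the question.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Hypothetical column list: replace with the keys you actually need, in output order.
WANTED_KEYS: List[str] = ['line1', 'line2', 'lineXX', 'lineYY']


def write_rows(blocks: Iterable[Dict[str, str]], keys: List[str], out: TextIO) -> int:
    """Write one CSV row per block; a key missing from a block becomes an empty field."""
    writer = csv.writer(out, lineterminator='\n')
    count = 0
    for block in blocks:
        writer.writerow([block.get(key, '') for key in keys])
        count += 1
    return count


# Usage with an in-memory buffer; for a real file use open(..., 'w', newline='').
buf = io.StringIO()
write_rows([{'line1': 'abd', 'line2': 'ZXY'}, {'line1': 'abc'}], ['line1', 'line2'], buf)
print(buf.getvalue())  # 'abd,ZXY' on the first line, 'abc,' on the second
```

Fixing the column order through a key list keeps every row aligned even when a block is missing one of the keys, which plain string concatenation does not guarantee.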