I have a file with the following structure:
..............................
Delimiter [1]
..............................
blablabla
..............................
Delimiter CEO [2]
..............................
blabla
..............................
Delimiter [3]
..............................
[...]
..............................
Delimiter CEO [n-1]
..............................
blablabla
..............................
Delimiter [n]
..............................
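I wrote some code that extracts all the delimiters, but it also extracts some lines I don't need, and those unwanted lines keep my code from running correctly. I want to save a line to a new .txt file only if that line matches the regular expression "[a number]". So, to make the extraction more precise, I wrote this Python code using re (following this answer):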
import re
with open('testoestratto.txt','r',encoding='UTF-8') as myFile:
    text = myFile.readlines()
    text = [frase.rstrip('\n') for frase in text]
regex = r'\[\d+\]'
new_file=[]
for lines in text:
    match = re.search(regex, lines, re.MULTILINE)
    if match:
        new_line = match.group() + '\n'
        new_file.append(new_line)
with open('prova.txt', 'w') as f:
    f.seek(0)
    f.writelines(new_file)
However, the 'prova.txt' file contains only the regex matches themselves, so I end up with a file holding [1], [2], ... [n-1], [n] instead of the full lines.
Answer 0 (score: 1)
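Your new_file is a list of the matches found in the file, because you fill it with match.group() plus a newline. match.group() returns only the text the pattern matched, not the line it was found in.
Instead, you can check whether a line contains a \[\d+] match and, if it does, write the whole line to the new file: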
import re
reg = re.compile(r'\[\d+]') # Matches a [ char, followed with 1+ digits and then ]
with open('prova.txt', 'w') as f: # open file for writing
    with open('testoestratto.txt','r',encoding='UTF-8') as myFile: # open file for reading
        for line in myFile: # read myFile line by line
            if reg.search(line): # if there is a match anywhere in a line
                f.write(line) # write the line into the new file
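
To see the difference between keeping the line and keeping only the match, here is a minimal sketch that runs on an in-memory list instead of files. The helper name keep_numbered_lines and the sample lines are assumptions made for illustration; only the pattern comes from the code above.

```python
import re

# Same pattern as above: "[", one or more digits, "]".
NUMBERED = re.compile(r'\[\d+]')

def keep_numbered_lines(lines):
    """Return the full lines that contain a [number] marker (hypothetical helper)."""
    return [line for line in lines if NUMBERED.search(line)]

# Sample input modeled on the file structure in the question.
sample = [
    "Delimiter [1]\n",
    "blablabla\n",
    "Delimiter CEO [2]\n",
    "blabla\n",
]

kept = keep_numbered_lines(sample)
# What the question's code stored instead: only the matched text.
matches_only = [NUMBERED.search(line).group() for line in kept]

print(kept)          # ['Delimiter [1]\n', 'Delimiter CEO [2]\n']
print(matches_only)  # ['[1]', '[2]']
```

The first list is what the answer's code writes to prova.txt; the second is what the original code wrote.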