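I previously wrote code that extracts specific strings from multiple files and stores the results in a single file. That file now contains duplicate results, which I need to remove.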
import glob
import os.path
import re

path = r"H:\sample"
file_array = glob.glob(os.path.join(path, "*.txt"))

with open("aiq_hits.txt", "w") as out_file:
    for input_filename in file_array:
        with open(input_filename) as in_file:
            for line in in_file:
                # Find every single- or double-quoted string ending in .aiq on this line
                match = re.findall(r"""(?<=')[^']*\.aiq(?=')|(?<=")[^"]*\.aiq(?=")""", line)
                for item in match:
                    out_file.write("%s\n" % item)
# No explicit close() is needed: the with block closes out_file on exit
This out_file contains duplicate results that I need to remove, and the result should end up in the same file.
Answer 0 (score: 1)
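1. Read the input file with the readlines method, which returns the file's content as a list of lines.
2. Iterate over every line in the lines list.
3. Strip the whitespace from each line.
4. Check whether the line is already present in new_lines.
5. If it is not, append the line to the new_lines list.
6. Write new_lines to the file.

Demo: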
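input_file = "input.txt"
# Open in text mode so the lines are str and can be joined with "\n" on Python 3
with open(input_file, "r") as fp:
    lines = fp.readlines()

new_lines = []
for line in lines:
    # Strip surrounding whitespace, including the trailing newline
    line = line.strip()
    if line not in new_lines:
        new_lines.append(line)

# Set output_file to the same path as input_file to overwrite the original file
output_file = "output.txt"
with open(output_file, "w") as fp:
    fp.write("\n".join(new_lines))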
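
Since the question asks for the result to end up in the same file, here is a minimal sketch of one possible variation on the demo, not the answer's own method. It replaces the list membership check with dict.fromkeys, which preserves insertion order on Python 3.7+ and avoids rescanning the list for every line. The function name dedupe_file_in_place is hypothetical, and the sketch assumes the file fits in memory and that blank lines can be dropped.

```python
from pathlib import Path


def dedupe_file_in_place(path):
    """Remove duplicate lines from `path`, keeping the first occurrence of each.

    Hypothetical helper for illustration; assumes the file fits in memory.
    """
    file_path = Path(path)
    # Read all lines, stripping surrounding whitespace and newline characters.
    lines = [line.strip() for line in file_path.read_text().splitlines()]
    # dict.fromkeys keeps insertion order (Python 3.7+), so the first
    # occurrence of each line survives and later repeats are dropped.
    # Assumption: blank lines are not wanted in the result, so they are skipped.
    unique_lines = list(dict.fromkeys(line for line in lines if line))
    # Overwrite the same file with the de-duplicated lines.
    file_path.write_text("".join(line + "\n" for line in unique_lines))
    return unique_lines
```

Calling dedupe_file_in_place("aiq_hits.txt") after the extraction script has finished would rewrite that file with each hit listed once, in the order it first appeared.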