I have an input file in the form shown below. Every test line starts with the word "Test" and every error line starts with the word "Error":
Test1
Error1
Error1
Error2
Test1
Error3
Test2
Error1
Error4
Test2
Error5
Error1
Test3
Error1
I want it in the format:
Test1
Error1
Error1
Error2
Error3 // Removed test1
Test2
Error1
Error4
Error5
Error1
Test3
Error1
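Basically, while reading through the file, it should remove the duplicate test names and write everything else to the output file in the same order. Here is my code: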
import os
import sys
import optparse

def delete_duplicate(inputfile,outputfile):
    output = open(outputfile, "w")
    from collections import OrderedDict
    input = open(inputfile, "r")
    lines = (line.strip() for line in input)
    unique_lines = OrderedDict.fromkeys((line for line in lines if line))
    for unique_line in unique_lines:
        output.write(unique_line)
        output.write("\n")
My code removes every duplicate line, errors included, and gives the result below:
Test1
Error1
Error2
Error3
Test2
Error4
Error5
Test3
It works fine for the test names, but not for the errors. Can someone help?
Answer 0 (score: 0)
You only need to keep the lines that start with Test in a set, and check whether you have already written them to the output file:
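def delete_duplicate(inputfile, outputfile):
    seen = set()  # test names already written to the output
    with open(outputfile, "w") as output, open(inputfile, "r") as input:
        for line in input:
            line = line.rstrip('\n')  # drop the newline so exactly one is written back
            if line not in seen:
                output.write(line + '\n')
                if line.startswith('Test'):
                    seen.add(line)  # only test names are remembered, never errors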
The advantage of a set is that checking membership and adding an item are both O(1) on average.
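As a rough sketch of the same idea that runs without any files, the filter can be applied to a plain list of lines. The function name `drop_repeated_tests` and the sample data are made up for illustration. Only lines starting with `Test` are ever added to the set, so repeated errors are kept.

```python
def drop_repeated_tests(lines):
    """Return the lines with repeated 'Test...' headers removed, order preserved."""
    seen = set()   # test names already emitted (hypothetical helper state)
    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue          # skip blank lines
        if line.startswith('Test'):
            if line in seen:
                continue      # repeated test header: drop it
            seen.add(line)
        result.append(line)   # errors are always kept, even when repeated
    return result


sample = ['Test1', 'Error1', 'Error1', 'Test1', 'Error3', 'Test2', 'Error1']
filtered = drop_repeated_tests(sample)
print(filtered)
# ['Test1', 'Error1', 'Error1', 'Error3', 'Test2', 'Error1']
```

Note that this only removes the repeated header. If a test name reappears after a different test has started, its errors stay where they were instead of being moved back under the first occurrence.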
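Answer 1 (score: 0)
At the moment it looks like your code simply inserts each line into a dictionary if it has not seen that line before. It seems you also want to track the errors for each test separately. You can do this with an OrderedDict, which would look a bit like this: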
output_dict = {
    'Test1' : ['Error1','Error1','Error2','Error3'],
    'Test2' : ['Error1','Error4','Error5','Error1']
}
The code to handle this looks like the following.
import os
import sys
import optparse
from collections import OrderedDict

def delete_duplicate(inputfile,outputfile):
    output_dict = OrderedDict()
    currentTest = '' # Used to keep track of which test we are working with
    # Read the input file and group the errors under their test
    with open(inputfile, "r") as infile:
        for line in (raw.strip() for raw in infile):
            if line.startswith('Test'): # A new test is starting
                currentTest = line
                if currentTest not in output_dict:
                    output_dict[currentTest] = []
            elif line.startswith('Error') and currentTest: # Add the error to the current test
                output_dict[currentTest].append(line)
    # Write each test once, followed by all of its errors
    with open(outputfile, "w") as outfile:
        for test in output_dict.keys():
            outfile.write(test + '\n') # Write the test name
            for error in output_dict[test]:
                outfile.write(error + '\n') # Write the errors for that test
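To check the grouping logic without touching the file system, here is a small sketch that works on an in-memory list built from the sample input in the question. The names `group_errors_by_test` and `flatten` are hypothetical, and ignoring errors that appear before any test line is an assumption, since the question does not say what should happen in that case.

```python
from collections import OrderedDict


def group_errors_by_test(lines):
    """Map each test name to its errors, keeping first-seen test order."""
    grouped = OrderedDict()
    current = None  # no test seen yet
    for line in lines:
        line = line.strip()
        if line.startswith('Test'):
            current = line
            grouped.setdefault(current, [])  # create the list only once per test
        elif line.startswith('Error') and current is not None:
            # Assumption: errors before the first test are ignored
            grouped[current].append(line)
    return grouped


def flatten(grouped):
    """Turn the grouped mapping back into output lines."""
    out = []
    for test, errors in grouped.items():
        out.append(test)
        out.extend(errors)
    return out


sample = ("Test1 Error1 Error1 Error2 Test1 Error3 "
          "Test2 Error1 Error4 Test2 Error5 Error1 Test3 Error1").split()
grouped = group_errors_by_test(sample)
print("\n".join(flatten(grouped)))
```

For the sample input, the printed lines match the format asked for in the question. Unlike a plain "skip repeated headers" filter, this approach also moves the errors of a repeated test back under its first occurrence when the repeats are not adjacent.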