我有一个非常大的文本文件,内容如:
@INBOOK{Ackermann1999-b,
author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann},
year = {1980},
timestamp = {1995-12-02}
}
我想删除除包含括号{或}的这些行之外的重复行。 结果应如下所示:
@INBOOK{Ackermann1999-b,
author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann},
year = {1980},
timestamp = {1995-12-02}
}
我遇到了这个Python-Skript,感谢Vinay Sajip:
lines_seen = set() # holds lines already seen
outfile = open("literatur_clean.txt", "w")
for line in open("literatur_dupl.txt", "r"):
if line not in lines_seen: # not a duplicate
outfile.write(line)
lines_seen.add(line)
outfile.close()
但它也会删除带有右括号的行和具有相同authordata的行。 因此我需要括号的条件。
有人能指出我加入这个条件吗?
提前致谢,
答案 0 :(得分:2)
if ('{' in line or '}' in line) and line not in lines_seen: # not a duplicate