Remove duplicate lines from a text file, except lines containing "{" or "}"

Time: 2012-10-10 08:57:40

Tags: python

I have a very large text file with content like this:

@INBOOK{Ackermann1999-b, 
  author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. 
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. 
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. 
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann}, 
  year = {1980}, 
  timestamp = {1995-12-02} 
}      

I want to remove the duplicate lines, except for lines that contain a brace, { or }. The result should look like this:

@INBOOK{Ackermann1999-b, 
  author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, 
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann}, 
  year = {1980}, 
  timestamp = {1995-12-02} 
} 

I came across this Python script, thanks to Vinay Sajip:

lines_seen = set() # holds lines already seen 
outfile = open("literatur_clean.txt", "w") 
for line in open("literatur_dupl.txt", "r"): 
    if line not in lines_seen: # not a duplicate 
        outfile.write(line) 
        lines_seen.add(line) 
outfile.close() 

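But it also removes the lines with closing braces, as well as lines with identical author data in later entries. So I need an extra condition for the braces.

Can someone show me how to add this condition?

Thanks in advance.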

1 answer:

Answer 0 (score: 2)

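Change the condition in the loop so that a line is written when it contains a brace or has not been seen before:

if '{' in line or '}' in line or line not in lines_seen: # brace line, or not a duplicate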
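
For reference, here is a minimal sketch of the whole script with the brace condition built in. The rule it applies is: write a line if it contains "{" or "}", or if it has not been seen before. The function names `dedupe_lines` and `clean_file` are made up for this illustration and are not part of the original script. It also assumes that lines are compared exactly, trailing whitespace and newline included, as in the question's loop.

```python
def dedupe_lines(lines):
    """Yield lines, skipping repeats unless the line contains '{' or '}'."""
    lines_seen = set()  # holds lines already written once
    for line in lines:
        has_brace = "{" in line or "}" in line
        # Brace lines are always kept; other lines only on first sight.
        if has_brace or line not in lines_seen:
            yield line
            lines_seen.add(line)


def clean_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping repeated brace-free lines."""
    # The with-statement closes both files even if an error occurs.
    with open(src_path, "r") as infile, open(dst_path, "w") as outfile:
        outfile.writelines(dedupe_lines(infile))
```

With the file names from the question, the call would be clean_file("literatur_dupl.txt", "literatur_clean.txt"). Because the file is read line by line, only the set of distinct lines is held in memory, which should matter for a very large input.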