Removing duplicate records from a file

Date: 2011-11-25 09:18:30

Tags: python

  

Possible duplicate:
  How might I remove duplicate lines from a file?

I have a file containing duplicate records that I want to remove. This is what I tried:

import sys  

for line in sys.stdin:  
    line = line.rstrip()  
    line = line.split()  
    idlist = []   
    if idlist == []:  
        idlist = line[1]  
    else:  
        idlist.append(line[1])  
    print line[0], idlist  

# This did not work.

And this:

for line in sys.stdin:  
    line = line.rstrip()  
    line = line.split()  
    lines_seen = set()  
    dup = line[1]  
    if dup not in lines_seen:  
        lines_seen = dup  
    else:  
        lines_seen.append(dup)  
    print line[0], lines_seen  

sys.stdin.close()

# This did not work either!

This is what the input looks like:

BLE 1234
BLE 1223
LLE 3456
ELE 1223
BLE 4444
ELE 5555
BLE 4444

This is what I want the output to look like:

BLE 1234
BLE 1223
LLE 3456
BLE 4444
ELE 5555

Thanks! EDG

3 answers:

Answer 0 (score: 3)

import sys

elem1_seen = set()                 # second fields (elems[1]) seen so far
lines_out = []                     # list of "unique" output lines
for line in sys.stdin:             # iterate over input
    elems = line.rstrip().split()  # split line into two elements
    if elems[1] not in elem1_seen: # if second element not seen before...
        lines_out.append(line)     # append the whole line to output
        elem1_seen.add(elems[1])   # add this second element to the seen set
sys.stdout.writelines(lines_out)   # write the kept lines (each still ends with its newline)
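
For checking the idea against the sample data from the question without piping anything through stdin, the same "remember the second field" approach can be wrapped in a generator. This is only a sketch: the name `unique_by_second_field` and the choice to skip lines with fewer than two fields are assumptions, not part of the answer above.

```python
def unique_by_second_field(lines):
    """Yield each line whose second whitespace-separated field has not been seen yet."""
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: silently skip blank or malformed lines
        if fields[1] not in seen:
            seen.add(fields[1])
            yield line


# Sample input copied from the question.
sample = ["BLE 1234", "BLE 1223", "LLE 3456", "ELE 1223",
          "BLE 4444", "ELE 5555", "BLE 4444"]
result = list(unique_by_second_field(sample))
print("\n".join(result))
```

On the sample this keeps the five lines shown in the desired output. "ELE 1223" and the second "BLE 4444" are dropped because their second field has already appeared.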

Answer 1 (score: 0)

The main problem is that you keep changing the types of your variables, which gets confusing:

import sys  

for line in sys.stdin:  
    line = line.rstrip()   #Line is a string  
    line = line.split()    #Line is a list
    idlist = []            #idlist is a list
    if idlist == []:  
        idlist = line[1]   #id list is a string
    else:  
        idlist.append(line[1])  #and now?
    print line[0], idlist 
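
One way to avoid the type changes pointed out in the comments above is to create the list once, before the loop, and only ever append to it. The following is a rough sketch rather than the answerer's own fix. The function name `first_seen_ids` and the variable `fields` are hypothetical names introduced for illustration.

```python
def first_seen_ids(lines):
    idlist = []                         # created once, outside the loop; stays a list
    for line in lines:
        fields = line.rstrip().split()  # separate name, so `line` remains a string
        if fields[1] not in idlist:
            idlist.append(fields[1])    # append instead of rebinding idlist to a string
    return idlist


ids = first_seen_ids(["BLE 1234\n", "BLE 1223\n", "ELE 1223\n"])
print(ids)
```

In the original attempt, `idlist = []` sits inside the loop, so the list is emptied on every line and `idlist == []` is always true. The `else` branch is never reached, and `idlist` ends up holding a single string each time.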

Answer 2 (score: 0)

import fileinput

ss = '''BLE 1234
BLE 1223
LLE 3456
ELE 1223
BLE 4444
ELE 5555
BLE 4444 
'''
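with open('klmp.txt', 'w') as f:   # write the sample data to a scratch file
    f.write(ss)

seen = set()                       # second fields already kept
# inplace=1 redirects stdout into klmp.txt, so print rewrites the file
for line in fileinput.input('klmp.txt', inplace=1):
    b = line.split()[1]
    if b not in seen:
        seen.add(b)
        print(line.strip())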

Searching SO for the word 'fileinput', I found:

How to delete all blank lines in the file with the help of python?