I have a file containing duplicate records that I want to remove. This is what I've tried:
import sys
for line in sys.stdin:
    line = line.rstrip()
    line = line.split()
    idlist = []
    if idlist == []:
        idlist = line[1]
    else:
        idlist.append(line[1])
    print(line[0], idlist)
for line in sys.stdin:
    line = line.rstrip()
    line = line.split()
    lines_seen = set()
    dup = line[1]
    if dup not in lines_seen:
        lines_seen = dup
    else:
        lines_seen.append(dup)
    print(line[0], lines_seen)
sys.stdin.close()
This is what the input looks like:
BLE 1234
BLE 1223
LLE 3456
ELE 1223
BLE 4444
ELE 5555
BLE 4444
BLE 1234
BLE 1223
LLE 3456
BLE 4444
ELE 5555
Thanks! EDG
Answer 0 (score: 3)
import sys

elem1_seen = set()  # set of second-column values seen so far
lines_out = []      # list of "unique" output lines
for line in sys.stdin:              # iterate over input lines
    elems = line.rstrip().split()   # split line into its two fields
    if len(elems) < 2:              # skip blank or malformed lines
        continue
    if elems[1] not in elem1_seen:  # if the second field was not seen before...
        lines_out.append(line)      # ...keep the whole line for output
        elem1_seen.add(elems[1])    # and remember its second field
sys.stdout.writelines(lines_out)    # print the kept lines, one per line
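A minimal sketch of the same first-occurrence-wins idea wrapped in a reusable function, assuming (as the answers here do) that a record counts as a duplicate when its second column repeats. The names `unique_records` and `second_field` are hypothetical; the `key` parameter lets you switch to comparing whole lines instead.

```python
from typing import Callable, Iterable, Iterator


def second_field(line: str) -> str:
    """Hypothetical default key: the second whitespace-separated column."""
    return line.split()[1]


def unique_records(lines: Iterable[str],
                   key: Callable[[str], str] = second_field) -> Iterator[str]:
    """Yield each line whose key has not been seen before (first occurrence wins)."""
    seen = set()
    for line in lines:
        if not line.strip():
            continue  # blank lines have no second field, so skip them
        k = key(line)
        if k not in seen:
            seen.add(k)
            yield line


# Sample input from the question.
sample = """BLE 1234
BLE 1223
LLE 3456
ELE 1223
BLE 4444
ELE 5555
BLE 4444
BLE 1234
BLE 1223
LLE 3456
BLE 4444
ELE 5555
""".splitlines(keepends=True)

for line in unique_records(sample):
    print(line, end="")
```

To filter standard input instead, `sys.stdout.writelines(unique_records(sys.stdin))` should do it. Passing `key=str.strip` treats only identical lines as duplicates, so `ELE 1223` is kept alongside `BLE 1223`.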
Answer 1 (score: 0)
The main problem is that you keep changing the variables' types, which is a bit confusing:
import sys
for line in sys.stdin:
    line = line.rstrip()  # line is a string
    line = line.split()   # line is now a list
    idlist = []           # idlist is a list, reset on every iteration
    if idlist == []:
        idlist = line[1]  # idlist is now a string
    else:
        idlist.append(line[1])  # and now? a str has no append() (this branch is never reached anyway)
    print(line[0], idlist)
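Keeping each variable at a single type, and creating the collections once before the loop instead of on every iteration, addresses both problems above. A minimal sketch; `collect_ids` is a hypothetical name, and it returns only the de-duplicated IDs, which is what the asker's `idlist` appears to be meant to hold.

```python
def collect_ids(lines):
    """Return the second-column IDs in first-seen order, without duplicates."""
    seen = set()  # created once, before the loop, and always a set
    ids = []      # always a list; never reassigned to a str
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        record_id = fields[1]
        if record_id not in seen:
            seen.add(record_id)
            ids.append(record_id)
    return ids


print(collect_ids(["BLE 1234\n", "BLE 1223\n", "ELE 1223\n", "BLE 1234\n"]))
```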
Answer 2 (score: 0)
import fileinput
ss = '''BLE 1234
BLE 1223
LLE 3456
ELE 1223
BLE 4444
ELE 5555
BLE 4444
'''
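# write the sample data to a file
with open('klmp.txt', 'w') as f:
    f.write(ss)

seen = set()  # second-column values already kept
# inplace=True redirects print() into klmp.txt, replacing its contents
for line in fileinput.input('klmp.txt', inplace=True):
    b = line.split()[1]
    if b not in seen:
        seen.add(b)
        print(line.strip())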
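The same in-place approach as a reusable function, run against a throwaway file in a temporary directory so no real `klmp.txt` is touched. This is a sketch assuming Python 3 (`fileinput.input` used as a context manager, `print(..., end="")`); `dedupe_file_in_place` is a hypothetical name.

```python
import fileinput
import os
import tempfile


def dedupe_file_in_place(path):
    """Rewrite `path`, keeping only the first line for each second-column value."""
    seen = set()
    # With inplace=True, fileinput moves the original file aside and redirects
    # sys.stdout into a new file at `path`, so print() writes the new contents.
    with fileinput.input(path, inplace=True) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # drop blank or malformed lines
            if fields[1] not in seen:
                seen.add(fields[1])
                print(line, end="")  # line already ends with "\n"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "klmp.txt")
    with open(path, "w") as out:
        out.write("BLE 1234\nBLE 1223\nLLE 3456\nELE 1223\nBLE 4444\nELE 5555\nBLE 4444\n")
    dedupe_file_in_place(path)
    with open(path) as f:
        result = f.read()

print(result, end="")
```

Printing `line` with `end=""` keeps the file's original line endings, whereas `print(line.strip())` would also strip any leading or trailing spaces on each record.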
Searching Stack Overflow for the word 'fileinput', I found:
How to delete all blank lines in the file with the help of python?