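So I have a list of lists, L, that I am iterating over to try to filter out duplicates. I know this is not the best way to do it, but it is a specific requirement. I end up with no duplicate data, but I do end up with some duplicate rows with empty columns that I can't resolve. Any help?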
for x in range(len(L), 0, -1):
    x -= 1 # len()
    for y in range(len(L[0]), 0, -1):
        y -= 1
        if y != 0 and y != 1: # Skipping columns 0 and 1
            check = L[x][y]
            for x0 in range(len(L), 0, -1):
                x0 -= 1
                for y0 in range(len(L[0]), 0, -1):
                    y0 -= 1
                    if y0 == y:
                        checkagainst = L[x0][y0]
                        if check == checkagainst:
                            if x != x0: # If it's on the same row, don't count it
                                #print "Identical Indices:","X0:",x0,",","Y0:", y0,"|" ,"X:",x,",","Y:",y
                                #print L[x][y], "," , L[x0][y0]
                                WriteMe = True # Decides whether to write to the duplicate file or the non-duplicate file
                                if check == "": # Didn't work
                                    WriteMe = False
                                    print x, ",", y
    if WriteMe == True:
        dwriter.writerow(L[x])
        WriteMe = False # Set to False for next iteration
    else:
        writer.writerow(L[x])
    L.pop(x)
print
Sample input:
ID, Sex, E-mail
1, M, lol@jk.com
2, F,
3, F,
4, F, jack@jay.com
Expected output (file without duplicates):
ID, Sex, E-mail
1, M, lol@jk.com
2, F,
4, F, jack@jay.com
(In this case, ID 2 and ID 3 are interchangeable, because they are duplicate rows.)
Expected output (duplicates file):
ID, Sex, E-mail
3, F,
Answer 0 (score: 0)
You can use collections.OrderedDict:
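>>> from collections import OrderedDict
>>> dic = OrderedDict()  # maps each e-mail to the [id, sex] pairs that share it
>>> with open('abc') as f:
...     #next(f) #skip header if present
...     for line in f:
...         # split on ',' alone so a trailing comma yields an empty e-mail field
...         data = map(str.strip, line.split(','))
...         idx, sex, mail = data if len(data) == 3 else data+['']
...         dic.setdefault(mail,[]).append([idx,sex])
...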
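The transcript above is Python 2. For anyone on Python 3, here is a rough, self-contained sketch of the same grouping step. The function name `group_by_email` and the in-memory `SAMPLE_LINES` list are my own stand-ins (they replace reading the file `'abc'`), so treat this as an illustration under those assumptions rather than a drop-in replacement.

```python
from collections import OrderedDict

# Sample rows copied from the question; a hypothetical stand-in for the input file.
SAMPLE_LINES = [
    "1, M, lol@jk.com",
    "2, F,",
    "3, F,",
    "4, F, jack@jay.com",
]


def group_by_email(lines):
    """Map each e-mail to the [id, sex] pairs that share it, in first-seen order."""
    groups = OrderedDict()
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        # Pad short rows so a missing e-mail becomes an empty string.
        fields += [""] * (3 - len(fields))
        idx, sex, mail = fields[:3]
        groups.setdefault(mail, []).append([idx, sex])
    return groups


groups = group_by_email(SAMPLE_LINES)
```

Rows with an empty e-mail all land under the key `""`, which is why ID 2 and ID 3 end up treated as duplicates of each other.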
Non-duplicates:
>>> for k,v in dic.iteritems():
...     print ", ".join((v[0][0],v[0][1],k))
...
1, M, lol@jk.com
2, F,
4, F, jack@jay.com
Duplicates:
>>> for k,v in dic.iteritems():
...     if len(v) >1:
...         for v1 in v[1:]:
...             print ", ".join((v1[0],v1[1],k))
...
3, F,
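If you want the two output files the question asks for instead of printed lines, something along these lines might work. It is a Python 3 sketch: `split_duplicates` and `key_column` are hypothetical names, and the `io.StringIO` buffers stand in for the files behind the question's `writer` and `dwriter`. It uses a plain `set` of already-seen e-mails rather than the `OrderedDict` above, which is a swapped-in technique, not the same code.

```python
import csv
import io


def split_duplicates(rows, key_column=2):
    """Return (unique, duplicates): the first row per key is unique, later ones are duplicates."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = row[key_column].strip()
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates


# Rows from the question's sample input, already split into fields.
rows = [
    ["1", "M", "lol@jk.com"],
    ["2", "F", ""],
    ["3", "F", ""],
    ["4", "F", "jack@jay.com"],
]
unique, duplicates = split_duplicates(rows)

# In-memory buffers as stand-ins for the two output files.
unique_out = io.StringIO()
duplicate_out = io.StringIO()
csv.writer(unique_out).writerows(unique)
csv.writer(duplicate_out).writerows(duplicates)
```

This makes a single pass over the rows and never pops from the list while iterating, which sidesteps the index bookkeeping in the nested loops from the question.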