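I have a list of lists. Each sublist contains 2 items, and the second item is the same across n consecutive sublists.
I want to keep only the first sublist of each such group, because the spread between the two items is largest in the first one. This is what I have: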
[[0, 3],
[1, 3],
[2, 3],
[314, 335],
[315, 335],
[316, 335],
[317, 335],
[318, 335],
[319, 335],
[320, 335],
[321, 335],
[322, 335],
[323, 335],
[324, 335],
[325, 335],
[326, 335],
[327, 335],
[328, 335],
[329, 335],
[330, 335],
[331, 335],
[332, 335],
[333, 335],
[334, 335],
[645, 647],
[646, 647]]
I want to keep:
[[0, 3],
[314, 335],
[645, 647]]
Any ideas on how to do this?
Answer 0 (score: 1)
Here is one approach.
For example:
seen = set()
result = []
for i in data:
    if i[1] not in seen:  # Check whether the second item has been seen
        result.append(i)  # Add to result
        seen.add(i[1])  # Add second item to set
print(result)  # --> [[0, 3], [314, 335], [645, 647]]
Answer 1 (score: 1)
itertools.groupby can be used:
from itertools import groupby
ret = [[next(group)[0], key] for key, group in groupby(lst, key=lambda x: x[1])]
# [[0, 3], [314, 335], [645, 647]]
I use the second element of each sublist as the key.
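One detail worth checking before relying on groupby: it starts a new group every time the key changes, so it only merges sublists whose equal second items are adjacent. The sketch below illustrates this with a hypothetical helper name (first_of_each_run) and a made-up sample in which the value 3 reappears after another run; neither is from the question.

```python
from itertools import groupby


def first_of_each_run(pairs):
    """Keep the first sublist of every consecutive run of equal second items."""
    return [next(group) for _, group in groupby(pairs, key=lambda pair: pair[1])]


# Hypothetical sample: the second item 3 shows up again after the 335 run.
sample = [[0, 3], [1, 3], [314, 335], [645, 3]]
runs = first_of_each_run(sample)
print(runs)  # [[0, 3], [314, 335], [645, 3]]
```

For the data in the question, equal second items are always adjacent, so this gives the same result as a set-based approach. If they could be scattered, sorting by the second item first or tracking seen values would be needed.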
Answer 2 (score: 1)
Another approach is to use a pandas DataFrame:
import pandas as pd
df = pd.DataFrame(your_data)
df2 = df.drop_duplicates(subset=1)  # keep the first row for each value in column 1
The DataFrame can then be converted back to a list.
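A minimal end-to-end sketch of that round trip, assuming pandas is installed. The variable names (pairs, deduped, result) and the shortened sample are illustrative only, not taken from the question.

```python
import pandas as pd

# Shortened illustrative sample with the same shape as the question's data.
pairs = [[0, 3], [1, 3], [2, 3], [314, 335], [315, 335], [645, 647], [646, 647]]

df = pd.DataFrame(pairs)
# drop_duplicates keeps the first row for each distinct value in column 1.
deduped = df.drop_duplicates(subset=1)
# .values.tolist() turns the remaining rows back into a list of lists.
result = deduped.values.tolist()
print(result)  # [[0, 3], [314, 335], [645, 647]]
```

Unlike groupby, drop_duplicates compares against every earlier row, so it also removes repeats that are not adjacent.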
Answer 3 (score: 0)
The itertools docs include a ready-made recipe that can be used for this task:
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

data = [[0, 3],
[1, 3],
[2, 3],
[314, 335],
[315, 335],
[316, 335],
[317, 335],
[318, 335],
[319, 335],
[320, 335],
[321, 335],
[322, 335],
[323, 335],
[324, 335],
[325, 335],
[326, 335],
[327, 335],
[328, 335],
[329, 335],
[330, 335],
[331, 335],
[332, 335],
[333, 335],
[334, 335],
[645, 647],
[646, 647]]
out = list(unique_everseen(data, key=lambda x: x[1]))
print(out)
Output:
[[0, 3], [314, 335], [645, 647]]