Say I have this data:
my_list_of_tuples = [
('bill', [(4, ['626']), (4, ['253', '30', '626']),
(4, ['253', '30', '626']), (4, ['626']),
(4, ['626']), (4, ['626'])]),
('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']),
(2, ['6']), (2, ['6']), (2, ['6'])]),
('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])
]
For each person, I want to keep every item whose sub-list is the longest, and remove duplicates, so that I am left with
my_output_list_of_tuples = [
('bill', [(4, ['253', '30', '626'])]),
('sarah', [(2, ['2', '6'])]),
('fred', [(1, ['6']), (1, ['2'])])]
So far I have tried
my_output_list_of_tuples = [(x[0], max(x[1], key=lambda tup: len(tup[1]))) for x in my_list_of_tuples]
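But this doesn't work for fred, because the max function returns only one item. I also made a few attempts with map and lambda, but didn't get very far.
I can break it down into
for my_list_of_tuples_by_person_name in my_list_of_tuples:
    # Do something with my_list_of_tuples_by_person_name[1]
    pass
Any ideas?
Thanks in advance :)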
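Answer 0 (score: 2)
If you want to keep ties like that, you can't just call max; you have to compare each value against the result of max.
The most readable way is to build a dict mapping each key to its maximum length, and then compare each tuple against it: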
result = []
for name, sublist in my_list_of_tuples:
    d = {}
    for key, subsub in sublist:
        if len(subsub) > d.get(key, 0):
            d[key] = len(subsub)
    lst = [(key, subsub) for key, subsub in sublist if len(subsub) == d[key]]
    result.append((name, lst))
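You could condense most of this, but it would probably just make things more opaque and harder to maintain. Also note that the naive way to condense the two-pass loop into a single expression (computing max each time through) turns it into a nested (quadratic) loop, so a version that avoids that will be even more verbose than you think.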
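To make the cost point concrete, here is a minimal sketch (not from the original answer) of a condensed version that still computes the maximum only once per person. The helper name `keep_longest` is hypothetical, and it measures the longest sub-list per person rather than per key, which gives the same result on the sample data because each person has a single key.

```python
def keep_longest(sublist):
    # Hypothetical helper; assumes sublist is non-empty (max() raises otherwise).
    # First pass: compute the maximum sub-list length exactly once.
    longest = max(len(subsub) for _, subsub in sublist)
    # Second pass: keep every item that ties for that length (duplicates included).
    return [(key, subsub) for key, subsub in sublist if len(subsub) == longest]


sample = [(1, ['6']), (1, ['2']), (1, ['2'])]
print(keep_longest(sample))
```

Calling max inside the comprehension's condition instead would recompute it for every item, which is the quadratic behaviour described above.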
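Now that you've completely changed the question, and apparently only want the longest sublist (presumably picking arbitrarily when there are duplicates, or non-duplicate values of the same length?), things are even simpler: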
result = []
for name, sublist in my_list_of_tuples:
    keysubsub = max(sublist, key=lambda keysubsub: len(keysubsub[1]))
    result.append((name, keysubsub))
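But that's basically what you already had. You say the problem with it is "...but this doesn't work for fred, because the max function returns only one item", but I'm not sure what you want instead of one item.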
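For reference, a small demonstration (added for illustration, using fred's data from the question) of what max does when several items tie on the key: it returns only the first maximal item it encounters.

```python
fred = [(1, ['6']), (1, ['2']), (1, ['2'])]

# All three sub-lists have length 1, so they all tie;
# max() returns the first one encountered.
winner = max(fred, key=lambda keysubsub: len(keysubsub[1]))
print(winner)  # (1, ['6'])
```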
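If what you're looking for is all of the distinct lists of maximum length, you can use a set or an OrderedSet in place of the list in the first solution above. There's no OrderedSet in the stdlib, but this recipe by Raymond Hettinger should be fine for us. Still, let's do it manually with a set and a list: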
result = []
for name, sublist in my_list_of_tuples:
    d = {}
    for key, subsub in sublist:
        if len(subsub) > d.get(key, 0):
            d[key] = len(subsub)
    lst, seen = [], set()
    for key, subsub in sublist:
        if len(subsub) == d[key] and tuple(subsub) not in seen:
            seen.add(tuple(subsub))
            lst.append((key, subsub))
    result.append((name, lst))
I think this last one gives the output from the updated question, and doesn't do anything hard to understand.
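As an alternative to an OrderedSet recipe, here is a sketch that assumes Python 3.7 or later, where plain dicts are guaranteed to preserve insertion order, so a dict can stand in for an ordered set. The function name `longest_unique` is made up for illustration. Unlike the loop above, it de-duplicates on the (key, sub-list) pair rather than on the sub-list alone.

```python
def longest_unique(sublist):
    # Hypothetical helper. First pass: maximum sub-list length per key.
    max_len = {}
    for key, subsub in sublist:
        max_len[key] = max(max_len.get(key, 0), len(subsub))
    # Second pass: the dict acts as an ordered set. Lists are unhashable,
    # so each entry is keyed on a tuple copy of the sub-list.
    unique = {}
    for key, subsub in sublist:
        if len(subsub) == max_len[key]:
            unique.setdefault((key, tuple(subsub)), (key, subsub))
    return list(unique.values())


data = [
    ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6'])]),
    ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])]),
]
result = [(name, longest_unique(sublist)) for name, sublist in data]
print(result)
```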
Answer 1 (score: 1)
You can use max:
my_list_of_tuples = [('bill', [(4, ['626']), (4, ['253', '30', '626']), (4, ['253', '30', '626']), (4, ['626']), (4, ['626']), (4, ['626'])]), ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']), (2, ['6']), (2, ['6']), (2, ['6'])]), ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])]
final_result = [(a, [(c, d) for c, d in b if len(d) == max(map(len, [h for _, h in b]))]) for a, b in my_list_of_tuples]
new_result = [(a, [c for i, c in enumerate(b) if c not in b[:i]]) for a, b in final_result]
Output:
[('bill', [(4, ['253', '30', '626'])]), ('sarah', [(2, ['2', '6'])]), ('fred', [(1, ['6']), (1, ['2'])])]
Answer 2 (score: 1)
First define a function
def f(ls):
    max_length = max(len(y) for (x, y) in ls)
    result = []
    for (x, y) in ls:
        if len(y) == max_length and (x, y) not in result:
            result.append((x, y))
    return result
Now call it
>>> from pprint import pprint
>>> pprint([(name, f(y)) for name, y in my_list_of_tuples])
[('bill', [(4, ['253', '30', '626'])]),
 ('sarah', [(2, ['2', '6'])]),
 ('fred', [(1, ['6']), (1, ['2'])])]