Filter a list of tuples to keep the longest items within each tuple

Date: 2018-04-10 18:45:24

Tags: python

Say I have this data:

my_list_of_tuples = [
    ('bill', [(4, ['626']), (4, ['253', '30', '626']),
              (4, ['253', '30', '626']), (4, ['626']),
              (4, ['626']), (4, ['626'])]),
    ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']),
               (2, ['6']), (2, ['6']), (2, ['6'])]),
    ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])
]

For each person, I want to keep every sub-tuple whose list element is the longest, and remove duplicates, so that I am left with:

my_output_list_of_tuples = [
    ('bill',  [(4, ['253', '30', '626'])]),
    ('sarah',  [(2, ['2', '6'])]),
    ('fred',  [(1, ['6']), (1, ['2'])])]

So far I have tried:

my_output_list_of_tuples = [(x[0], max(x[1], key=lambda tup: len(tup[1]))) for x in my_list_of_tuples] 

But this doesn't work for fred, because the max function returns only one item. I also made a few attempts with map and lambda, but didn't get far.

I can break it down into:

for my_list_of_tuples_by_person_name in my_list_of_tuples:
    #Do something with my_list_of_tuples_by_person_name[1]

Any ideas?

Thanks in advance :)

3 answers:

Answer 0: (score: 2)

If you want to keep duplicates like this, you can't just call max; you have to compare each value against the result of max.
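As a small illustration of that point, the sketch below uses only fred's sub-list from the question. The variable names (fred_items, single, all_longest) are made up for this example. When several items share the largest key, max returns the first one it encounters, while a comparison against the maximal length keeps every tie.

```python
# fred's sub-list from the question; every inner list has length 1.
fred_items = [(1, ['6']), (1, ['2']), (1, ['2'])]

# max returns a single element: the first one with the largest key.
single = max(fred_items, key=lambda tup: len(tup[1]))

# Comparing every element against the maximal length keeps all ties.
longest_len = len(single[1])
all_longest = [tup for tup in fred_items if len(tup[1]) == longest_len]

print(single)       # (1, ['6'])
print(all_longest)  # [(1, ['6']), (1, ['2']), (1, ['2'])]
```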

The most readable way is to build a dict mapping each key to its maximum length, and then compare each tuple against it:

result = []
for name, sublist in my_list_of_tuples:
    d = {}
    for key, subsub in sublist:
        if len(subsub) > d.get(key, 0):
            d[key] = len(subsub)
    lst = [(key, subsub) for key, subsub in sublist if len(subsub) == d[key]]
    result.append((name, lst))

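You could condense most of this, but that would probably just make things more opaque and harder to maintain. Also note that the naive way of compressing the two-pass loop into a single expression (calculating max each time through) turns it into a nested (quadratic) loop, so doing it properly ends up more verbose than you might expect.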
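As a rough illustration of that trade-off, the sketch below contrasts a naive single expression, which re-evaluates max for every element, with a variant that computes the maximum once per sub-list. It simplifies the loop above by assuming one maximum per sub-list rather than one per key, and the helper name keep_longest is invented for this example.

```python
def keep_longest(sublist):
    """Return every (key, items) pair whose items list has the maximal length."""
    longest = max(len(items) for _, items in sublist)  # computed once
    return [(key, items) for key, items in sublist if len(items) == longest]


# A shortened copy of the question's data, for illustration only.
data = [
    ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6'])]),
    ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])]),
]

# Naive: max(...) is re-evaluated for each element of sublist (quadratic).
naive = [
    (name, [(k, s) for k, s in sublist
            if len(s) == max(len(s2) for _, s2 in sublist)])
    for name, sublist in data
]

# The maximum is computed once per sub-list inside the helper.
compact = [(name, keep_longest(sublist)) for name, sublist in data]
```

Both versions give the same result here; the difference only shows up in running time as the sub-lists grow.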

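Now that you've completely changed the question, and apparently just want the single longest sublist (presumably picking arbitrarily when there are duplicates, or when there are non-duplicate values of the same length?), things are even simpler: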

result = []
for name, sublist in my_list_of_tuples:
    keysubsub = max(sublist, key=lambda keysubsub: len(keysubsub[1]))
    result.append((name, keysubsub))

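But that's basically what you already had. You said the problem with it is "...but this doesn't work for fred, because the max function returns only one item", but I'm not sure what you want instead of one item.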

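If what you're looking for is all of the distinct lists of maximum length, you can use a set or an OrderedSet instead of a list in the first answer. There's no OrderedSet in the stdlib, but the OrderedSet recipe by Raymond Hettinger should be fine for our purposes. However, let's do it manually with a set and a list: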

result = []
for name, sublist in my_list_of_tuples:
    d = {}
    for key, subsub in sublist:
        if len(subsub) > d.get(key, 0):
            d[key] = len(subsub)
    lst, seen = [], set()
    for key, subsub in sublist:
        if len(subsub) == d[key] and tuple(subsub) not in seen:
            seen.add(tuple(subsub))
            lst.append((key, subsub))
    result.append((name, lst))

I think this last one produces the output for the updated question, and it doesn't do anything hard to understand.
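As an alternative sketch of the de-duplication step, a plain dict can stand in for an ordered set, since dicts preserve insertion order in Python 3.7 and later. The function name longest_unique is hypothetical. Unlike the loop above, it uses one maximum per sub-list and treats two entries as duplicates only when both the key and the list contents match; both choices are assumptions about what the question wants.

```python
def longest_unique(sublist):
    """Keep the longest entries of sublist, dropping repeats, in first-seen order."""
    if not sublist:
        return []
    longest = max(len(items) for _, items in sublist)
    # Lists are unhashable, so (key, tuple(items)) serves as the dict key.
    # setdefault keeps the first occurrence and ignores later duplicates.
    unique = {}
    for key, items in sublist:
        if len(items) == longest:
            unique.setdefault((key, tuple(items)), (key, items))
    return list(unique.values())


my_list_of_tuples = [
    ('bill', [(4, ['626']), (4, ['253', '30', '626']),
              (4, ['253', '30', '626']), (4, ['626']),
              (4, ['626']), (4, ['626'])]),
    ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']),
               (2, ['6']), (2, ['6']), (2, ['6'])]),
    ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])]),
]

result = [(name, longest_unique(sublist)) for name, sublist in my_list_of_tuples]
```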

Answer 1: (score: 1)

You can use max:

my_list_of_tuples = [('bill', [(4, ['626']), (4, ['253', '30', '626']), (4, ['253', '30', '626']), (4, ['626']), (4, ['626']), (4, ['626'])]), ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']), (2, ['6']), (2, ['6']), (2, ['6'])]), ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])]
final_result = [(a, [(c, d) for c, d in b if len(d) == max(map(len, [h for _, h in b]))]) for a, b in my_list_of_tuples]
new_result = [(a, [c for i, c in enumerate(b) if c not in b[:i]]) for a, b in final_result]

Output:

[('bill', [(4, ['253', '30', '626'])]), ('sarah', [(2, ['2', '6'])]), ('fred', [(1, ['6']), (1, ['2'])])]

Answer 2: (score: 1)

First define a function:

def f(ls):
    max_length = max(len(y) for (x, y) in ls)

    result = []

    for (x, y) in ls:
        if len(y) == max_length and (x, y) not in result:
            result.append((x, y))

    return result

Now call it:

>>> from pprint import pprint
>>> pprint([(name, f(y)) for name, y in my_list_of_tuples])
[('bill', [(4, ['253', '30', '626'])]),
 ('sarah', [(2, ['2', '6'])]),
 ('fred', [(1, ['6']), (1, ['2'])])]
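
One edge case worth noting: max raises ValueError when given an empty sequence, so f fails if a person has no entries at all. A minimal variant, assuming an empty sub-list should simply produce an empty result, passes default=0 to max. The name f_safe is invented for this sketch.

```python
def f_safe(ls):
    # default=0 avoids ValueError when ls is empty (assumed desired behavior).
    max_length = max((len(y) for (x, y) in ls), default=0)

    result = []
    for (x, y) in ls:
        # Keep only the longest entries, skipping ones already collected.
        if len(y) == max_length and (x, y) not in result:
            result.append((x, y))
    return result


print(f_safe([]))                                    # []
print(f_safe([(1, ['6']), (1, ['2']), (1, ['2'])]))  # [(1, ['6']), (1, ['2'])]
```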