Question

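I have a list of tuples. Each tuple is a key-value pair, where the key is a number and the value is a string of characters. For each key, I need to return the top two characters and their counts as a list.

For example, given the list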

[(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]

the keys are 1 and 2, and the values are

"aabbc", "babdea", ..., "acdad"

The tuples can be transformed into tuples of the form

(1, {"a":2, "b":2, "c":1}),(1,{"a":2, "b":2, "d":1,"e":1})...(2,{"a":2, "c":1, "d":2})

For key 1, the combined tuple would be

(1,{"a":4, "b":4, "c":1, "d":1,"e":1})

so the top two characters with their counts would be

[("a",4),("b",4)]

The process is repeated for each key.

I am able to get the desired output, but I am looking for a better solution:

from collections import Counter
import itertools as it
l=[(x[0],list(x[1])) for x in [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]]
l2=[(y[0],Counter(y[1])) for y in l]

l3=[(x[0][1],x[1][1]) for x in it.combinations(l2,2) if x[0][0]==x[1][0]  ]

l4=[]
for t,y in l3:
    d={}
    l5=list(set(t.keys()).union(y.keys()))
    for i in l5:
        d[i]=t[i]+y[i]
    d_sort=sorted(d.items(), key=lambda x: x[1], reverse=True)[:2]

    l4.append(d_sort)


print(l4)
[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]

Answer 1

You could also concatenate the strings that share the same key, then count the characters and extract the two most common ones:

import collections

data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]

groups = collections.defaultdict(str)
for i, s in data: 
   groups[i] += s 

print([collections.Counter(string).most_common(2)
       for string in groups.values()])

You get:

[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]
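
The same idea can be wrapped in a small reusable function, which may help if the number of top characters should be configurable. This is only a sketch: the function name top_chars_by_key and the parameter n are assumptions made for illustration and do not come from the answer above.

```python
from collections import Counter, defaultdict


def top_chars_by_key(pairs, n=2):
    """Return {key: [(char, count), ...]} with the n most common chars per key."""
    groups = defaultdict(str)
    for key, text in pairs:
        groups[key] += text  # concatenate all strings that share a key
    return {key: Counter(text).most_common(n) for key, text in groups.items()}


data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]
print(top_chars_by_key(data))
# {1: [('a', 4), ('b', 4)], 2: [('a', 5), ('c', 3)]}
```

Returning a dict keeps each result attached to its key, which the plain list of lists does not.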

Answer 2

I would use a defaultdict to hold Counter objects, which are updated while iterating over your list of tuples:

>>> from collections import Counter, defaultdict
>>> data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]
>>>
>>> result = defaultdict(Counter)
>>> for num, letters in data:
...     result[num].update(letters)
...
>>> result
defaultdict(<class 'collections.Counter'>, {1: Counter({'a': 4, 'b': 4, 'c': 1, 'e': 1, 'd': 1}), 2: Counter({'a': 5, 'c': 3, 'd': 2, 'b': 1})})

To get the two most common letters, Counter objects have a helpful most_common method:

>>> {k:v.most_common(2) for k,v in result.items()}
{1: [('a', 4), ('b', 4)], 2: [('a', 5), ('c', 3)]}
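
One thing to keep in mind: when several characters share the same count, most_common returns them in the order they were first encountered (Python 3.7 and later), so which ones make the top two can depend on input order. Below is a minimal sketch, assuming alphabetical tie-breaking is what you want; the helper name top_n_sorted is hypothetical.

```python
from collections import Counter, defaultdict


def top_n_sorted(counter, n=2):
    # Sort by descending count, then alphabetically by character for ties.
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]
result = defaultdict(Counter)
for num, letters in data:
    result[num].update(letters)

print({k: top_n_sorted(v) for k, v in result.items()})
# {1: [('a', 4), ('b', 4)], 2: [('a', 5), ('c', 3)]}

# Tie example: 'b' is seen first, but 'a' sorts first alphabetically.
print(top_n_sorted(Counter("bbaa")))
# [('a', 2), ('b', 2)]
```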


Answer 3

Not better, but shorter:

from itertools import groupby
from collections import Counter


lst = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]

[Counter(''.join(list(zip(*y[1]))[1])).most_common(2) for y in groupby(lst, key=lambda x: x[0])]

# [[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]

I hope this helps.
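
Note that itertools.groupby only groups consecutive items with equal keys, so the one-liner assumes the list is already ordered by key. Here is a minimal sketch that sorts first; the input is the same four tuples deliberately shuffled, and the names by_key and top_two are illustrative only.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Same tuples as in the question, shuffled on purpose.
lst = [(2, "aabacc"), (1, "aabbc"), (2, "acdad"), (1, "babdea")]

# sorted() is stable, so strings keep their relative order within each key.
by_key = sorted(lst, key=itemgetter(0))

top_two = {
    key: Counter("".join(s for _, s in group)).most_common(2)
    for key, group in groupby(by_key, key=itemgetter(0))
}
print(top_two)
# {1: [('a', 4), ('b', 4)], 2: [('a', 5), ('c', 3)]}
```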

Answer 4

If your list is not sorted, I would do this:

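from collections import Counter

# data is the list of tuples from the question
data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]

di = {}
for i, s in data:
    di.setdefault(i, Counter())  # create an empty Counter the first time a key is seen
    di[i] += Counter(s)

print([c.most_common(2) for _, c in sorted(di.items())])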

If it is already sorted, you can use groupby and reduce:

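from functools import reduce  # reduce is no longer a builtin in Python 3
from itertools import groupby

li = []
for k, g in groupby(data, key=lambda t: t[0]):
    # add up the per-string Counters of each group, then take the top two
    li.append(reduce(lambda x, y: x + y, (Counter(t[1]) for t in g)).most_common(2))

print(li)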

In either case, this prints:

[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]
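
As a possible alternative to reduce, the built-in sum accepts a start value, so Counter objects can be added up with an empty Counter as the starting point. This is a sketch under the same assumption that data is already sorted by key, not a claim that it is preferable.

```python
from collections import Counter
from itertools import groupby

data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]

li = [
    # sum() starts from an empty Counter and adds each per-string Counter to it
    sum((Counter(s) for _, s in group), Counter()).most_common(2)
    for _, group in groupby(data, key=lambda t: t[0])
]
print(li)
# [[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]
```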

Getting the highest counts from a combination of tuples
