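I have a list of tuples. Each tuple is a key-value pair, where the key is a number and the value is a string of characters. For each key, I need to return the top two characters and their counts as a list.
For example, given the list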
[(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]
the keys are 1 and 2, and the values are
"aabbc", "babdea", ..., "acdad"
Each tuple can be converted to the form
(1, {"a":2, "b":2, "c":1}), (1, {"a":2, "b":2, "d":1, "e":1}) ... (2, {"a":2, "c":1, "d":2})
For key 1, the combined tuple would be
(1, {"a":4, "b":4, "c":1, "d":1, "e":1})
so the top two characters with their counts would be
[("a",4),("b",4)]
The same process is repeated for each key.
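The per-key merge described above maps directly onto `collections.Counter`, whose `+` operator adds counts element by element. A minimal sketch for key 1 only, assuming Python 3.7+ (where `most_common` breaks ties in first-encountered order):

```python
from collections import Counter

# Count the characters of each value string for key 1
c1 = Counter("aabbc")    # a:2, b:2, c:1
c2 = Counter("babdea")   # b:2, a:2, d:1, e:1

# Adding two Counters sums the counts of matching characters
combined = c1 + c2

# Keep only the two most frequent characters
print(combined.most_common(2))  # [('a', 4), ('b', 4)]
```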
I was able to get the required output, but I'm looking for a better solution:
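from collections import Counter
import itertools as it
l=[(x[0],list(x[1])) for x in [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]]
l2=[(y[0],Counter(y[1])) for y in l]
# pair up Counters that share a key (this assumes each key appears exactly twice)
l3=[(x[0][1],x[1][1]) for x in it.combinations(l2,2) if x[0][0]==x[1][0] ]
l4=[]
for t,y in l3:
    d={}
    # union of the characters in both Counters; a missing character counts as 0
    l5=list(set(t.keys()).union(y.keys()))
    for i in l5:
        d[i]=t[i]+y[i]
    # ties keep set iteration order, which can vary between runs
    d_sort=sorted(d.items(), key=lambda x: x[1], reverse=True)[:2]
    l4.append(d_sort)
print(l4)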
[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]
Answer 0 (score: 2)
You could also concatenate the strings that share the same key, then count the characters and take the two most common ones:
import collections
data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]
# concatenate all strings that share a key
groups = collections.defaultdict(str)
for i, s in data:
    groups[i] += s
# count characters per key and keep the two most common
print([collections.Counter(string).most_common(2)
       for string in groups.values()])
You will get:
[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]
Answer 1 (score: 0)
I use a defaultdict to hold Counters, which are updated as you iterate over your list of tuples:
>>> from collections import Counter, defaultdict
>>> data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]
>>>
>>> result = defaultdict(Counter)
>>> for num, letters in data:
...     result[num].update(letters)
...
>>> result
defaultdict(<class 'collections.Counter'>, {1: Counter({'a': 4, 'b': 4, 'c': 1, 'e': 1, 'd': 1}), 2: Counter({'a': 5, 'c': 3, 'd': 2, 'b': 1})})
To get the two most common letters, Counter objects have a handy most_common method:
>>> {k:v.most_common(2) for k,v in result.items()}
{1: [('a', 4), ('b', 4)], 2: [('a', 5), ('c', 3)]}
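For key 1, 'a' and 'b' both occur 4 times, so which one is listed first is a tie-break. Since Python 3.7, `most_common` orders elements with equal counts in the order they were first encountered. A small sketch illustrating this: the first string is simply the two key-1 values concatenated, and "bbaa" is a made-up input for contrast.

```python
from collections import Counter

# "aabbc" + "babdea": 'a' and 'b' both appear 4 times, and 'a' is seen first
print(Counter("aabbcbabdea").most_common(2))  # [('a', 4), ('b', 4)]

# Equal counts again, but 'b' is seen first, so it wins the tie
print(Counter("bbaa").most_common(1))         # [('b', 2)]
```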
Answer 2 (score: 0)
Not better, but shorter:
from itertools import groupby
from collections import Counter
lst = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]
# groupby only groups consecutive items, so lst must already be sorted by key
[Counter(''.join(list(zip(*y[1]))[1])).most_common(2) for y in groupby(lst, key=lambda x: x[0])]
# [[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]
I hope this helps.
Answer 3 (score: 0)
If your list is not sorted, I would do this:
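Because `itertools.groupby` only groups *consecutive* items with the same key, the one-liner above relies on `lst` already being sorted by key. A sketch of what happens with unsorted input and the usual fix of sorting first; the interleaved list `unsorted` is a made-up reordering of the question's data.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Same tuples as in the question, but with the keys interleaved
unsorted = [(1, "aabbc"), (2, "aabacc"), (1, "babdea"), (2, "acdad")]

# Without sorting, groupby yields four groups instead of two
print([k for k, _ in groupby(unsorted, key=itemgetter(0))])  # [1, 2, 1, 2]

# Sorting by key first (sorted is stable) restores one group per key
ordered = sorted(unsorted, key=itemgetter(0))
result = [Counter(''.join(s for _, s in g)).most_common(2)
          for _, g in groupby(ordered, key=itemgetter(0))]
print(result)  # [[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]
```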
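from collections import Counter
data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]
di={}
for i, s in data:
    # start an empty Counter the first time a key is seen, then add this string's counts
    di.setdefault(i, Counter())
    di[i]+=Counter(s)
print([c.most_common(2) for _,c in sorted(di.items())])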
If it is already sorted, you can use groupby and reduce:
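from collections import Counter
from functools import reduce
from itertools import groupby
li=[]
for k, g in groupby(data, key=lambda t: t[0]):
    # add up the Counters of every string in this run of equal keys
    li.append(reduce(lambda x,y: x+y, (Counter(t[1]) for t in g)).most_common(2))
print(li)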
In either case, this prints:
[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]
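As an alternative to `reduce(lambda x, y: x + y, ...)`, the built-in `sum` accepts a start value, so an empty `Counter()` can serve as the starting point for adding Counters. A sketch of the sorted-input variant rewritten that way; this swaps in `sum` for `reduce` and is not the answer's original code.

```python
from collections import Counter
from itertools import groupby

data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]

# sum() starting from Counter() adds the per-string Counters of each key group
li = [sum((Counter(s) for _, s in g), Counter()).most_common(2)
      for _, g in groupby(data, key=lambda t: t[0])]
print(li)  # [[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]
```

Like the `reduce` version, this still assumes `data` is sorted by key.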