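In collections.Counter, the method most_common(n) returns only the n most frequent items in the list. That is exactly what I need, but I also need to include any items whose counts are tied with those.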
from collections import Counter
test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
-->Counter({'A': 3, 'C': 2, 'B': 2, 'D': 2, 'E': 1, 'G': 1, 'F': 1, 'H': 1})
test.most_common(2)
-->[('A', 3), ('C', 2)]
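I need [('A', 3), ('B', 2), ('C', 2), ('D', 2)]
because in this case B, C and D all have the same count as the item at n = 2. My real data is DNA code and can be very large, so I need this to be reasonably efficient.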
Answer 0 (score: 5)
You can do it like this:
from collections import Counter
from itertools import takewhile

def get_items_upto_count(dct, n):
    data = dct.most_common()
    val = data[n-1][1]  # count of the n-th most common item (index n-1)
    # Now collect all items whose count is greater than or equal to `val`.
    return list(takewhile(lambda x: x[1] >= val, data))

test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
print(get_items_upto_count(test, 2))
#[('A', 3), ('C', 2), ('B', 2), ('D', 2)]
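The function above raises IndexError when n is larger than the number of distinct items, and n = 0 silently picks the last item. A Python 3 variant of the same idea might look like the sketch below; the name `most_common_with_ties` and the way out-of-range values of n are handled are assumptions added for illustration, not part of the original answer.

```python
from collections import Counter
from itertools import takewhile


def most_common_with_ties(counter, n):
    """Return the n most common items plus any items tied with the n-th one."""
    ranked = counter.most_common()  # full list, sorted by count, highest first
    if n <= 0 or not ranked:
        return []
    # Clamp n so that asking for more items than exist returns everything.
    threshold = ranked[min(n, len(ranked)) - 1][1]
    # ranked is sorted, so we can stop at the first count below the threshold.
    return list(takewhile(lambda item: item[1] >= threshold, ranked))


test = Counter(["A", "A", "A", "B", "B", "C", "C", "D", "D", "E", "F", "G", "H"])
print(most_common_with_ties(test, 2))
```

The order of the tied items in the result depends on the Python version, so it is safer not to rely on it.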
Answer 1 (score: 0)
For smaller collections, just write a simple generator expression:
>>> test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
>>> g=(e for e in test.most_common() if e[1]>=2)
>>> list(g)
[('A', 3), ('D', 2), ('C', 2), ('B', 2)]
For larger collections, use ifilter (or filter on Python 3):
>>> from itertools import ifilter  # Python 2 only
>>> list(ifilter(lambda t: t[1]>=2, test.most_common()))
[('A', 3), ('C', 2), ('B', 2), ('D', 2)]
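Alternatively, since the output of most_common is already sorted, you can use a for loop inside a generator and break as soon as the condition fails: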
def fc(d, f):
    for t in d.most_common():
        if not f(t[1]):
            break
        yield t
>>> list(fc(test, lambda e: e>=2))
[('A', 3), ('B', 2), ('C', 2), ('D', 2)]
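Since the question mentions very large inputs, it may be worth avoiding the full sort that most_common() performs when called without an argument. One possible sketch: ask most_common(n) for just the top n items (in CPython this is done with heapq.nlargest), take the count of the last one as a threshold, then collect every item at or above that threshold in a single pass. The name `top_n_with_ties` is hypothetical, and this has not been benchmarked against the answers above.

```python
from collections import Counter


def top_n_with_ties(counter, n):
    """Items whose count is at least the n-th highest count, highest first."""
    if n <= 0:
        return []
    top = counter.most_common(n)  # only the n largest, no full sort
    if not top:
        return []
    threshold = top[-1][1]  # count of the n-th most common item
    # One linear pass over all items, then sort only the survivors.
    hits = [(key, count) for key, count in counter.items() if count >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


test = Counter(["A", "A", "A", "B", "B", "C", "C", "D", "D", "E", "F", "G", "H"])
print(top_n_with_ties(test, 2))
```

Whether this is actually faster depends on how many distinct items there are relative to n, so it is best treated as a starting point to measure on real data.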