Given an item, how can I count its occurrences in a list in Python?
Answer 0 (score: 1628)
If you only want the count of a single item, use the count method:
>>> [1, 2, 3, 4, 1, 4, 1].count(1)
3
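Do not use this if you want to count multiple items. Calling count in a loop makes a separate pass over the list for every count call, which can be catastrophic for performance. If you want to count all items, or even just several items, use Counter, as explained in the other answers.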
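To make the cost difference concrete, here is a minimal sketch. The list `data` and the `wanted` items are made-up illustrative values, not taken from the answer. Each `count` call scans the whole list, whereas `Counter` tallies everything in one pass.

```python
from collections import Counter

# Illustrative data (an assumption, not from the original answer).
data = [1, 2, 3, 4, 1, 4, 1]
wanted = [1, 4, 5]

# One full scan of `data` per item: len(wanted) passes in total.
counts_by_scanning = {item: data.count(item) for item in wanted}

# A single pass builds every tally; absent items simply read as 0.
tally = Counter(data)
counts_by_counter = {item: tally[item] for item in wanted}

print(counts_by_scanning)  # {1: 3, 4: 2, 5: 0}
print(counts_by_counter)   # {1: 3, 4: 2, 5: 0}
```

For two or three lookups on a short list the difference is negligible; it starts to matter as the number of distinct items grows.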
Answer 1 (score: 1529)
If you are using Python 2.7 or 3 and you want the number of occurrences of each element:
>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> Counter(z)
Counter({'blue': 3, 'red': 2, 'yellow': 1})
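Once the Counter exists, individual counts and a frequency ranking can be read from it directly. The following is a small sketch that reuses the list `z` from above; the variable name `counts` is only illustrative.

```python
from collections import Counter

z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
counts = Counter(z)

# Look up a single element's count.
print(counts['blue'])          # 3

# The two most frequent elements, as (element, count) pairs.
print(counts.most_common(2))   # [('blue', 3), ('red', 2)]
```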
Answer 2 (score: 220)
Counting the occurrences of one item in a list
To count the occurrences of a single list item, you can use count()
>>> l = ["a","b","b"]
>>> l.count("a")
1
>>> l.count("b")
2
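Counting the occurrences of all items in a list is also known as "tallying" a list, or creating a tally counter.
Counting all items with count()
To count the occurrences of the items in l, you can simply use a list comprehension and the count() method
[[x,l.count(x)] for x in set(l)]
(or similarly with a dictionary: dict((x,l.count(x)) for x in set(l)))
Example: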
>>> l = ["a","b","b"]
>>> [[x,l.count(x)] for x in set(l)]
[['a', 1], ['b', 2]]
>>> dict((x,l.count(x)) for x in set(l))
{'a': 1, 'b': 2}
Counting all items with Counter()
Alternatively, there's the faster Counter class from the collections library
Counter(l)
Example:
>>> l = ["a","b","b"]
>>> from collections import Counter
>>> Counter(l)
Counter({'b': 2, 'a': 1})
How much faster is Counter?
I checked how much faster Counter is for tallying lists. I tried both methods with a few values of n, and Counter appears to be faster by a constant factor of approximately 2.
Here is the script I used:
from __future__ import print_function
import timeit
t1=timeit.Timer('Counter(l)',
    'import random;import string;from collections import Counter;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
)
t2=timeit.Timer('[[x,l.count(x)] for x in set(l)]',
    'import random;import string;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
)
print("Counter(): ", t1.repeat(repeat=3,number=10000))
print("count(): ", t2.repeat(repeat=3,number=10000))
Output:
Counter(): [0.46062711701961234, 0.4022796869976446, 0.3974247490405105]
count(): [7.779430688009597, 7.962715800967999, 8.420845870045014]
Answer 3 (score: 60)
Another way to get the number of occurrences of each item, in a dictionary:
dict((i, a.count(i)) for i in a)
Answer 4 (score: 43)
list.count(x) returns the number of times x appears in a list
See: http://docs.python.org/tutorial/datastructures.html#more-on-lists
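One detail worth knowing is that "appears" here means "compares equal". The sketch below uses made-up values to show that elements of different types can be counted together when they compare equal.

```python
# Values chosen purely for illustration.
mixed = [1, 1.0, True, '1', 2]

# 1 == 1.0 and 1 == True in Python, so all three are counted; '1' is not.
print(mixed.count(1))    # 3
print(mixed.count('1'))  # 1
```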
Answer 5 (score: 32)
Given an item, how can I count its occurrences in a list in Python?
Here's an example list:
>>> l = list('aaaaabbbbcccdde')
>>> l
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e']
list.count
There's the list.count method:
>>> l.count('b')
4
This works fine for any list. Tuples have this method as well:
>>> t = tuple('aabbbffffff')
>>> t
('a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'f', 'f')
>>> t.count('f')
6
collections.Counter
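And then there's collections.Counter. You can dump any iterable into a Counter, not just a list, and the Counter will retain a data structure of the counts of the elements.
Usage: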
>>> from collections import Counter
>>> c = Counter(l)
>>> c['b']
4
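Counters are based on Python dictionaries. Their keys are the elements, so the keys need to be hashable. They are basically like sets that allow redundant elements.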
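The hashability requirement shows up as soon as the elements are themselves lists. The following sketch uses hypothetical data (`rows`) and one possible workaround, converting each element to a tuple. It is an illustration, not part of the original answer.

```python
from collections import Counter

# Hypothetical data: rows stored as lists, which are not hashable.
rows = [[1, 2], [3, 4], [1, 2]]

try:
    Counter(rows)
except TypeError as exc:
    print("unhashable:", exc)

# Converting each row to a tuple makes it usable as a dictionary key.
row_counts = Counter(tuple(row) for row in rows)
print(row_counts[(1, 2)])  # 2
```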
collections.Counter
You can add or subtract with iterables from your counter:
>>> c.update(list('bbb'))
>>> c['b']
7
>>> c.subtract(list('bbb'))
>>> c['b']
4
You can also do multiset operations with counters:
>>> c2 = Counter(list('aabbxyz'))
>>> c - c2 # set difference
Counter({'a': 3, 'c': 3, 'b': 2, 'd': 2, 'e': 1})
>>> c + c2 # addition of all elements
Counter({'a': 7, 'b': 6, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c | c2 # set union
Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c & c2 # set intersection
Counter({'a': 2, 'b': 2})
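Another answer suggests:
Why not use pandas?
Pandas is a common library, but it's not in the standard library. Adding it as a requirement is non-trivial.
There are built-in solutions for this use case in the list object itself as well as in the standard library.
If your project does not already require pandas, it would be foolish to make it a requirement just for this functionality.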
Answer 6 (score: 30)
I have compared all of the suggested solutions (and a few new ones) with perfplot (a small project of mine).
When counting a single item, it turns out that for large enough arrays
numpy.sum(numpy.array(a) == 1)
is slightly faster than the other solutions.
When counting all items,
numpy.bincount(a)
is what you want.
Code to reproduce the plots:
from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot
def counter(a):
    return Counter(a)
def count(a):
    return dict((i, a.count(i)) for i in set(a))
def bincount(a):
    return numpy.bincount(a)
def pandas_value_counts(a):
    return pandas.Series(a).value_counts()
def occur_dict(a):
    d = {}
    for i in a:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d
def count_unsorted_list_items(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)
def operator_countof(a):
    return dict((i, operator.countOf(a, i)) for i in set(a))
perfplot.show(
    setup=lambda n: list(numpy.random.randint(0, 100, n)),
    n_range=[2**k for k in range(20)],
    kernels=[
        counter, count, bincount, pandas_value_counts, occur_dict,
        count_unsorted_list_items, operator_countof
    ],
    equality_check=None,
    logx=True,
    logy=True,
)
Answer 7 (score: 30)
If you want to count all values at once, you can do it very quickly using numpy arrays and bincount, as follows
import numpy as np
a = np.array([1, 2, 3, 4, 1, 4, 1])
np.bincount(a)
which gives
>>> array([0, 3, 1, 1, 2])
Answer 8 (score: 18)
If you can use pandas, then value_counts is there to the rescue.
>>> import pandas as pd
>>> a = [1, 2, 3, 4, 1, 4, 1]
>>> pd.Series(a).value_counts()
1 3
4 2
3 1
2 1
dtype: int64
It also automatically sorts the result by frequency.
If you want the result as a list of lists, do the following
>>> pd.Series(a).value_counts().reset_index().values.tolist()
[[1, 3], [4, 2], [3, 1], [2, 1]]
Answer 9 (score: 14)
Why not use Pandas?
import pandas as pd
l = ['a', 'b', 'c', 'd', 'a', 'd', 'a']
# converting the list to a Series and counting the values
my_count = pd.Series(l).value_counts()
my_count
Output:
a 3
d 2
b 1
c 1
dtype: int64
If you are looking for the count of a particular element, say a, try:
my_count['a']
Output:
3
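Answer 10 (score: 12)
I had this problem today and rolled my own solution before I thought to check. This:
dict((i,a.count(i)) for i in a)
is really, really slow for large lists. My solution
def occurDict(items):
    d = {}
    for i in items:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d
is actually a bit faster than the Counter solution, at least for Python 2.7.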
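The gap between the two approaches can be checked with a small harness. This is a sketch under stated assumptions: the data (2000 random integers) and the function names are invented for illustration, and the single-pass function uses `dict.get` in place of the if/else above. Timings will vary by machine and Python version, so none are quoted.

```python
import random
import timeit

def occur_dict(items):
    # Single pass: bump the tally for each element as it is seen.
    d = {}
    for i in items:
        d[i] = d.get(i, 0) + 1
    return d

def count_per_element(items):
    # Rescans the whole list once for every element: quadratic work.
    return dict((i, items.count(i)) for i in items)

# Assumed test data: 2000 random integers in [0, 100).
random.seed(0)
a = [random.randrange(100) for _ in range(2000)]

# Both approaches must agree before their speed is compared.
assert occur_dict(a) == count_per_element(a)

print("single pass:", timeit.timeit(lambda: occur_dict(a), number=5))
print("count() per element:", timeit.timeit(lambda: count_per_element(a), number=5))
```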
Answer 11 (score: 12)
# Python >= 2.6 (defaultdict) && < 2.7 (Counter, OrderedDict)
from collections import defaultdict
def count_unsorted_list_items(items):
    """
    :param items: iterable of hashable items to count
    :type items: iterable
    :returns: dict of counts like Py2.7 Counter
    :rtype: dict
    """
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)
# Python >= 2.2 (generators)
def count_sorted_list_items(items):
    """
    :param items: sorted iterable of items to count
    :type items: sorted iterable
    :returns: generator of (item, count) tuples
    :rtype: generator
    """
    if not items:
        return
    elif len(items) == 1:
        yield (items[0], 1)
        return
    prev_item = items[0]
    count = 1
    for item in items[1:]:
        if prev_item == item:
            count += 1
        else:
            yield (prev_item, count)
            count = 1
            prev_item = item
    yield (item, count)
    return
import unittest
class TestListCounters(unittest.TestCase):
    def test_count_unsorted_list_items(self):
        D = (
            ([], []),
            ([2], [(2,1)]),
            ([2,2], [(2,2)]),
            ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
        )
        for inp, exp_outp in D:
            counts = count_unsorted_list_items(inp)
            print(inp, exp_outp, counts)
            self.assertEqual(counts, dict( exp_outp ))
        inp, exp_outp = UNSORTED_WIN = ([2,2,4,2], [(2,3), (4,1)])
        self.assertEqual(dict( exp_outp ), count_unsorted_list_items(inp) )
    def test_count_sorted_list_items(self):
        D = (
            ([], []),
            ([2], [(2,1)]),
            ([2,2], [(2,2)]),
            ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
        )
        for inp, exp_outp in D:
            counts = list( count_sorted_list_items(inp) )
            print(inp, exp_outp, counts)
            self.assertEqual(counts, exp_outp)
        # Unsorted input: this assertion is expected to fail, because the
        # generator only merges adjacent equal items.
        inp, exp_outp = UNSORTED_FAIL = ([2,2,4,2], [(2,3), (4,1)])
        self.assertEqual(exp_outp, list( count_sorted_list_items(inp) ))
        # ... [(2,2), (4,1), (2,1)]
Answer 12 (score: 6)
To count the number of diverse elements having a common type:
li = ['A0','c5','A8','A2','A5','c2','A3','A9']
print(sum(1 for el in li if el[0]=='A' and el[1] in '01234'))
gives
3, not 6
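The same pattern generalizes to any condition. Below is a minimal sketch with a hypothetical helper, `count_if` (not a built-in), that takes the condition as a function. It shows where the 6 and the 3 come from.

```python
def count_if(items, predicate):
    # Add 1 for every item that satisfies the predicate.
    return sum(1 for item in items if predicate(item))

li = ['A0', 'c5', 'A8', 'A2', 'A5', 'c2', 'A3', 'A9']

# All elements starting with 'A'.
print(count_if(li, lambda el: el[0] == 'A'))                       # 6

# Only those whose second character is one of 0-4.
print(count_if(li, lambda el: el[0] == 'A' and el[1] in '01234'))  # 3
```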
Answer 13 (score: 4)
from collections import Counter
country=['Uruguay', 'Mexico', 'Uruguay', 'France', 'Mexico']
count_country = Counter(country)
output_list= []
for i in count_country:
    output_list.append([i,count_country[i]])
print(output_list)
Output list (the order of the entries may vary by Python version):
[['Mexico', 2], ['France', 1], ['Uruguay', 2]]
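Answer 14 (score: 4)
Using numpy's bincount has been suggested, but it only works for 1d arrays of non-negative integers. The resulting array can also be confusing: it contains the number of occurrences of every integer from 0 to the maximum value of the original list, with missing integers set to 0.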
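To see why that layout can be confusing, here is a pure-Python stand-in that mimics it. `bincount_like` is a hypothetical helper written for illustration only; it is not numpy's implementation and handles only lists of non-negative integers.

```python
def bincount_like(values):
    # Hypothetical helper mimicking the layout of a bincount result:
    # position i holds how many times the integer i occurs.
    if any(v < 0 for v in values):
        raise ValueError("only non-negative integers are supported")
    result = [0] * (max(values) + 1)
    for v in values:
        result[v] += 1
    return result

print(bincount_like([1, 2, 3, 4, 1, 4, 1]))  # [0, 3, 1, 1, 2]
print(bincount_like([5, 5, 2]))              # [0, 0, 1, 0, 0, 2]
```

In the second call most positions are zero padding for integers that never occur, and the values themselves are only implied by the positions.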
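A better way to do it with numpy is to use the unique function with the argument return_counts set to True. It returns a tuple containing an array of the unique values and an array of the occurrences of each unique value.
# a = [1, 1, 0, 2, 1, 0, 3, 3]
a_uniq, counts = np.unique(a, return_counts=True) # array([0, 1, 2, 3]), array([2, 3, 1, 2])
and then we can pair them as
dict(zip(a_uniq, counts)) # {0: 2, 1: 3, 2: 1, 3: 2}
It also works with other data types and "2d lists", e.g.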
>>> a = [['a', 'b', 'b', 'b'], ['a', 'c', 'c', 'a']]
>>> dict(zip(*np.unique(a, return_counts=True)))
{'a': 3, 'b': 3, 'c': 2}
Answer 15 (score: 3)
Answer 16 (score: 3)
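To count all the elements in a list, another possibility is to use itertools.groupby().
With "duplicate" counts
from itertools import groupby
L = ['a', 'a', 'a', 't', 'q', 'a', 'd', 'a', 'd', 'c'] # Input list
counts = [(i, len(list(c))) for i,c in groupby(L)] # Create value-count pairs as list of tuples
print(counts)
Returns
[('a', 3), ('t', 1), ('q', 1), ('a', 1), ('d', 1), ('a', 1), ('d', 1), ('c', 1)]
Notice how it combined the first three a's into the first group, while other groups of a appear further down the list. This happens because the input list L was not sorted. This can sometimes be a benefit if the groups should in fact be kept separate.
With unique counts
If unique group counts are desired, just sort the input list:
counts = [(i, len(list(c))) for i,c in groupby(sorted(L))]
print(counts)
Returns
[('a', 5), ('c', 1), ('d', 2), ('q', 1), ('t', 1)]
Note: for creating unique counts, many of the other answers provide easier and more readable code than the groupby solution. It is shown here to draw a parallel to the duplicate-count example.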
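As a quick sanity check on the sorted variant, the sketch below builds the counts with groupby and confirms they match what collections.Counter produces for the same list. The dictionary form and the name `grouped` are choices made here for illustration.

```python
from collections import Counter
from itertools import groupby

L = ['a', 'a', 'a', 't', 'q', 'a', 'd', 'a', 'd', 'c']

# groupby only merges adjacent equal items, so sort first for unique counts.
grouped = {key: len(list(group)) for key, group in groupby(sorted(L))}

# Counter needs no sorting and gives the same tallies.
assert grouped == dict(Counter(L))
print(grouped)  # {'a': 5, 'c': 1, 'd': 2, 'q': 1, 't': 1}
```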
Answer 17 (score: 3)
The fastest way is to use a for loop and store the counts in a dict.
import time
from collections import Counter
def countElement(a):
    g = {}
    for i in a:
        if i in g:
            g[i] +=1
        else:
            g[i] =1
    return g
z = [1,1,1,1,2,2,2,2,3,3,4,5,5,234,23,3,12,3,123,12,31,23,13,2,4,23,42,42,34,234,23,42,34,23,423,42,34,23,423,4,234,23,42,34,23,4,23,423,4,23,4]
#Solution 1 - Faster
st = time.monotonic()
for i in range(1000000):
    b = countElement(z)
et = time.monotonic()
print(b)
print('Simple for loop and storing it in dict - Duration: {}'.format(et - st))
#Solution 2 - Fast
st = time.monotonic()
for i in range(1000000):
    a = Counter(z)
et = time.monotonic()
print (a)
print('Using collections.Counter - Duration: {}'.format(et - st))
#Solution 3 - Slow
st = time.monotonic()
for i in range(1000000):
    g = dict([(i, z.count(i)) for i in set(z)])
et = time.monotonic()
print(g)
print('Using list comprehension - Duration: {}'.format(et - st))
Result
#Solution 1 - Faster
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 234: 3, 23: 10, 12: 2, 123: 1, 31: 1, 13: 1, 42: 5, 34: 4, 423: 3}
Simple for loop and storing it in dict - Duration: 12.032000000000153
#Solution 2 - Fast
Counter({23: 10, 4: 6, 2: 5, 42: 5, 1: 4, 3: 4, 34: 4, 234: 3, 423: 3, 5: 2, 12: 2, 123: 1, 31: 1, 13: 1})
Using collections.Counter - Duration: 15.889999999999418
#Solution 3 - Slow
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 34: 4, 423: 3, 234: 3, 42: 5, 12: 2, 13: 1, 23: 10, 123: 1, 31: 1}
Using list comprehension - Duration: 33.0
Answer 18 (score: 3)
Given a list X
import numpy as np
X = [1, -1, 1, -1, 1]
The dictionary that maps each element i of this list to its frequency is:
{i:X.count(i) for i in np.unique(X)}
Output:
{-1: 2, 1: 3}
Answer 19 (score: 2)
May not be the most efficient; it requires an extra pass to remove duplicates.
Functional implementation:
import numpy as np
arr = np.array(['a','a','b','b','b','c'])
print(set(map(lambda x : (x , list(arr).count(x)) , arr)))
Returns:
{('c', 1), ('b', 3), ('a', 2)}
Or return as a dict:
print(dict(map(lambda x : (x , list(arr).count(x)) , arr)))
Returns:
{'b': 3, 'c': 1, 'a': 2}
Answer 20 (score: 2)
Use %timeit to see which operation is more efficient. np.array counting operations should be faster.
from collections import Counter
mylist = [1,7,7,7,3,9,9,9,7,9,10,0]
types_counts=Counter(mylist)
print(types_counts)
Answer 21 (score: 2)
I would use filter(); taking Lukasz's example:
>>> lst = [1, 2, 3, 4, 1, 4, 1]
>>> len(list(filter(lambda x: x==1, lst)))
3
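In Python 3, filter() returns a lazy iterator rather than a list, so it has no len() until it is materialized. The sketch below shows two standard-library alternatives that avoid building an intermediate list; both are offered as options, not as the answer author's method.

```python
import operator

lst = [1, 2, 3, 4, 1, 4, 1]

# Sum over a generator instead of materializing the filtered items.
print(sum(1 for x in lst if x == 1))  # 3

# operator.countOf(a, b) counts the occurrences of b in a.
print(operator.countOf(lst, 1))       # 3
```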
Answer 22 (score: 1)
sum([1 for elem in <yourlist> if elem==<your_value>])
This returns the number of occurrences of your_value
Answer 23 (score: 0)
Alternatively, you can also implement the counter yourself. This is the way I do it:
item_list = ['me', 'me', 'you', 'you', 'you', 'they']
occ_dict = {}
for item in item_list:
    if item not in occ_dict:
        occ_dict[item] = 1
    else:
        occ_dict[item] +=1
print(occ_dict)
Output: {'me': 2, 'you': 3, 'they': 1}
Answer 24 (score: 0)
l2=[1,"feto",["feto",1,["feto"]],['feto',[1,2,3,['feto']]]]
def Test(l):
    # count the matches at this level, then recurse into nested lists
    count=l.count("feto")
    for i in l:
        if isinstance(i, list):
            count+=Test(i)
    return count
print(Test(l2))
This recursively counts the item in the list, even when it appears inside nested lists
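The same idea works for any target value if the target is passed in as a parameter. The following is a sketch with a hypothetical helper name, `count_nested`, assuming the nesting consists only of lists.

```python
def count_nested(items, target):
    # Hypothetical helper: count `target` at every depth of nested lists.
    total = 0
    for element in items:
        if isinstance(element, list):
            total += count_nested(element, target)
        elif element == target:
            total += 1
    return total

l2 = [1, "feto", ["feto", 1, ["feto"]], ['feto', [1, 2, 3, ['feto']]]]
print(count_nested(l2, "feto"))  # 5
print(count_nested(l2, 1))       # 3
```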
Answer 25 (score: 0)
Although this is a very old question, I did not find a one-liner, so I made one.
# original numbers in list
l = [1, 2, 2, 3, 3, 3, 4]
# empty dictionary to hold pair of number and its count
d = {}
# loop through all elements and store count
[ d.update( {i:d.get(i, 0)+1} ) for i in l ]
print(d)
Answer 26 (score: 0)
Answer 27 (score: 0)
If you want the number of occurrences of a particular element:
>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> single_occurrences = Counter(z)
>>> print(single_occurrences.get("blue"))
3
>>> print(single_occurrences.values())
dict_values([3, 2, 1])
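One difference between `.get()` and indexing is worth noting when the element is absent. The sketch below reuses `z` from above; the key 'green' is simply a value that does not occur in the list.

```python
from collections import Counter

z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
counts = Counter(z)

# Indexing a Counter with an absent key yields 0 rather than raising KeyError.
print(counts['green'])         # 0

# .get() is inherited from dict, so it returns None (or the given default).
print(counts.get('green'))     # None
print(counts.get('green', 0))  # 0
```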
Answer 28 (score: -1)
test = [409.1, 479.0, 340.0, 282.4, 406.0, 300.0, 374.0, 253.3, 195.1, 269.0, 329.3, 250.7, 250.7, 345.3, 379.3, 275.0, 215.2, 300.0]
for i in test:
    print('{} numbers {}'.format(i, test.count(i)))
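The loop above prints a line for every element, so repeated values such as 250.7 and 300.0 are reported twice, and each `count` call rescans the list. A possible variant, sketched here with collections.Counter as a swapped-in technique, reports each distinct value once:

```python
from collections import Counter

test = [409.1, 479.0, 340.0, 282.4, 406.0, 300.0, 374.0, 253.3, 195.1,
        269.0, 329.3, 250.7, 250.7, 345.3, 379.3, 275.0, 215.2, 300.0]

# One pass over the list; each distinct value is reported once.
tally = Counter(test)
for value, n in tally.items():
    print('{} numbers {}'.format(value, n))
```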