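What is an efficient way to find the most common element in a Python list?
My list items may not be hashable, so I can't use a dictionary. Also, in case of a tie, the item with the lowest index should be returned. For example: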
>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'
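Answer 0 (score: 402)
A simpler one-liner:
def most_common(lst):
    # set() requires hashable items; on a tie, the winner depends on set iteration order
    return max(set(lst), key=lst.count)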
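Because it iterates over set(lst), the one-liner above needs hashable items, and which item wins a tie depends on set order. A minimal variant, assuming the O(n^2) cost of calling lst.count once per element is acceptable, iterates over the list itself instead: max() returns the first maximal element it encounters, so ties go to the lowest index and no hashing is needed. The name most_common_stable is hypothetical.

```python
def most_common_stable(lst):
    # Iterate over lst itself rather than set(lst): no hashing is required,
    # and max() keeps the first maximal element it sees, so on a tie the item
    # with the lowest index wins. Each lst.count() is O(n), so this is O(n**2).
    return max(lst, key=lst.count)


print(most_common_stable(['goose', 'duck', 'duck', 'goose']))  # goose
print(most_common_stable([[1], [2], [2]]))  # [2] (lists are unhashable)
```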
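Answer 1 (score: 158)
Borrowing from here, this works with Python 2.7:
from collections import Counter
def Most_Common(lst):
    data = Counter(lst)
    return data.most_common(1)[0][0]
This is 4-6 times faster than Alex's solution, and 50 times faster than the one-liner proposed by newacct.
To retrieve the element that occurs first in the list in case of a tie:
def most_common(lst):
    data = Counter(lst)
    return max(lst, key=data.get)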
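On Python 3.7 and later, Counter remembers insertion order, and the most_common() documentation states that elements with equal counts are ordered in the order first encountered, so the first version also returns the lowest-index item on ties there. A small sketch checking that against the question's examples (still assuming hashable items; the helper name is hypothetical):

```python
from collections import Counter


def most_common_counter(lst):
    # Counter requires hashable items. On Python 3.7+, most_common() lists
    # equal counts in first-encountered order, so entry [0] is the earliest
    # of the tied items.
    if not lst:
        raise ValueError("most_common_counter() arg is an empty list")
    return Counter(lst).most_common(1)[0][0]


print(most_common_counter(['duck', 'duck', 'goose']))  # duck
print(most_common_counter(['goose', 'duck', 'duck', 'goose']))  # goose
```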
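Answer 2 (score: 88)
With so many solutions proposed, I'm amazed nobody has suggested what I'd consider the obvious one (for non-hashable but comparable elements): itertools.groupby. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example: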
import itertools
import operator
def most_common(L):
    # get an iterable of (item, iterable) pairs
    SL = sorted((x, i) for i, x in enumerate(L))
    # print('SL:', SL)
    groups = itertools.groupby(SL, key=operator.itemgetter(0))
    # auxiliary function to get "quality" for an item
    def _auxfun(g):
        item, iterable = g
        count = 0
        min_index = len(L)
        for _, where in iterable:
            count += 1
            min_index = min(min_index, where)
        # print('item %r, count %r, minind %r' % (item, count, min_index))
        return count, -min_index
    # pick the highest-count/earliest item
    return max(groups, key=_auxfun)[0]
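Of course, this could be written more concisely, but I'm aiming for maximal clarity. The two print calls can be uncommented to better see the machinery in action; for example, with the prints uncommented:
print(most_common(['goose', 'duck', 'duck', 'goose']))
emits: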
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose
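As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list. This implements the key condition that, if more than one "most common" item shares the same highest count, the result must be the earliest-occurring one.
groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group: a tuple with two items, (item, iterable), where the iterable's items are also two-item tuples, (item, original index), i.e. the items of SL.
The auxiliary function then uses a loop to determine both the count of entries in the group's iterable and the minimum original index; it returns those as a combined "quality key", with the sign of the minimum index changed so that the max operation considers "better" those items that occurred earlier in the original list.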
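To see the quality keys the auxiliary function computes, here is a small standalone sketch that performs the same decorate, sort, and group steps on the question's second example and collects the (count, -min_index) key for each item (the variable names are illustrative, not from the answer):

```python
import itertools
import operator

L = ['goose', 'duck', 'duck', 'goose']
# Decorate each item with its original index, then sort: equal items become adjacent.
SL = sorted((x, i) for i, x in enumerate(L))
print(SL)  # [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]

keys = {}
for item, group in itertools.groupby(SL, key=operator.itemgetter(0)):
    indices = [where for _, where in group]
    # Same key as _auxfun: a higher count wins, then a lower first index.
    keys[item] = (len(indices), -min(indices))
print(keys)  # {'duck': (2, -1), 'goose': (2, 0)}

# Both counts are 2, and 0 > -1, so 'goose' (first seen at index 0) wins.
best = max(keys, key=keys.get)
print(best)  # goose
```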
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g.:
def most_common(L):
    groups = itertools.groupby(sorted(L))
    def _auxfun(g):
        item, iterable = g
        return len(list(iterable)), -L.index(item)
    return max(groups, key=_auxfun)[0]
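Same basic idea, just expressed more simply and compactly... but, alas, with an extra O(N) auxiliary space (to embody the groups' iterables as lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).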
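The bonus one-liner itself is missing from this copy. The following is only a Python 3 sketch in the same spirit, not the author's code: it packs the compact groupby version into a single expression. Python 3 does not allow tuple parameters, so the lambda indexes the (item, group) pair instead. The name most_common_oneliner is hypothetical.

```python
from itertools import groupby


def most_common_oneliner(L):
    # groupby(sorted(L)) yields (item, group) pairs; rank each pair by
    # (group size, -index of the item's first occurrence). Like the compact
    # version, this costs O(N) extra space and O(N**2) time for L.index.
    return max(groupby(sorted(L)), key=lambda p: (len(list(p[1])), -L.index(p[0])))[0]


print(most_common_oneliner(['goose', 'duck', 'duck', 'goose']))  # goose
```

This works because max() computes each pair's key as soon as groupby yields it, so every group is consumed before groupby moves on.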
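Answer 3 (score: 45)
What you want is known in statistics as the mode, and Python's standard library of course has a function that does exactly that for you: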
>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3
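Note that if there is no single "most common element", e.g. when the top two are tied, this raises StatisticsError, because statistically speaking there is no mode in that case. (This applies to Python versions before 3.8; since 3.8, mode returns the first mode encountered instead of raising.)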
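On Python 3.8 and later, mode() returns the first mode encountered when there is a tie, which matches the question's lowest-index rule, and statistics.multimode() returns every tied value in first-encountered order. Both still need hashable data, and mode() still raises StatisticsError on empty input. A short sketch, assuming Python 3.8+:

```python
from statistics import StatisticsError, mode, multimode

# Python 3.8+: on a tie, mode() returns the value encountered first.
print(mode(['goose', 'duck', 'duck', 'goose']))  # goose

# multimode() lists every most-common value, in first-encountered order.
print(multimode(['goose', 'duck', 'duck', 'goose']))  # ['goose', 'duck']

# Empty input is still an error for mode() (multimode() returns []).
try:
    mode([])
except StatisticsError as exc:
    print('StatisticsError:', exc)
```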
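Answer 4 (score: 9)
If they aren't hashable, you can sort them and do a single loop over the result, counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.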
def most_common(lst):
    cur_length = 0
    max_length = 0
    cur_i = 0
    max_i = 0
    cur_item = None
    max_item = None
    for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
        if cur_item is None or cur_item != item:
            if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
                max_length = cur_length
                max_i = cur_i
                max_item = cur_item
            cur_length = 1
            cur_i = i
            cur_item = item
        else:
            cur_length += 1
    if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
        return cur_item
    return max_item
Answer 5 (score: 6)
Here is an O(n) solution.
mydict = {}
cnt, itm = 0, ''
for item in reversed(lst):
    mydict[item] = mydict.get(item, 0) + 1
    if mydict[item] >= cnt:
        cnt, itm = mydict[item], item
print(itm)
(reversed is used to ensure that it returns the lowest-index item.)
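Wrapped in a function so it can be checked against the question's examples, the snippet above looks like this (a sketch; the function name is hypothetical, and like any dict-based approach it needs hashable items). Because the walk goes backwards and uses >=, the last item to reach the top count is the tied item whose first occurrence has the lowest index.

```python
def most_common_reversed(lst):
    counts = {}
    cnt, itm = 0, None
    # Walk backwards; '>=' lets a tied item that occurs earlier in lst
    # take over, so the tie-winner is the one with the lowest first index.
    for item in reversed(lst):
        counts[item] = counts.get(item, 0) + 1
        if counts[item] >= cnt:
            cnt, itm = counts[item], item
    return itm


print(most_common_reversed(['duck', 'duck', 'goose']))  # duck
print(most_common_reversed(['goose', 'duck', 'duck', 'goose']))  # goose
```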
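Answer 6 (score: 5)
Sort a copy of the list and find the longest run. You can decorate the list with the index of each element before sorting it, and then, in the case of a tie, pick the run that starts with the lowest index.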
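A minimal sketch of that approach, assuming the items are mutually orderable (they do not need to be hashable); the function name is hypothetical. Sorting (item, index) pairs puts equal items next to each other and orders each run by index, so the first pair of a run carries that item's lowest index.

```python
from itertools import groupby
from operator import itemgetter


def most_common_sorted(lst):
    if not lst:
        raise ValueError("most_common_sorted() arg is an empty list")
    # Decorate with indices and sort: equal items become one run, and within
    # a run the pairs are ordered by index.
    decorated = sorted((item, i) for i, item in enumerate(lst))
    best_key, best_item = None, None
    for item, run in groupby(decorated, key=itemgetter(0)):
        indices = [i for _, i in run]
        # Longer run wins; on equal length, the smaller first index wins.
        key = (len(indices), -indices[0])
        if best_key is None or key > best_key:
            best_key, best_item = key, item
    return best_item


print(most_common_sorted(['goose', 'duck', 'duck', 'goose']))  # goose
print(most_common_sorted([[1], [2], [2]]))  # [2]
```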
Answer 7 (score: 4)
One-liner:
def most_common(lst):
    return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]
Answer 8 (score: 3)
You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)
itemList = ['hi', 'hi', 'hello', 'bye']
counter = {}
maxItemCount = 0
for item in itemList:
    try:
        # Referencing this will cause a KeyError exception
        # if it doesn't already exist
        counter[item]
        # ... meaning if we get this far it didn't happen so
        # we'll increment
        counter[item] += 1
    except KeyError:
        # If we got a KeyError we need to create the
        # dictionary key
        counter[item] = 1
    # Keep overwriting maxItemCount with the latest number,
    # if it's higher than the existing itemCount
    if counter[item] > maxItemCount:
        maxItemCount = counter[item]
        mostPopularItem = item
print(mostPopularItem)
Answer 9 (score: 3)
# use Decorate, Sort, Undecorate to solve the problem
def most_common(iterable):
    # Make a list with tuples: (item, index)
    # The index will be used later to break ties for most common item.
    lst = [(x, i) for i, x in enumerate(iterable)]
    lst.sort()
    # lst_final will also be a list of tuples: (count, index, item)
    # Sorting on this list will find us the most common item, and the index
    # will break ties so the one listed first wins. Count is negative so
    # largest count will have lowest value and sort first.
    lst_final = []
    # Get an iterator for our new list...
    itr = iter(lst)
    # ...and pop the first tuple off. Setup current state vars for loop.
    count = 1
    tup = next(itr)
    x_cur, i_cur = tup
    # Loop over sorted list of tuples, counting occurrences of item.
    for tup in itr:
        # Same item again?
        if x_cur == tup[0]:
            # Yes, same item; increment count
            count += 1
        else:
            # No, new item, so write previous current item to lst_final...
            t = (-count, i_cur, x_cur)
            lst_final.append(t)
            # ...and reset current state vars for loop.
            x_cur, i_cur = tup
            count = 1
    # Write final item after loop ends
    t = (-count, i_cur, x_cur)
    lst_final.append(t)
    lst_final.sort()
    answer = lst_final[0][2]
    return answer
print(most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e']))  # prints e
print(most_common(['goose', 'duck', 'duck', 'goose']))  # prints goose
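Answer 10 (score: 2)
I'm using the scipy stats module and a lambda:
import scipy.stats
lst = [1,2,3,4,5,6,7,5]
# keepdims=True (SciPy >= 1.9) keeps the result an array, so [0][0] still works on
# SciPy >= 1.11, where mode() returns scalars by default; on ties, the smallest value wins
most_freq_val = lambda x: scipy.stats.mode(x, keepdims=True)[0][0]
print(most_freq_val(lst))
Result:
5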
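Answer 11 (score: 2)
A simple one-line solution:
moc = max([(lst.count(x), x) for x in set(lst)])
It returns a (count, element) tuple for the element with the highest frequency (on a tie, the largest element wins).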
Answer 12 (score: 2)
Building on Luiz's answer, but satisfying the "in case of a tie, the item with the lowest index should be returned" condition:
from statistics import mode, StatisticsError
def most_common(l):
    try:
        return mode(l)
    except StatisticsError as e:
        # raised on Python < 3.8 when no unique mode is found
        if 'no unique mode' in e.args[0]:
            # return the tied value that appears first in l
            return max(l, key=l.count)
        # this is for "StatisticsError: no mode for empty data"
        # after calling mode([])
        raise
Examples:
>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data
Answer 13 (score: 1)
Hi, here is a very simple solution (note that calling L.count for every element makes it O(n^2), not O(n)):
L = [1, 4, 7, 5, 5, 4, 5]
def mode_f(L):
    counter = 0
    number = L[0]
    for i in L:
        amount_times = L.count(i)
        if amount_times > counter:
            counter = amount_times
            number = i
    return number
number is the element repeated the most times in the list.
Answer 14 (score: 0)
I needed to do this in a recent program. I'll admit I couldn't understand Alex's answer, so this is what I ended up with.
def mostPopular(l):
    mpEl=None
    mpIndex=0
    mpCount=0
    curEl=None
    curCount=0
    for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
        curCount=curCount+1 if el==curEl else 1
        curEl=el
        if curCount>mpCount \
                or (curCount==mpCount and i<mpIndex):
            mpEl=curEl
            mpIndex=i
            mpCount=curCount
    return mpEl, mpCount, mpIndex
I timed it against Alex's solution: it's about 10-15% faster for short lists, but once you go past 100 or so elements (tested up to 200000), it's about 20% slower.
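Answer 15 (score: 0)
Here is a clearly slow (O(n^2)) solution for when neither sorting nor hashing is feasible, but equality comparison (==) is available:
def most_common(items):
    if not items:
        raise ValueError
    fitems = []
    best_idx = 0
    for item in items:
        item_missing = True
        i = 0
        for fitem in fitems:
            if fitem[0] == item:
                fitem[1] += 1
                d = fitem[1] - fitems[best_idx][1]
                if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
                    best_idx = i
                item_missing = False
                break
            i += 1
        if item_missing:
            fitems.append([item, 1, i])
    # best_idx indexes fitems, not items
    return fitems[best_idx][0]
However, if the length of the list (n) is large, making your items hashable or sortable (as suggested by the other answers) will almost always find the most common element faster: O(n) on average with hashing, and O(n*log(n)) at worst with sorting.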
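One way to act on that advice while keeping a fallback is to try the strategies from fastest to slowest. The following is a sketch under two assumptions: items is a list, and a TypeError from Counter means the items are unhashable while a TypeError from sorted() means they are not orderable. The function name is hypothetical. Each branch returns the lowest-index item on ties.

```python
from collections import Counter


def most_common_any(items):
    if not items:
        raise ValueError("most_common_any() arg is an empty list")
    # 1. Hashable items: O(n) on average with a Counter.
    try:
        counts = Counter(items)
    except TypeError:  # unhashable item
        pass
    else:
        # max() over the original order keeps the earliest item on ties.
        return max(items, key=counts.__getitem__)
    # 2. Orderable items: O(n log n) by sorting indices by item value.
    try:
        order = sorted(range(len(items)), key=items.__getitem__)
    except TypeError:  # items cannot be compared with <
        # 3. Only == is available: O(n**2) by counting each item.
        return max(items, key=items.count)
    best_item, best_key = None, None
    start = 0
    while start < len(order):
        end = start
        # Extend the run of items equal to the run's first item.
        while end < len(order) and items[order[end]] == items[order[start]]:
            end += 1
        # sorted() is stable, so order[start] is the lowest index in the run.
        key = (end - start, -order[start])
        if best_key is None or key > best_key:
            best_item, best_key = items[order[start]], key
        start = end
    return best_item


print(most_common_any(['goose', 'duck', 'duck', 'goose']))  # goose
print(most_common_any([[2], [1], [1], [2]]))  # [2]
```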
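Answer 16 (score: 0)
Here:
def most_common(l):
    max_count = 0
    maxitem = None
    for x in set(l):
        count = l.count(x)
        if count > max_count:
            max_count = count
            maxitem = x
    return maxitem
I have a vague feeling there's a method in the standard library that will give you the count of each element, but I can't find it.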
Answer 17 (score: 0)
If there's no requirement about the lowest index, you can use collections.Counter for this:
from collections import Counter
a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)
print(c.most_common(1))  # the one most common element... 2 would mean the 2 most common
# [(9216, 2)]  a list containing the element and its count in a
Answer 18 (score: 0)
>>> li = ['goose', 'duck', 'duck']
>>> def foo(li):
...     st = set(li)
...     mx = -1
...     for each in st:
...         temp = li.count(each)
...         if mx < temp:
...             mx = temp
...             h = each
...     return h
...
>>> foo(li)
'duck'
Answer 19 (score: 0)
ans = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)
# {4: 1, 2: 0}
max_key = max(all_ans.keys())
# 4
print(all_ans[max_key])
# 1
Answer 20 (score: 0)
Here the most common element is taken to be the element that appears more than N/2 times in the array, where N is len(array) (a majority element; not every list has one). Such an element can be found in O(n) time using only O(1) auxiliary space.
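The answer's own code is missing from this copy. The standard technique with exactly those bounds is the Boyer-Moore majority vote algorithm; the sketch below is an assumption about what the answer used, not its original code. Because the vote only produces a candidate, a second O(n) pass checks that the candidate really occurs more than N/2 times.

```python
def majority_element(arr):
    # Boyer-Moore majority vote: one pass keeps a single candidate and a
    # counter, so it needs O(n) time and O(1) extra space.
    candidate, count = None, 0
    for x in arr:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    # The vote only yields a candidate; a second O(n) pass confirms that it
    # really occurs more than len(arr) / 2 times.
    if sum(1 for x in arr if x == candidate) * 2 > len(arr):
        return candidate
    return None  # no majority element


print(majority_element([3, 3, 4, 2, 3, 3, 5]))  # 3
print(majority_element(['goose', 'duck', 'duck', 'goose']))  # None
```

Note that this answers a narrower question than the original one: for ['goose', 'duck', 'duck', 'goose'] there is no majority element, even though 'goose' is the most common item by the question's rules.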
Answer 21 (score: 0)
import numpy as np
# This returns the list sorted by frequency:
def orderByFrequency(list):
    listUniqueValues = np.unique(list)
    listQty = []
    listOrderedByFrequency = []
    for i in range(len(listUniqueValues)):
        listQty.append(list.count(listUniqueValues[i]))
    for i in range(len(listQty)):
        index_bigger = np.argmax(listQty)
        for j in range(listQty[index_bigger]):
            listOrderedByFrequency.append(listUniqueValues[index_bigger])
        listQty[index_bigger] = -1
    return listOrderedByFrequency
# And this returns a list with the most frequent values in a list:
def getMostFrequentValues(list):
    if (len(list) <= 1):
        return list
    list_most_frequent = []
    list_ordered_by_frequency = orderByFrequency(list)
    list_most_frequent.append(list_ordered_by_frequency[0])
    frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
    index = 0
    while(index < len(list_ordered_by_frequency)):
        index = index + frequency
        if(index < len(list_ordered_by_frequency)):
            testValue = list_ordered_by_frequency[index]
            testValueFrequency = list_ordered_by_frequency.count(testValue)
            if (testValueFrequency == frequency):
                list_most_frequent.append(testValue)
            else:
                break
    return list_most_frequent
#tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1,1]))
print(getMostFrequentValues([2,1]))
print(getMostFrequentValues([2,2,1]))
print(getMostFrequentValues([1,2,1,2]))
print(getMostFrequentValues([1,2,1,2,2]))
print(getMostFrequentValues([3,2,3,5,6,3,2,2]))
print(getMostFrequentValues([1,2,2,60,50,3,3,50,3,4,50,4,4,60,60]))
Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]
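For hashable values, a shorter sketch of the same two functions can lean on collections.Counter instead of NumPy (the snake_case names are hypothetical). One difference: tied values come out in first-encountered order rather than in the ascending order produced by np.unique, so the last test above would give [60, 50, 3, 4] instead of [3, 4, 50, 60].

```python
from collections import Counter


def order_by_frequency(values):
    # most_common() with no argument sorts all (value, count) pairs by count,
    # highest first; equal counts keep first-encountered order.
    return [v for v, n in Counter(values).most_common() for _ in range(n)]


def most_frequent_values(values):
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Every value that reaches the top count, in first-encountered order.
    return [v for v, n in counts.items() if n == top]


print(order_by_frequency([3, 2, 3, 5, 6, 3, 2, 2]))  # [3, 3, 3, 2, 2, 2, 5, 6]
print(most_frequent_values([1, 2, 2, 60, 50, 3, 3, 50, 3, 4, 50, 4, 4, 60, 60]))  # [60, 50, 3, 4]
```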
Answer 22 (score: -2)
def mostCommonElement(list):
    count = {}  # dict holder
    max = 0  # keep track of the count by key
    result = None  # holder when count is greater than max
    for i in list:
        if i not in count:
            count[i] = 1
        else:
            count[i] += 1
        if count[i] > max:
            max = count[i]
            result = i
    return result
mostCommonElement(["a", "b", "a", "c"])  # -> "a"
Answer 23 (score: -3)
def most_common(lst):
    if max([lst.count(i) for i in lst]) == 1:
        return False
    else:
        return max(set(lst), key=lst.count)
Answer 24 (score: -4)
def popular(L):
    C={}
    for a in L:
        C[a]=L.count(a)
    for b in C.keys():
        if C[b]==max(C.values()):
            return b
L=[2,3,5,3,6,3,6,3,6,3,7,467,4,7,4]
print(popular(L))