Question

import random

def maxVote(nLabels):
    count = {}
    maxList = []
    maxCount = 0
    for nLabel in nLabels:
        if nLabel in count:
            count[nLabel] += 1
        else:
            count[nLabel] = 1
        # Check whether this count is the new maximum
        if count[nLabel] > maxCount:
            maxCount = count[nLabel]
            maxList = [nLabel,]
        elif count[nLabel]==maxCount:
            maxList.append(nLabel)
    return random.choice(maxList)

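nLabels contains a list of integers.

The function above returns the integer with the highest frequency. If several integers share the highest frequency, it returns one of them chosen at random.

E.g. maxVote([1,3,4,5,5,5,3,12,11]) returns 5.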

Answer 1

import random
import collections

def maxvote(nlabels):
  # Tally how often each label occurs
  cnt = collections.defaultdict(int)
  for i in nlabels:
    cnt[i] += 1
  # Highest count, then a random pick among the labels that reach it
  maxv = max(cnt.values())
  return random.choice([k for k, v in cnt.items() if v == maxv])

print(maxvote([1,3,4,5,5,5,3,3,11]))

Answer 2

In Python 3.1, or the upcoming 2.7, you can use Counter:

>>> from collections import Counter
>>> Counter([1,3,4,5,5,5,3,12,11]).most_common(1)
[(5, 3)]
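
Note that most_common(1) reports only one entry, so when several labels tie for the top count the pick is not random. Here is a minimal sketch, assuming Python 3, that keeps the random tie-break from the question. The name max_vote_counter is made up for this example.

```python
import random
from collections import Counter

def max_vote_counter(labels):
    """Return a label with the highest count, breaking ties at random."""
    counts = Counter(labels)          # one pass over the input
    top = max(counts.values())        # raises ValueError on an empty input
    # Collect every label that reaches the top count, then pick one
    return random.choice([label for label, c in counts.items() if c == top])

print(max_vote_counter([1, 3, 4, 5, 5, 5, 3, 12, 11]))
```

With the tied input from Answer 1, [1,3,4,5,5,5,3,3,11], this can return either 3 or 5.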

If you don't have access to those versions of Python, you can do the following:

>>> from collections import defaultdict
>>> nLabels = [1,3,4,5,5,5,3,12,11]
>>> d = defaultdict(int)
>>> for i in nLabels:
...     d[i] += 1
...
>>> max(d, key=lambda x: d[x])
5

Answer 3

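It appears to run in O(n) time. However, the "nLabel in count" check may be a bottleneck, because that operation could itself take O(n) time in the worst case, which would make the whole function O(n^2).

Using a dictionary rather than a list here is the only major efficiency gain I can spot.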
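
To make the dict-versus-list point concrete: membership on a dict is a hash lookup, O(1) on average, while membership on a list is a linear scan. The sketch below counts equality comparisons instead of measuring time. It assumes Python 3 on CPython, and CountingKey and comparisons_for_lookup are hypothetical helpers written for this example.

```python
class CountingKey:
    """Hypothetical wrapper that counts how often == is evaluated."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return self.value == other.value


def comparisons_for_lookup(container, target):
    """Return how many == comparisons `target in container` triggers."""
    CountingKey.eq_calls = 0
    _ = target in container
    return CountingKey.eq_calls


keys = [CountingKey(i) for i in range(1000)]
as_list = keys                      # membership scans element by element
as_dict = {k: 0 for k in keys}      # membership goes through the hash table

list_cost = comparisons_for_lookup(as_list, CountingKey(999))
dict_cost = comparisons_for_lookup(as_dict, CountingKey(999))
print("list comparisons:", list_cost)
print("dict comparisons:", dict_cost)
```

The list lookup has to compare against every element before the match, whereas the dict lookup only compares against entries whose hash matches.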

Answer 4

I'm not sure what you are trying to optimize, but this should work:

import random
from collections import defaultdict

def maxVote(nLabels):
   count = defaultdict(int)
   for nLabel in nLabels:
      count[nLabel] += 1
   # Highest count; values() works on both Python 2 and 3
   maxCount = max(count.values())
   maxList = [k for k in count if count[k] == maxCount]
   return random.choice(maxList)

Answer 5

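Idea 1

Does the return value really need to be random, or is it enough to return a label with the maximum frequency? If any maximum-frequency label will do, you can store just a single label and remove the list logic, including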

 elif count[nLabel]==maxCount:
        maxList.append(nLabel)
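
A minimal sketch of Idea 1, assuming Python 3 and assuming that any one of the tied labels is acceptable. The name max_vote_single is made up for this example. On a tie it returns whichever label reached the top count first, so it is not a random pick.

```python
def max_vote_single(labels):
    """Return one label with the highest count, or None for empty input."""
    counts = {}
    best_label, best_count = None, 0
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
        # Only a strictly larger count replaces the current leader,
        # so no list of tied labels is needed
        if counts[label] > best_count:
            best_label, best_count = label, counts[label]
    return best_label

print(max_vote_single([1, 3, 4, 5, 5, 5, 3, 12, 11]))
```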

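Idea 2

If this method is called frequently, can you process only the new data instead of the whole data set? You could cache the count map and then feed it just the new data. Assuming your data set is large and the computation is done online, this could be a huge improvement.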
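
A rough sketch of Idea 2, assuming Python 3. The class VoteTally and its methods are hypothetical names for this example; the question does not say how new data arrives, so add() simply takes any iterable of labels.

```python
import random
from collections import Counter

class VoteTally:
    """Keeps the counts between calls so only new labels are processed."""

    def __init__(self):
        self._counts = Counter()

    def add(self, new_labels):
        # Cost is proportional to the new data only, not the full history
        self._counts.update(new_labels)

    def winner(self):
        if not self._counts:
            raise ValueError("no votes recorded yet")
        top = max(self._counts.values())
        # Random tie-break, as in the question
        return random.choice([k for k, c in self._counts.items() if c == top])

tally = VoteTally()
tally.add([1, 3, 4, 5, 5])
tally.add([5, 3, 12, 11])
print(tally.winner())
```

Each call to winner() still scans the distinct labels seen so far, which is usually much smaller than the full data set.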

Answer 6

A complete example:

#!/usr/bin/env python

def max_vote(l):
    """
    Return the element with the (or a) maximum frequency in ``l``.
    """
    unsorted = [(a, l.count(a)) for a in set(l)]
    return sorted(unsorted, key=lambda x: x[1]).pop()[0]

if __name__ == '__main__':
    votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]
    print(max_vote(votes))
    # => 5

Benchmark:

#!/usr/bin/env python

import random
import collections

def max_vote_2(l):
    """
    Return the element with the (or a) maximum frequency in ``l``.
    """
    unsorted = [(a, l.count(a)) for a in set(l)]
    return sorted(unsorted, key=lambda x: x[1]).pop()[0]

def max_vote_1(nlabels):
    cnt = collections.defaultdict(int)
    for i in nlabels:
        cnt[i] += 1
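        # Note: recomputed on every pass of the loop here, whereas
        # Answer 1 computes it once after the loop
        maxv = max(cnt.itervalues())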
    return random.choice([k for k,v in cnt.iteritems() if v == maxv])

if __name__ == '__main__':
    from timeit import Timer
    votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]
    print max_vote_1(votes)
    print max_vote_2(votes)

    t = Timer("votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]; max_vote_2(votes)", \
        "from __main__ import max_vote_2")
    print 'max_vote_2', t.timeit(number=100000)

    t = Timer("votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]; max_vote_1(votes)", \
        "from __main__ import max_vote_1")
    print 'max_vote_1', t.timeit(number=100000)

Output:

5
5
max_vote_2 1.79455208778
max_vote_1 2.31705093384
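
These timings are for a nine-element list. Since l.count(a) rescans the whole list once per distinct label, the picture may change on larger inputs. Here is a rough sketch for trying that, assuming Python 3. The input sizes and the helpers make_votes and compare are hypothetical choices for this example, and no timings are claimed here.

```python
import random
import timeit
from collections import Counter

def vote_by_count(labels):
    # Same idea as max_vote_2: one full list scan per distinct label
    return max(set(labels), key=labels.count)

def vote_by_counter(labels):
    # Single pass over the list to build the tally
    return Counter(labels).most_common(1)[0][0]

def make_votes(n=2000, distinct=200, seed=0):
    # Hypothetical test data: n random labels plus one label that wins outright
    rng = random.Random(seed)
    votes = [rng.randrange(distinct) for _ in range(n)]
    votes += [distinct] * n
    return votes

def compare(votes, repeats=5):
    # Seconds taken for `repeats` calls of each function
    return {
        f.__name__: timeit.timeit(lambda: f(votes), number=repeats)
        for f in (vote_by_count, vote_by_counter)
    }

if __name__ == "__main__":
    votes = make_votes()
    for name, seconds in compare(votes).items():
        print(name, seconds)
```

Changing n and distinct in make_votes shows how each approach scales on your own machine.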

How can this Python code be optimized?
