What is the most efficient way to find the modal values of a list?

Asked: 2012-06-21 21:29:02

Tags: python algorithm

It needs to handle bimodal (and, more generally, multimodal) lists.

My attempts so far:

testlist = [1,2,3,3,2,1,4,2,2,3,4,3,3,4,5,3,2,4,55,6,7,4,3,45,543,4,53,4,53,234]

from collections import Counter

def modal_1(xs):
    cntr = Counter(xs).most_common()
    val,count = cntr[0]
    return (v for v,c in cntr if c is count)

print(list(modal_1(testlist)))
>>> [3, 4]

- or something like this -

from itertools import takewhile

def modal_2(xs):
    cntr = Counter(xs).most_common()
    val,count = cntr[0]
    return takewhile(lambda x: x[1] is count, cntr)

print(list(modal_2(testlist)))
>>> [(3, 7), (4, 7)]

Please, no answers along the lines of "use numpy" and so on.

Note:

Counter(xs).most_common(1)

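returns only the first of the n modal values. If there are two modes, it returns just the first one. That's a shame, because otherwise this would be much easier.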
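To make the limitation concrete, here is a small sketch comparing `most_common(1)` with the full `most_common()` list on the test data from the question. The variable names `top_one` and `top_three` are only for illustration.

```python
from collections import Counter

testlist = [1, 2, 3, 3, 2, 1, 4, 2, 2, 3, 4, 3, 3, 4, 5, 3, 2, 4,
            55, 6, 7, 4, 3, 45, 543, 4, 53, 4, 53, 234]

counts = Counter(testlist)

# most_common(1) returns a one-element list holding a single
# (value, count) pair, even though 3 and 4 both occur seven times.
top_one = counts.most_common(1)

# most_common() with no argument returns every pair, highest count
# first, so tied modes sit next to each other at the front.
top_three = counts.most_common()[:3]

print(top_one)
print(top_three)
```

Because the full list is ordered by count, the modes are exactly the leading run of pairs that share the first pair's count.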


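OK, so I was really surprised that one of my original options turned out to be a good way to do this. For anyone who now wants to find the n modal numbers in a list, I suggest the options below. Both functions work on lists with more than 1000 values.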

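All of these produce (number, count) tuples, where count is the same for every tuple. I think it is better to have this form and then parse it however you like.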
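As one possible way to parse the pairs afterwards, the sketch below pulls out the bare values, the shared count, and a dict view. `modal_pairs` is a hypothetical helper name used only here, and it assumes a non-empty input.

```python
from collections import Counter
from itertools import takewhile

def modal_pairs(xs):
    """Return the leading (number, count) pairs that share the top count."""
    ranked = Counter(xs).most_common()
    top = ranked[0][1]  # assumes xs is non-empty
    return list(takewhile(lambda pair: pair[1] == top, ranked))

pairs = modal_pairs([1, 2, 2, 3, 3])

values = [value for value, _ in pairs]  # just the modal numbers
count = pairs[0][1]                     # the count they all share
as_dict = dict(pairs)                   # number -> count mapping

print(values, count, as_dict)
```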

Using takewhile:

from collections import Counter
from itertools import takewhile

def modal_3(xs):
    counter = Counter(xs).most_common()
    mx = counter[0][1]
    return takewhile(lambda x: x[1] == mx, counter)

print(list(modal_3(testlist)))
>>> [(3, 7), (4, 7)]

Using groupby:

from collections import Counter
from itertools import groupby
from operator import itemgetter

def modal_4(xs):    
    container = Counter(xs)
    return next(groupby(container.most_common(), key=itemgetter(1)))[1]

print(list(modal_4(testlist)))
>>> [(3, 7), (4, 7)]

And finally, the Pythonic and fastest way:

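from collections import Counter

def modal_5(xs):
    def _mode(pairs):
        # pairs is sorted by count, highest first; stop at the first lower count
        for pair in pairs:
            if pair[1] != pairs[0][1]:
                break
            yield pair

    counter = Counter(xs).most_common()
    return list(_mode(counter))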

Thanks, everyone, for the help and information.

3 Answers:

Answer 0: (score: 3)

I think your second example is the best, with a slight modification:

from itertools import takewhile
from collections import Counter

def modal(xs):
    counter = Counter(xs).most_common()
    _, count = counter[0]
    return takewhile(lambda x: x[1] == count, counter)

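The change here is to use == instead of is. The is operator checks identity. That happens to be true for some values, because Python caches small ints behind the scenes, but it does not hold in general and should not be relied on in this case.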

>>> a = 1
>>> a is 1
True
>>> a = 300
>>> a is 300
False
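The same effect can show up with counts rather than literals. The sketch below is an illustration under the assumption that the interpreter only caches small ints (as CPython does); the exact result of the identity version is implementation-dependent, which is the point. The names `modes_eq` and `modes_is` are made up for this example.

```python
from collections import Counter

# Two values tied at a count of 1000, plus one value that occurs once.
counts = Counter(["a"] * 1000 + ["b"] * 1000 + ["c"])
top = max(counts.values())

# Equality compares the numbers, so both tied values are found.
modes_eq = [value for value, c in counts.items() if c == top]

# Identity compares objects. Two separate int objects that both equal
# 1000 are not guaranteed to be the same object, so this may miss a mode.
modes_is = [value for value, c in counts.items() if c is top]

print(modes_eq)
print(modes_is)
```

Whatever `modes_is` contains on a given interpreter, it can never contain more than `modes_eq`, and it may contain less.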

Answer 1: (score: 2)

>>> testlist = [1,2,3,3,2,1,4,2,2,3,4,3,3,4,5,3,2,4,55,6,7,4,3,45,543,4,53,4,53,234]
>>> dic={x:testlist.count(x) for x in set(testlist)}

>>> [x for x in dic if dic[x]==max(dic.values())]

[3, 4]
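Note that `testlist.count(x)` rescans the whole list for every distinct value, and `max(dic.values())` is re-evaluated for every key. A possible variant that counts once and computes the maximum once is sketched below; `modes_single_count` is a hypothetical name, and the empty-input behaviour (returning an empty list) is an assumption, not something specified in the question.

```python
from collections import Counter

def modes_single_count(xs):
    """Return every value that occurs the maximum number of times."""
    counts = Counter(xs)          # one pass over the data
    if not counts:
        return []                 # assumed behaviour for empty input
    top = max(counts.values())    # computed once, not once per key
    return [value for value, c in counts.items() if c == top]

testlist = [1, 2, 3, 3, 2, 1, 4, 2, 2, 3, 4, 3, 3, 4, 5, 3, 2, 4,
            55, 6, 7, 4, 3, 45, 543, 4, 53, 4, 53, 234]
print(modes_single_count(testlist))
```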

Answer 2: (score: 2)

What? takewhile, but no groupby?

>>> from collections import Counter
>>> testlist = [1,2,3,3,2,1,4,2,2,3,4,3,3,4,5,3,2,4,55,6,7,4,3,45,543,4,53,4,53,234]
>>> cntr = Counter(testlist)
>>> from itertools import groupby
>>> list(x[0] for x in next(groupby(cntr.most_common(), key=lambda x:x[1]))[1])
[3, 4]