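Finding the mode of a list

Time: 2012-05-29 11:00:14

Tags: python mode

Given a list of items, recall that the mode of the list is the item that occurs most often.

I would like to know how to create a function that finds the mode of a list but displays a message if the list has no mode (for example, when every item in the list appears only once). I want to write this function without importing anything; I am trying to build it from scratch.

26 answers: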

Answer 0: (score: 126)

You can use the max function with a key. Have a look at "python max function using 'key' and lambda expression".

max(set(list), key=list.count)
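A minimal sketch of how this one-liner could be wrapped to produce the message the question asks for. The function name mode_or_message and the message texts are illustrative assumptions, not part of the answer.

```python
def mode_or_message(items):
    """Return the most frequent item, or a message when nothing repeats."""
    if not items:
        return "The list is empty, so it has no mode."
    # Same idea as the one-liner: pick the distinct item with the highest count.
    # Note that items.count rescans the whole list for every distinct item.
    candidate = max(set(items), key=items.count)
    if items.count(candidate) == 1:
        return "No mode: every item occurs only once."
    return candidate


print(mode_or_message([1, 2, 2, 3]))
print(mode_or_message(["a", "b", "c"]))
```

When several items tie for the highest count, which one is returned depends on the iteration order of the set, so this sketch is best suited to data with a single clear mode.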

Answer 1: (score: 90)

You can use the Counter class supplied in the collections module, which has mode-esque functionality:

from collections import Counter
data = Counter(your_list_in_here)
data.most_common()   # Returns all unique items and their counts, most common first
data.most_common(1)  # Returns a one-element list holding the (item, count) pair of the most common item

Note: Counter is new in Python 2.7 and is not available in earlier versions.
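Because most_common(1) reports only one item even when several share the top count, a small helper can collect every tied item. This is a sketch; the name counter_modes is an assumption made for illustration.

```python
from collections import Counter


def counter_modes(items):
    """Return every item that shares the highest count (empty list for no data)."""
    counts = Counter(items)
    if not counts:
        return []
    # most_common(1) yields [(item, count)]; only the count is needed here.
    top_count = counts.most_common(1)[0][1]
    return [item for item, n in counts.items() if n == top_count]


print(Counter("aab").most_common(1))
print(counter_modes([1, 1, 2, 2, 3]))
```

A caller can then treat a result whose items all occur once as "no mode", if that is the definition it wants.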

Answer 2: (score: 48)

Python 3.4 includes the function statistics.mode, so it is straightforward:

>>> from statistics import mode
>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
 3

You can have any type of elements in the list, not just numeric ones:

>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
 'red'
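For data with ties, the same module also offers statistics.multimode, which returns a list of all modes in order of first appearance. It was added in Python 3.8, so this sketch assumes that version or later.

```python
from statistics import multimode

# All values tied for the highest count are returned, in order of first appearance.
print(multimode([1, 1, 2, 3, 3]))
print(multimode(["red", "blue", "blue", "red", "green"]))
# An empty input yields an empty list rather than an error.
print(multimode([]))
```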

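Answer 3: (score: 23)

Taking a leaf from some statistics software, namely SciPy and MATLAB: these just return the smallest most common value, so if two values occur equally often, the smaller of the two is returned. Hopefully an example will help: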

>>> from scipy.stats import mode

>>> mode([1, 2, 3, 4, 5])
(array([ 1.]), array([ 1.]))

>>> mode([1, 2, 2, 3, 3, 4, 5])
(array([ 2.]), array([ 2.]))

>>> mode([1, 2, 2, -3, -3, 4, 5])
(array([-3.]), array([ 2.]))

Is there any reason why you can't follow this convention?
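The same convention can be sketched without SciPy. The helper below is an assumption for illustration (the name smallest_mode is not from the answer); it expects a non-empty list of mutually comparable values and returns the value together with its count.

```python
def smallest_mode(values):
    """Return (smallest most common value, its count) for a non-empty list."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    top = max(counts.values())
    # Among the values tied for the highest count, keep the smallest one.
    return min(v for v, n in counts.items() if n == top), top


print(smallest_mode([1, 2, 3, 4, 5]))
print(smallest_mode([1, 2, 2, 3, 3, 4, 5]))
print(smallest_mode([1, 2, 2, -3, -3, 4, 5]))
```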

Answer 4: (score: 23)

There are many simple ways to find the mode of a list in Python, such as:

import statistics
statistics.mode([1,2,3,3])
>>> 3

Or, you could find the max by its count:

max(array, key = array.count)

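The problem with those two methods is that they don't work with multiple modes. The first raises an error (statistics.StatisticsError in Python versions before 3.8; from 3.8 on it returns the first mode encountered), while the second returns the first mode.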
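The tie behaviour of the second approach follows from how max works: it keeps the first maximal element it encounters, so of two equally frequent values the one appearing first in the list wins. A small check with made-up sample data:

```python
array = [3, 3, 1, 1, 2]           # 3 and 1 both occur twice
first_mode = max(array, key=array.count)
print(first_mode)                 # 3 appears before 1 in the list

reordered = [1, 1, 3, 3, 2]       # same data, different order
print(max(reordered, key=reordered.count))
```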

In order to find all the modes of a collection, you could use this function:

def mode(array):
    most = max(list(map(array.count, array)))
    return list(set(filter(lambda x: array.count(x) == most, array)))

Answer 5: (score: 7)

Extending the community answer, which does not work when the list is empty, here is working code for the mode:

def mode(arr):
    if arr == []:
        return None
    else:
        return max(set(arr), key=arr.count)

Answer 6: (score: 3)

In case you are interested in the smallest, the largest, or all modes:

def get_small_mode(numbers, out_mode):
    counts = {k:numbers.count(k) for k in set(numbers)}
    modes = sorted(dict(filter(lambda x: x[1] == max(counts.values()), counts.items())).keys())
    if out_mode=='smallest':
        return modes[0]
    elif out_mode=='largest':
        return modes[-1]
    else:
        return modes

Answer 7: (score: 2)

Short, but somewhat ugly:

def mode(arr) :
    m = max([arr.count(a) for a in arr])
    return [x for x in arr if arr.count(x) == m][0] if m>1 else None

Using a dictionary, slightly less ugly:

def mode(arr) :
    f = {}
    for a in arr : f[a] = f.get(a,0)+1
    m = max(f.values())
    t = [(x,f[x]) for x in f if f[x]==m]
    return t[0][0] if m > 1 else None

Answer 8: (score: 2)

Slightly longer, but it can handle multiple modes and works with strings or a mix of data types.

def getmode(inplist):
    '''with list of items as input, returns mode
    '''
    dictofcounts = {}
    listofcounts = []
    for i in inplist:
        countofi = inplist.count(i)  # count items for each item in list
        listofcounts.append(countofi)  # add counts to list
        dictofcounts[i] = countofi  # add counts and item in dict to get later
    maxcount = max(listofcounts)  # get max count of items
    if maxcount == 1:
        print("There is no mode for this dataset, values occur only once")
    else:
        modelist = []  # if more than one mode, add to list to print out
        for key, item in dictofcounts.items():
            if item == maxcount:  # get item from original list with most counts
                modelist.append(str(key))
        print("The mode(s) are:", ' and '.join(modelist))
        return modelist

Answer 9: (score: 2)

I wrote this handy function to find the mode.

def mode(nums):
    corresponding = {}
    occurances = []
    for i in nums:
        count = nums.count(i)
        corresponding.update({i: count})

    for i in corresponding:
        freq = corresponding[i]
        occurances.append(freq)

    maxFreq = max(occurances)

    # dict views have no index() in Python 3, so convert them to lists first
    keys = list(corresponding.keys())
    values = list(corresponding.values())

    index_v = values.index(maxFreq)
    # return the value directly; a "global mode" assignment would overwrite this function
    return keys[index_v]

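Answer 10: (score: 1)

For a number to be a mode, it must occur more often than at least one other number in the list, and it must not be the only number in the list. So I refactored @mathwizurd's answer (to use the set difference method) as follows: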

def mode(array):
    '''
    returns a set containing valid modes
    returns a message if no valid mode exists
      - when all numbers occur the same number of times
      - when only one number occurs in the list 
      - when no number occurs in the list 
    '''
    most = max(map(array.count, array)) if array else None
    mset = set(filter(lambda x: array.count(x) == most, array))
    return mset if set(array) - mset else "list does not have a mode!" 

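These tests pass successfully:

mode([]) == "list does not have a mode!"
mode([1]) == "list does not have a mode!"
mode([1, 1]) == "list does not have a mode!"
mode([1, 1, 2, 2]) == "list does not have a mode!"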

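Answer 11: (score: 1)

Okay! The community already has plenty of answers, some of which use other functions that you do not want.
Let us create a very simple and easily understandable function.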

import numpy as np

# Declare the function
def calculate_mode(lst):
    # Next step: find the unique elements in the list and their respective frequencies
    unique_elements, freq = np.unique(lst, return_counts=True)
    # Get the mode
    max_freq = np.max(freq)   # maximum frequency
    mode_index = np.where(freq == max_freq)  # indices holding the maximum frequency
    mode = unique_elements[mode_index]   # get mode by index
    return mode

Example

lst = np.array([1,1,2,3,4,4,4,5,6])
print(calculate_mode(lst))
>>> Output [4]

Answer 12: (score: 1)

Simple code that finds the mode of a list without any imports:

nums = [1, 2, 2, 3, 3]  # your_list_goes_here (sample values shown as a placeholder)
nums.sort()
counts = dict()
for i in nums:
    counts[i] = counts.get(i, 0) + 1
mode = max(counts, key=counts.get)

In case of multiple modes, it should return the smallest one.

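Answer 13: (score: 1)

The mode of a data set is the member (or members) that occur most frequently in the set. If two members appear most often and the same number of times, the data has two modes. This is called bimodal.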

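If there are more than two modes, the data is called multimodal. If all the members of the data set appear the same number of times, the data set has no mode at all.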
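These definitions can be sketched as a small classifier. The function name describe_modes and the label "unimodal" for a single mode are assumptions added for illustration; the "no mode" rule follows the definition given here (every member occurs equally often).

```python
def describe_modes(data):
    """Return (label, modes) for a list, following the definitions above."""
    counts = {}
    for item in data:
        counts[item] = counts.get(item, 0) + 1
    # No mode: empty data, or every member occurs the same number of times.
    if len(set(counts.values())) <= 1:
        return "no mode", []
    highest = max(counts.values())
    modes = [item for item, n in counts.items() if n == highest]
    labels = {1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes), "multimodal"), modes


print(describe_modes([1, 2, 2, 3]))
print(describe_modes([1, 1, 2, 2, 3]))
print(describe_modes([2, 2, 3, 3, 4, 4]))
```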

The following function, modes(), finds the mode(s) in a given list of data:

import numpy as np; import pandas as pd

def modes(arr):
    df = pd.DataFrame(arr, columns=['Values'])
    dat = pd.crosstab(df['Values'], columns=['Freq'])
    if len(np.unique((dat['Freq']))) > 1:
        mode = list(dat.index[np.array(dat['Freq'] == max(dat['Freq']))])
        return mode
    else:
        print("There is NO mode in the data set")

Output:

# For a list of numbers in x as
In [1]: x = [2, 3, 4, 5, 7, 9, 8, 12, 2, 1, 1, 1, 3, 3, 2, 6, 12, 3, 7, 8, 9, 7, 12, 10, 10, 11, 12, 2]
In [2]: modes(x)
Out[2]: [2, 3, 12]
# For a list of repeated numbers in y as
In [3]: y = [2, 2, 3, 3, 4, 4, 10, 10]
In [4]: modes(y)
Out[4]: There is NO mode in the data set
# For a list of strings/characters in z as
In [5]: z = ['a', 'b', 'b', 'b', 'e', 'e', 'e', 'd', 'g', 'g', 'c', 'g', 'g', 'a', 'a', 'c', 'a']
In [6]: modes(z)
Out[6]: ['a', 'g']

If we do not want to import numpy or pandas and call any function from these packages, then to get the same output, the modes() function can be written as:

def modes(arr):
    cnt = []
    for i in arr:
        cnt.append(arr.count(i))
    uniq_cnt = []
    for i in cnt:
        if i not in uniq_cnt:
            uniq_cnt.append(i)
    if len(uniq_cnt) > 1:
        m = []
        for i in list(range(len(cnt))):
            if cnt[i] == max(uniq_cnt):
                m.append(arr[i])
        mode = []
        for i in m:
            if i not in mode:
                mode.append(i)
        return mode
    else:
        print("There is NO mode in the data set")

Answer 14: (score: 1)

Here is how you can find the mean, median and mode of a list:

import numpy as np
from scipy import stats

#to take input
size = int(input())
numbers = list(map(int, input().split()))

print(np.mean(numbers))
print(np.median(numbers))
print(int(stats.mode(numbers)[0]))

Answer 15: (score: 1)

Why not just

def print_mode(thelist):
    counts = {}
    for item in thelist:
        counts[item] = counts.get(item, 0) + 1
    maxcount = 0
    maxitem = None
    for k, v in counts.items():
        if v > maxcount:
            maxitem = k
            maxcount = v
    if maxcount == 1:
        print("All values only appear once")
    elif list(counts.values()).count(maxcount) > 1:
        print("List has multiple modes")
    else:
        print("Mode of list:", maxitem)

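It lacks a few error checks that it should have, but it finds the mode without importing any functions and prints a message if all values appear only once. It also detects multiple items sharing the same maximum count, although it wasn't clear whether you wanted that.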

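Answer 16: (score: 1)

This function returns the mode or modes of a list, no matter how many there are, as well as the frequency of the mode or modes in the data set. If there is no mode (i.e. all items occur only once), the function returns an error string. This is similar to A_nagpal's function above but is, in my humble opinion, more complete, and I think it is easier to understand for any Python novices (such as yours truly) reading this question.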

def l_mode(list_in):
    count_dict = {}
    for e in (list_in):   
        count = list_in.count(e)
        if e not in count_dict.keys():
            count_dict[e] = count
    max_count = 0 
    for key in count_dict: 
        if count_dict[key] >= max_count:
            max_count = count_dict[key]
    corr_keys = [] 
    for corr_key, count_value in count_dict.items():
        if count_dict[corr_key] == max_count:
            corr_keys.append(corr_key)
    if max_count == 1 and len(count_dict) != 1: 
        return 'There is no mode for this data set. All values occur only once.'
    else: 
        corr_keys = sorted(corr_keys)
        return corr_keys, max_count

Answer 17: (score: 0)

# function to find mode
def mode(data):
    modecnt = 0  # highest count seen so far
    for i in range(len(data)):
        icount = data.count(data[i])  # how often the current number appears in the list
        if icount > modecnt:
            # the current count is greater than the previous best
            mode = data[i]  # store the number as the mode
            modecnt = icount  # store how many times it appears
    return mode
print(mode(data1))

Answer 18: (score: 0)

If you want a clear approach, useful for the classroom, that uses only lists and dictionaries built by comprehension, you can do:

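One way such an approach might look is sketched below. This is a hypothetical illustration of the idea described (lists and dictionaries built by comprehension only), not this answer's own code; the name modes_by_comprehension is made up.

```python
def modes_by_comprehension(items):
    """Return all most frequent items using only comprehensions and built-ins."""
    # Dictionary comprehension: each distinct item mapped to its number of occurrences.
    counts = {item: items.count(item) for item in items}
    if not counts:
        return []
    highest = max(counts.values())
    # List comprehension: keep every item whose count equals the highest count.
    return [item for item, n in counts.items() if n == highest]


print(modes_by_comprehension([1, 2, 2, 3, 3]))
```

Calling items.count inside the comprehension keeps the code short at the cost of rescanning the list for every element, which is usually acceptable for classroom-sized data.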

Answer 19: (score: 0)

This will return all modes:

def mode(numbers):
    largestCount = 0
    modes = []
    for x in numbers:
        if x in modes:
            continue
        count = numbers.count(x)
        if count > largestCount:
            del modes[:]
            modes.append(x)
            largestCount = count
        elif count == largestCount:
            modes.append(x)
    return modes

Answer 20: (score: 0)

import numpy as np
def get_mode(xs):
    values, counts = np.unique(xs, return_counts=True)
    max_count_index = np.argmax(counts) #return the index with max value counts
    return values[max_count_index]
print(get_mode([1,7,2,5,3,3,8,3,2]))

Answer 21: (score: 0)

For those looking for the smallest mode, e.g. in the case of a bimodal distribution, using numpy:

import numpy as np
mode = np.argmax(np.bincount(your_list))

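Answer 22: (score: 0)

Here is a simple function that gets the first mode that occurs in a list. It builds a dictionary with the list elements as keys and their number of occurrences as values, and then reads the dict values to get the mode.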
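A minimal sketch of the approach just described, not the answer's original code: the name first_mode is an assumption, and the "first mode" guarantee relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
def first_mode(items):
    """Return the first-appearing item with the highest count, or None if empty."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    best_item, best_count = None, 0
    # Keys are visited in order of first appearance; a strict ">" keeps the
    # earliest item when several share the highest count.
    for item, count in counts.items():
        if count > best_count:
            best_item, best_count = item, count
    return best_item


print(first_mode([3, 1, 1, 3, 2]))
```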


Answer 23: (score: 0)

def mode(data):
    lst = []
    for i in range(len(data)):
        lst.append(data.count(data[i]))
    m = max(lst)
    ml = [x for x in data if data.count(x) == m]  # to find most frequent values
    mode = []
    for x in ml:  # to remove duplicates of mode
        if x not in mode:
            mode.append(x)
    return mode
print(mode([1,2,2,2,2,7,7,5,5,5,5]))

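Answer 24: (score: 0)

Perhaps try the following. It is O(n) and returns a list of floats (or ints). It is thoroughly and automatically tested. It uses collections.defaultdict, but I'd like to think you are not opposed to using that. It can also be found at https://stromberg.dnsalias.org/~strombrg/stddev.html
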
import collections
import typing


def compute_mode(list_: typing.List[float]) -> typing.List[float]:
    """
    Compute the mode of list_.

    Note that the return value is a list, because sometimes there is a tie for "most common value".

    See https://stackoverflow.com/questions/10797819/finding-the-mode-of-a-list
    """
    if not list_:
        raise ValueError('Empty list')
    if len(list_) == 1:
        raise ValueError('Single-element list')
    value_to_count_dict: typing.DefaultDict[float, int] = collections.defaultdict(int)
    for element in list_:
        value_to_count_dict[element] += 1
    count_to_values_dict = collections.defaultdict(list)
    for value, count in value_to_count_dict.items():
        count_to_values_dict[count].append(value)
    counts = list(count_to_values_dict)
    if len(counts) == 1:
        raise ValueError('All elements in list are the same')
    maximum_occurrence_count = max(counts)
    if maximum_occurrence_count == 1:
        raise ValueError('No element occurs more than once')
    minimum_occurrence_count = min(counts)
    if maximum_occurrence_count <= minimum_occurrence_count:
        raise ValueError('Maximum count not greater than minimum count')
    return count_to_values_dict[maximum_occurrence_count]

Answer 25: (score: 0)

def mode(inp_list):
    sort_list = sorted(inp_list)
    dict1 = {}
    for i in sort_list:
        count = sort_list.count(i)
        if i not in dict1.keys():
            dict1[i] = count

    maximum = 0 #no. of occurences
    max_key = -1 #element having the most occurences

    for key in dict1:
        if(dict1[key]>maximum):
            maximum = dict1[key]
            max_key = key 
        elif(dict1[key]==maximum):
            if(key<max_key):
                maximum = dict1[key]
                max_key = key

    return max_key