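How do I find the most common word in a list?

Time: 2019-09-09 14:06:25

Tags: python list

I've only just started coding, so I'm not using dictionaries or sets, imports, or anything more advanced than for/while loops and if statements.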

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"] 
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"] 

def codedlist(number):
      max= 0
      for k in hello:
            if first.count(number) > max:
                    max= first.count(number)

9 answers:

Answer 0 (score: 2)

You can use collections.Counter to find it in one line:

from collections import Counter

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"] 
Counter(list1).most_common()[-1]

Output:

('cry', 2)

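(most_common() returns a list of (element, count) pairs sorted by count from highest to lowest, so the last element, [-1], is the one with the smallest count.)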
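To make that ordering visible, here is a minimal sketch that prints the whole ranked list before taking its last entry. The variable name ranked is only illustrative and does not come from the answer.

```python
from collections import Counter

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]

# most_common() with no argument returns every (element, count) pair,
# ordered from the highest count to the lowest.
ranked = Counter(list1).most_common()

print(ranked)      # [('me', 4), ('no', 3), ('cry', 2)]
print(ranked[-1])  # ('cry', 2)
```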

Or, slightly more involved, if several elements can share the lowest count:

from collections import Counter

list1 = [1,2,3,4,4,4,4,4]
counted = Counter(list1).most_common()
least_count = min(counted, key=lambda y: y[1])[1]
list(filter(lambda x: x[1] == least_count, counted))

Output:

[(1, 1), (2, 1), (3, 1)]

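Answer 1 (score: 1)

You can use collections.Counter to count how often each string occurs, then use min to get the lowest frequency, and then use a list comprehension to collect the strings that have that frequency: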

from collections import Counter

def codedlist(number):
    c = Counter(number)
    m = min(c.values())
    return [s for s, i in c.items() if i == m]

print(codedlist(list1))
print(codedlist(list2))

Output:

['cry']
['cry', 'no', 'me']

Answer 2 (score: 1)

from collections import Counter

def least_common(words):
    d = dict(Counter(words))
    min_freq = min(d.values())
    return [(k,v) for k,v in d.items() if v == min_freq]

words = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

print(least_common(words))

Answer 3 (score: 1)

A simple algorithmic approach:

def codedlist(my_list):
    least = 99999999 # A very high number
    word = ''
    for element in my_list:
        repeated = my_list.count(element)
        if repeated < least:
            least = repeated # This is just a counter
            word = element # This is the word
    return word

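It doesn't perform very well, because my_list.count() rescans the whole list for every element. There are better ways to do this, but I think this one is easy for a beginner to understand.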
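As one possible illustration of a "better way" that still respects the question's constraints (no dictionaries, sets, or imports), the sketch below counts every word in a single pass using two parallel lists, then collects all words that share the lowest count. The names least_common_words, seen, and counts are hypothetical and are not taken from any answer here.

```python
def least_common_words(words):
    """Return every word that shares the lowest count, in first-seen order."""
    seen = []    # each distinct word, in the order it first appears
    counts = []  # counts[i] is how many times seen[i] has occurred

    for word in words:
        if word in seen:
            counts[seen.index(word)] += 1
        else:
            seen.append(word)
            counts.append(1)

    if not counts:  # empty input: nothing to report
        return []

    # Find the smallest count with a plain loop.
    lowest = counts[0]
    for c in counts:
        if c < lowest:
            lowest = c

    # Keep every word whose count equals the smallest count.
    result = []
    for i in range(len(seen)):
        if counts[i] == lowest:
            result.append(seen[i])
    return result


list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

print(least_common_words(list1))  # ['cry']
print(least_common_words(list2))  # ['cry', 'no', 'me']
```

Unlike the function above, this version returns a list, so ties are not silently dropped.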

Answer 4 (score: 1)

If you want all the words sorted by count, lowest first:

import numpy as np

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

uniques_values = np.unique(list1)

final_list = []
for i in range(0,len(uniques_values)):
    final_list.append((uniques_values[i], list1.count(uniques_values[i])))

def takeSecond(elem):
    return elem[1]

final_list.sort(key=takeSecond)

print(final_list)

For list1:

[('cry', 2), ('no', 3), ('me', 4)]

For list2:

[('cry', 3), ('me', 3), ('no', 3)]

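Use the code with care: to switch to the other list, you have to edit it in two places (the np.unique call and the count call).

Some helpful explanations:

  • numpy.unique gives you the distinct values, without duplicates

  • def takeSecond(elem) returns elem[1]; this function lets you sort the list by column [1] (the second value).

This can be useful for displaying the counts or for getting all the items sorted by this criterion.

Hope this helps.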
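One way around the "edit in two places" caveat is to pass the list in as a parameter. The sketch below is only an illustration: the function name counts_sorted_ascending is hypothetical, and it swaps numpy.unique for the built-in sorted(set(...)) so that it runs without NumPy.

```python
def counts_sorted_ascending(words):
    """Return (word, count) pairs sorted by count, lowest first."""
    pairs = []
    # sorted(set(words)) stands in for numpy.unique here:
    # it yields the distinct values in sorted order.
    for word in sorted(set(words)):
        pairs.append((word, words.count(word)))
    # Sort by the second item of each pair (same role as takeSecond).
    pairs.sort(key=lambda pair: pair[1])
    return pairs


list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

print(counts_sorted_ascending(list1))  # [('cry', 2), ('no', 3), ('me', 4)]
print(counts_sorted_ascending(list2))  # [('cry', 3), ('me', 3), ('no', 3)]
```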

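Answer 5 (score: 1)

Finding a minimum usually works much like finding a maximum. You count the occurrences of an element, and if that count is smaller than the counter (which holds the occurrence count of the least common element so far), you replace the counter.

This is a rough solution: it uses a lot of memory and takes a long time to run. If you try to cut down the run time and memory usage, you will learn more about lists (and their operations). I hope this helps!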

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

def codedlist(l):
    min = False #This is our counter (stays False until the first count is stored)
    indices = [] #This records the positions of the counts
    for i in range(0,len(l)):
        count = 0
        for x in l: #You can possibly shorten the run time here
            if(x == l[i]):
                count += 1
        if not min: #Also can be read as: If this is the first element.
            min = count
            indices = [i]
        elif min > count: #If this element is the least common
            min = count #Replace the counter
            indices = [i] # This is your only index
        elif min == count: #If this element ties with the current least common count
            indices.append(i) #Add it to our indices counter

    tempList = []
    #You can possibly shorten the run time below
    for ind in indices:
        tempList.append(l[ind])
    rList = []
    for x in tempList: #Remove duplicates in the list
        if x not in rList:
            rList.append(x)
    return rList

print(codedlist(list1))
print(codedlist(list2))

Output:

['cry']
['cry', 'no', 'me']

Answer 6 (score: 1)

def codedlist(list):
    dict = {}
    for item in list:
        dict[item]=list.count(item)
    most_common_number = max(dict.values())
    most_common = []
    for k,v in dict.items():
        if most_common_number == v:
            most_common.append(k)
    return most_common
list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"] 
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"] 

print(codedlist(list1))

Answer 7 (score: 1)

Probably the simplest and fastest way to get the least common item in a collection.

min(list1, key=list1.count)

In action:

>>> data = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
>>> min(data, key=data.count)
'cry'

I tested its speed against the collections.Counter approach, and it was much faster. See this REPL

P.S.: Use max to find the most common item.
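A minimal sketch of that P.S., reusing the same data list; the variable name most_common is only illustrative:

```python
data = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]

# max() calls data.count on each item and returns the item with the highest count.
most_common = max(data, key=data.count)

print(most_common)  # me
```

If several items tie for the highest count, max returns the first of them that it encounters in the list.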

Edit

To get multiple least-common items, you can extend this approach with a comprehension.

>>> lc = data.count(min(data, key=data.count))
>>> {i for i in data if data.count(i) == lc}
{'no', 'me', 'cry'}

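Answer 8 (score: 1)

Basically, you want to walk through the list and, at each element, ask yourself:

"Have I seen this element before?"

If the answer is yes, add 1 to that element's count; if the answer is no, add it to the dictionary of seen values. Finally, we sort the dictionary by value and pick the first word, because it has the smallest count. Let's implement it: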

import operator

words = ['blah','blah','car']
seen_dictionary = {}
for w in words:
    if w in seen_dictionary:
        seen_dictionary[w] += 1  # seen before: bump its count
    else:
        seen_dictionary[w] = 1  # first time we see this word

# items() gives (word, count) tuples; sort them by count and take the word of the first one.
final_word = sorted(seen_dictionary.items(), key=operator.itemgetter(1))[0][0]