Question

As the title says, I need to write code that returns a list of the 5 words (from an input string) with the highest frequency. This is what I have so far:

import collections

def top5_words(text):
    counts = collections.Counter(text.split())
    return [elem for elem, _ in sorted(counts.most_common(), key=lambda x: (-x[1], x[0]))[:5]]

For example, if you input: top5_words("one one was a racehorse two two was one too") it should return: ["one", "two", "was", "a", "racehorse"] but it returns: ['one', 'was', 'two', 'racehorse', 'too', 'a'] - does anyone know why this is?

Edit:

Thanks to Anand S Kumar, this is what I have now: the version of top5_words shown above.

Answer 1

You should use collections.Counter, and then you can use its method - most_common(). Example -

import collections
def top5_words(text):
    counts = collections.Counter(text.split())
    return counts.most_common(5)

Note that the above returns a list of 5 tuples; in each tuple, the first element is the actual word and the second element is that word's count.

Demo -

>>> import collections
>>> def top5_words(text):
...     counts = collections.Counter(text.split())
...     return counts.most_common(5)
...
>>> top5_words("""As the title says, I need to write a code that returns a list of 5 words (from an input string) that have the highest frequency. This is what I have so far""")
[('that', 2), ('a', 2), ('I', 2), ('the', 2), ('have', 2)]
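
A note on the order of ties: the demo output above appears to come from an older interpreter, where words with equal counts came back in arbitrary order. On Python 3.7 and later, most_common() returns equal counts in the order the words were first encountered, so the same call may list the tied words differently. A minimal sketch with a made-up input (the input string and variable names are illustrative only):

```python
import collections

# Hypothetical input: 'b' and 'a' both occur twice, and 'b' is seen first.
counts = collections.Counter("b a b a c".split())

# On Python 3.7+, words with equal counts keep first-encountered order.
top_two = counts.most_common(2)
print(top_two)
```

If a deterministic, version-independent order is needed, the ties have to be broken explicitly with a sort key.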

If you only want the elements and not the counts, you can use a list comprehension to get just those. Example -

import collections
def top5_words(text):
    counts = collections.Counter(text.split())
    return [elem for elem, _ in counts.most_common(5)]

Demo -

>>> import collections
>>> def top5_words(text):
...     counts = collections.Counter(text.split())
...     return [elem for elem, _ in counts.most_common(5)]
...
>>> top5_words("""As the title says, I need to write a code that returns a list of 5 words (from an input string) that have the highest frequency. This is what I have so far""")
['that', 'a', 'I', 'the', 'have']

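New requirement from the comments -

It seems there is one more issue with words that have the same frequency: how would I sort words of equal frequency alphabetically?

You can first get the list of all words with their counts, and then use sorted to sort first by count in descending order and then by the word itself (so that when counts are equal, the words are ordered lexicographically). Example -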

import collections
def top5_words(text):
    counts = collections.Counter(text.lower().split())
    return [elem for elem, _ in sorted(counts.most_common(),key=lambda x:(-x[1], x[0]))[:5]]

Demo -

>>> import collections
>>> def top5_words(text):
...     counts = collections.Counter(text.lower().split())
...     return [elem for elem, _ in sorted(counts.most_common(),key=lambda x:(-x[1], x[0]))[:5]]
...
>>> top5_words("""As the title says, I need to write a code that returns a list of 5 words (from an input string) that have the highest frequency. This is what I have so far""")
['a', 'have', 'i', 'that', 'the']
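
To see the tie-breaking at work, here is a small sketch of the same idea on a short sentence with tied counts. It sorts counts.items() directly instead of going through most_common(), which is an equivalent variation and not the code from the answer; the name top_words, the parameter n, and the input sentence are assumptions made for illustration.

```python
import collections

def top_words(text, n=5):
    """Return the n most frequent words; ties are ordered alphabetically."""
    # Lowercasing first means 'I' and 'i' are counted as the same word.
    counts = collections.Counter(text.lower().split())
    # Negating the count puts high counts first; the word itself breaks ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]

result = top_words("one one was a racehorse two two was one too")
print(result)  # ['one', 'two', 'was', 'a', 'racehorse']
```

Here 'two' and 'was' both occur twice, so 'two' comes first alphabetically; among the words that occur once, 'a' and 'racehorse' sort ahead of 'too'.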

Python - Return the top 5 words with the highest frequency
