As the title says, I need to write code that returns a list of the 5 most frequent words (from an input string). This is what I have so far:
import collections
def top5_words(text):
    counts = collections.Counter(text.split())
    return [elem for elem, _ in sorted(counts.most_common(), key=lambda x: (-x[1], x[0]))[:5]]
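For example, if you input top5_words("one one was a racehorse two two was one too") it should return ["one", "two", "was", "a", "racehorse"], but it returns ['one', 'was', 'two', 'racehorse', 'too', 'a']. Does anyone know why?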
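As a quick check, the sketch below runs the current code on the example sentence. It assumes the original English input was "one one was a racehorse two two was one too"; the translated example points to this wording but does not show it verbatim.

```python
import collections

def top5_words(text):
    # Count whitespace-separated tokens
    counts = collections.Counter(text.split())
    # Sort by descending count, then alphabetically to break ties, keep the first 5
    return [elem for elem, _ in sorted(counts.most_common(), key=lambda x: (-x[1], x[0]))[:5]]

# Assumed English wording of the example sentence
print(top5_words("one one was a racehorse two two was one too"))
# ['one', 'two', 'was', 'a', 'racehorse']
```

Here "two" and "was" both occur twice, and the tie is broken alphabetically ("two" < "was").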
Edit:
Thanks to Anand S Kumar, the code at the top of the question is what I have now.
Answer 0 (score: 3)
You should use collections.Counter, and then you can use its most_common() method. Example -
import collections
def top5_words(text):
    counts = collections.Counter(text.split())
    return counts.most_common(5)
Note that the above returns a list of 5 tuples; in each tuple, the first element is the actual word and the second element is that word's count.
Demo -
>>> import collections
>>> def top5_words(text):
...     counts = collections.Counter(text.split())
...     return counts.most_common(5)
...
>>> top5_words("""As the title says, I need to write a code that returns a list of 5 words (from an input string) that have the highest frequency. This is what I have so far""")
[('that', 2), ('a', 2), ('I', 2), ('the', 2), ('have', 2)]
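All five words in the demo above have a count of 2, so the order among them is not alphabetical. Since Python 3.7 the Counter documentation states that elements with equal counts are ordered in the order first encountered (on older versions this followed the arbitrary dict order). A minimal sketch using the question's example sentence (its English wording is an assumption):

```python
import collections

counts = collections.Counter("one one was a racehorse two two was one too".split())

# "was" is seen before "two", so it comes first among the words counted twice
print(counts.most_common(3))
# [('one', 3), ('was', 2), ('two', 2)]
```

This is why an explicit secondary sort key is needed if ties should be alphabetical.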
If you only want the words and not the counts, you can use a list comprehension to extract them. Example -
import collections
def top5_words(text):
    counts = collections.Counter(text.split())
    return [elem for elem, _ in counts.most_common(5)]
Demo -
>>> import collections
>>> def top5_words(text):
...     counts = collections.Counter(text.split())
...     return [elem for elem, _ in counts.most_common(5)]
...
>>> top5_words("""As the title says, I need to write a code that returns a list of 5 words (from an input string) that have the highest frequency. This is what I have so far""")
['that', 'a', 'I', 'the', 'have']
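New requirement from the comments -
There also seems to be a problem with words that have the same frequency. How do I sort words with the same frequency alphabetically?
You can first get a list of all the words with their counts, then use sorted to sort first by count (descending) and then by the word itself, so that words with the same count are ordered lexicographically. This version also lowercases the text before splitting, so "I" and "i" are counted as the same word. Example -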
import collections
def top5_words(text):
    counts = collections.Counter(text.lower().split())
    return [elem for elem, _ in sorted(counts.most_common(), key=lambda x: (-x[1], x[0]))[:5]]
Demo -
>>> import collections
>>> def top5_words(text):
...     counts = collections.Counter(text.lower().split())
...     return [elem for elem, _ in sorted(counts.most_common(), key=lambda x: (-x[1], x[0]))[:5]]
...
>>> top5_words("""As the title says, I need to write a code that returns a list of 5 words (from an input string) that have the highest frequency. This is what I have so far""")
['a', 'have', 'i', 'that', 'the']
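A possible variation, not part of the original answer: heapq.nsmallest with the same (-count, word) key gives the same result as sorted(...)[:5] but only keeps the n best items, which may help when there are many distinct words. The function name top_n_words and its n parameter are hypothetical.

```python
import collections
import heapq

def top_n_words(text, n=5):
    # Hypothetical helper: case-insensitive word counts, as in the answer above
    counts = collections.Counter(text.lower().split())
    # Smallest (-count, word) = highest count first, then alphabetical on ties
    best = heapq.nsmallest(n, counts.items(), key=lambda x: (-x[1], x[0]))
    return [word for word, _ in best]

print(top_n_words("one one was a racehorse two two was one too"))
# ['one', 'two', 'was', 'a', 'racehorse']
```

For small inputs the difference from sorted() is negligible; the heap-based version mainly pays off when n is much smaller than the number of distinct words.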