Question

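I am working on code that analyzes input text. One of the features I'd like help with is listing the words used, in descending order of frequency.

By referring to similar topics on Stack Overflow, I was able to keep only the alphanumeric characters (removing all quotes, punctuation, etc.) and put each word into a list.

This is the list I have now. (The variable is called word_list.)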

['Hi', 'beautiful', 'creature', 'Said', 'by', 'Rothchild', 'the', 'biggest', 'enemy', 'of', 'Zun', 'Zun', 'started', 'get', 'afraid', 'of', 'him', 'As', 'her', 'best', 'friend', 'Lia', 'can', 'feel', 'her', 'fear', 'Why', 'the', 'the', 'hell', 'you', 'are', 'here']

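(FYI, the text file is just a random piece of fantasy writing I found on the web.)

However, I can't turn this into a list ordered by descending frequency. For example, 'the' appears 3 times in the list, so 'the' should become the first element. The next element would be 'of', which occurs 2 times.

I tried several approaches from questions similar to mine (Counter, sorted), but they kept raising errors.

Can someone show me how to sort the list?

Also, after sorting the list, how can I keep only one copy of each repeated word? (My current idea is a for loop with indices: compare each item with the one at the previous index and delete it if they are the same.)

Thanks.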

Answer 1

You can use collections.Counter and sort in different ways:

from collections import Counter

lst = ['Hi', 'beautiful', 'creature', 'Said', 'by', 'Rothchild', 'the', 'biggest', 'enemy', 'of', 'Zun', 'Zun', 'started', 'get', 'afraid', 'of', 'him', 'As', 'her', 'best', 'friend', 'Lia', 'can', 'feel', 'her', 'fear', 'Why', 'the', 'the', 'hell', 'you', 'are', 'here']

c = Counter(lst)  # mapping: {item: frequency}

# now you can use the counter directly via most_common (1.)
lst = [x for x, _ in c.most_common()]
# or as a sort key (2.)
lst = sorted(set(lst), key=c.get, reverse=True)

# one possible result of (2.); set is unordered, so ties may come out in a different order:
# ['the', 'Zun', 'of', 'her', 'Hi', 'hell', 'him', 'friend', 'Lia', 
#  'get', 'afraid', 'Rothchild', 'started', 'by', 'can', 'Why', 'fear', 
#  'you', 'are', 'biggest', 'enemy', 'Said', 'beautiful', 'here', 
#  'best', 'creature', 'As', 'feel']

These approaches remove duplicates either through the Counter's keys (1.) or through a set (2.).
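To make the frequency mapping concrete, here is a minimal sketch on a shortened, made-up word list (the names words, counts and unique_by_freq are illustrative only and not from the question). It assumes Python 3.7 or later, where most_common is documented to return items with equal counts in the order they were first encountered.

```python
from collections import Counter

# Shortened, hypothetical sample; not the full list from the question
words = ['the', 'of', 'Zun', 'Zun', 'of', 'her', 'the', 'the', 'hell']

# Counter maps each distinct word to the number of times it occurs
counts = Counter(words)

# most_common() yields (word, count) pairs, highest count first;
# keeping only the word also drops the duplicates
unique_by_freq = [word for word, _ in counts.most_common()]

print(counts['the'], counts['of'], counts['hell'])
print(unique_by_freq)
```

Because every word appears only once as a Counter key, no separate loop over indices is needed to remove the repeats.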

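However, if you want the ordering to be stable with respect to the original list (items with equal frequency keep their order of first occurrence), you may have to use a duplicate-removal approach based on collections.OrderedDict instead: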

from collections import OrderedDict

lst = sorted(OrderedDict.fromkeys(lst), key=c.get, reverse=True)

# ['the', 'of', 'Zun', 'her', 'Hi', 'beautiful', 'creature', 'Said', 
# 'by', 'Rothchild', 'biggest', 'enemy', 'started', 'get', 'afraid', 
# 'him', 'As', 'best', 'friend', 'Lia', 'can', 'feel', 'fear', 'Why',  
# 'hell', 'you', 'are', 'here']
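As a possible variation, and assuming Python 3.7 or later (where plain dicts are guaranteed to preserve insertion order), dict.fromkeys can stand in for OrderedDict.fromkeys. The sketch below uses a shortened, made-up list, and the names unique_in_order and ranked are illustrative only.

```python
from collections import Counter

# Shortened, hypothetical sample; not the full list from the question
words = ['Hi', 'the', 'of', 'Zun', 'Zun', 'of', 'her', 'the', 'the']
counts = Counter(words)

# dict.fromkeys keeps one key per word, in order of first occurrence
unique_in_order = list(dict.fromkeys(words))

# sorted() is stable, also with reverse=True, so words with the same
# frequency stay in their order of first occurrence
ranked = sorted(unique_in_order, key=counts.get, reverse=True)

print(unique_in_order)
print(ranked)
```

On older Python versions the OrderedDict form above remains the safer choice, since dict ordering was not guaranteed there.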

Python 3: How do I sort a list by the frequency of its elements?
