Question

I am playing around with NLTK and its FreqDist class.

import nltk
from nltk.corpus import gutenberg
print(gutenberg.fileids())
from nltk import FreqDist
fd = FreqDist()

for word in gutenberg.words('austen-persuasion.txt'):
    fd[word] += 1

newfd = sorted(fd, key=fd.get, reverse=True)[:10]

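I have a question about the sorting part. When I run the code as shown, it sorts the FreqDist object correctly. However, when I run it with get() instead of get, I get this error: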

Traceback (most recent call last):
  File "C:\Python34\NLP\NLP.py", line 21, in <module>
    newfd = sorted(fd, key=fd.get(), reverse=True)[:10]
TypeError: get expected at least 1 arguments, got 0

Why is get correct and get() an error? I thought get() would be the right one, but I guess it is not.

Answer 1

Basically, the FreqDist object in NLTK is a subclass of native Python's collections.Counter, so let's look at how Counter works:

A Counter is a dictionary that stores the elements of a list as its keys and the counts of those elements as its values:

>>> from collections import Counter
>>> Counter(['a','a','b','c','c','c','d'])
Counter({'c': 3, 'a': 2, 'b': 1, 'd': 1})
>>> c = Counter(['a','a','b','c','c','c','d'])

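To get a list of elements sorted by frequency, you can use the .most_common() method, which returns tuples of each element and its count, sorted from the highest count to the lowest.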

>>> c.most_common()
[('c', 3), ('a', 2), ('b', 1), ('d', 1)]
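For the "top 10" case in the question, most_common() also accepts an optional count. The sketch below uses a plain Counter and a made-up sentence rather than the Gutenberg corpus, so that it runs without any NLTK data. Since FreqDist subclasses Counter, fd.most_common(10) should behave the same way on the fd from the question.

```python
from collections import Counter

# Toy data standing in for gutenberg.words(...); the sentence is made up.
words = "the cat and the hat and the bat".split()
counts = Counter(words)

# most_common(n) returns the n highest-count (element, count) pairs.
top_two = counts.most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]

# Keep only the elements if the counts themselves are not needed.
top_words = [word for word, _ in top_two]
print(top_words)  # ['the', 'and']
```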

In reverse order:

>>> list(reversed(c.most_common()))
[('d', 1), ('b', 1), ('a', 2), ('c', 3)]

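As with a dictionary, you can iterate over a Counter object and it will yield its keys (the outputs shown are from Python 2, where keys() and items() return lists in arbitrary order; in Python 3 they return view objects):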

>>> [key for key in c]
['a', 'c', 'b', 'd']
>>> c.keys()
['a', 'c', 'b', 'd']

You can also use the .items() method to get tuples of the keys and their values:

>>> c.items()
[('a', 2), ('c', 3), ('b', 1), ('d', 1)]

Alternatively, if you only need the keys sorted by their counts, see "Transpose/Unzip Function (inverse of zip)?":

>>> k, v = zip(*c.most_common())
>>> k
('c', 'a', 'b', 'd')

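Coming back to the question of .get vs .get(): the former is the function itself, while the latter is a call to that function, and the call requires a dictionary key as its argument: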

>>> c = Counter(['a','a','b','c','c','c','d'])
>>> c
Counter({'c': 3, 'a': 2, 'b': 1, 'd': 1})
>>> c.get
<built-in method get of Counter object at 0x7f5f95534868>
>>> c.get()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: get expected at least 1 arguments, got 0
>>> c.get('a')
2
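The same distinction can be shown in a small script. This is only an illustrative sketch: the names getter and errors are introduced here and are not part of the question's code.

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])
errors = []  # collects the exceptions raised below, for inspection

# No parentheses: a reference to the bound method, which can be called later.
getter = c.get
print(callable(getter))  # True
print(getter('c'))       # 3

# With parentheses, get() runs immediately with no argument and raises
# before sorted() is ever called.
try:
    sorted(c, key=c.get())
except TypeError as exc:
    errors.append(exc)

# Passing the *result* of a call does not work either: key must be a
# callable (or None), and c.get('c') is just the integer 3.
try:
    sorted(c, key=c.get('c'))
except TypeError as exc:
    errors.append(exc)

for exc in errors:
    print("TypeError:", exc)
```

In both failing cases sorted() never receives a function it can apply to each element, which is why key=fd.get (without parentheses) is the form that works.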

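When you call sorted(), its key=... parameter is not the key of the list/dictionary you are sorting. It is the function that sorted() applies to each element to compute the value it sorts by.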

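So c.get(key) and c[key] are equivalent for keys that are present in the counter; both simply return the value stored for the key: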

>>> [c.get(key) for key in c]
[2, 3, 1, 1]
>>> [c[key] for key in c]
[2, 3, 1, 1]

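When sorting, these values are used as the sort criterion, so both approaches produce the same ordering (elements with equal counts may come out in a different relative order):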

>>> sorted(c, key=c.get)
['b', 'd', 'a', 'c']
>>> v, k = zip(*sorted((c.get(key), key) for key in c))
>>> list(k)
['b', 'd', 'a', 'c']
>>> sorted(c, key=c.get, reverse=True) # Highest to lowest
['c', 'a', 'b', 'd']
>>> v, k = zip(*reversed(sorted((c.get(key), key) for key in c)))
>>> k
('c', 'a', 'd', 'b')
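To make the role of key explicit, here is a minimal sketch comparing three ways of ordering the keys by count. The variable names are illustrative only, and the tie order noted in the comment assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

# key receives each element (a dictionary key here) and returns the value
# to sort by, so the bound method c.get and an explicit lambda are
# interchangeable for keys that exist in the counter.
by_method = sorted(c, key=c.get, reverse=True)
by_lambda = sorted(c, key=lambda k: c[k], reverse=True)

# most_common() already sorts by count, highest first.
by_most_common = [k for k, _ in c.most_common()]

print(by_method)  # ['c', 'a', 'b', 'd'] on Python 3.7+
print(by_method == by_lambda)
```

Because sorted() is stable, elements with equal counts ('b' and 'd' here) keep their original relative order. The tuple-based variant above can order them differently, since it breaks ties by comparing the keys themselves.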

Sorting FreqDist in NLTK with get vs get()
