Question

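I have a list comprehension that stops sorting as soon as I add the 'not in stop' check. Basically, when I bring in the NLTK stop words, my previous sorting behaviour is lost. Can anyone point out what I'm doing wrong?

I have now included everything in the code for better reference.

Edit: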

import nltk
from nltk.corpus import stopwords
import string

stop = stopwords.words('english') + list(string.punctuation)
f = open('review_text_all.txt', encoding="utf-8")
raw = f.read().lower().replace("'", "").replace("\\", "").replace(",", 
"").replace("\ufeff", "")

tokens = nltk.word_tokenize(raw)

bgs = nltk.bigrams(tokens)

fdist = nltk.FreqDist(bgs)
for (k,v) in sorted(fdist.items(), key=lambda x: (x[1] not in stop), 
reverse=True):
    print(k,v)

These are my results with 'not in stop':

('or', 'irish') 3
('put', 'one') 1
('was', 'repealed') 1
('please', '?') 6
('contact', 'your') 2
('wear', 'sweats') 1

Without 'not in stop':

('white', 'people') 4362
('.', 'i') 3734
('in', 'the') 2880
('of', 'the') 2634
('to', 'be') 2217
('all', 'white') 1778

As you can see, the sorting works, but only after I remove 'not in stop'.

Answer 1

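The key parameter of the sorted method is a function that tells Python which key (an attribute or value derived from each list item) to sort on.

In your case, your function returns True or False... which is not a very good value to sort by :)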
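
To make this concrete, here is a minimal sketch using a hand-made dictionary of bigram counts. The `counts` and `stop` values are made up for illustration and are not taken from the question's data. Each `x[1]` is an integer count, so `x[1] not in stop` is True for every item; all keys compare equal and the stable sort leaves the items in their original order:

```python
# Hypothetical sample data standing in for fdist.items() and the stop list
counts = {("in", "the"): 5, ("white", "people"): 9, ("wear", "sweats"): 1}
stop = ["in", "the", "of"]

# Each x is ((word1, word2), count); x[1] is an int, never a stop word
keys = [x[1] not in stop for x in counts.items()]

# All keys are equal, so sorted() (which is stable) keeps the original order,
# even with reverse=True
by_bool = sorted(counts.items(), key=lambda x: (x[1] not in stop), reverse=True)

# Sorting on the count itself gives the expected descending order
by_count = sorted(counts.items(), key=lambda x: x[1], reverse=True)

print(keys)
print(by_bool)
print(by_count)
```

Only the last call orders the bigrams by frequency; the boolean key carries no ordering information.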

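Edit:

From my understanding of what you want to achieve, you need to add a filter method before (or after) the sort, which will remove the items that appear in the "stop words" list.

Something like this: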

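# x[0] is the bigram tuple and x[1] is its count, so test the words in x[0] against stop
for (k,v) in sorted(filter(lambda x: all(w not in stop for w in x[0]), fdist.items()), key=lambda x: x[1], reverse=True):
    print(k,v)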
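
The same filter-then-sort idea can be tried without NLTK. The sketch below is only an illustration: it assumes the goal is to drop any bigram in which either word is a stop word (the question does not say whether one or both words should be checked), it uses `collections.Counter` and `zip` as stand-ins for `nltk.FreqDist` and `nltk.bigrams`, and the `tokens` and `stop` values are made up.

```python
from collections import Counter

# Hypothetical tokens and stop words; in the question these come from
# nltk.word_tokenize(raw) and stopwords.words('english')
tokens = ["all", "white", "people", "in", "the", "white", "people", "club"]
stop = {"in", "the", "all"}

# Pair each token with its successor to form bigrams, then count them
bigram_counts = Counter(zip(tokens, tokens[1:]))

# Keep only bigrams where neither word is a stop word
kept = [(bg, n) for bg, n in bigram_counts.items()
        if not any(w in stop for w in bg)]

# Sort the remaining bigrams by count, highest first
ranked = sorted(kept, key=lambda item: item[1], reverse=True)

for bg, n in ranked:
    print(bg, n)
```

Keeping the filter (which items survive) separate from the key (how survivors are ordered) is the point: the key function should return the value to sort on, here the count.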
