Question

I generated a list of trigrams and their frequencies in NLTK using this code:

import nltk
from nltk.collocations import TrigramAssocMeasures, TrigramCollocationFinder

# docs holds the raw text as a single string
tokens = nltk.wordpunct_tokenize(docs)
trigram_measures = TrigramAssocMeasures()
finderT = TrigramCollocationFinder.from_words(tokens)
scoredT = finderT.score_ngrams(trigram_measures.raw_freq)

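Given a user-defined input of two words, I want to filter the list scoredT and return those entries where the input matches the first two items of the sublist (the trigram) in scoredT.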

scoredT looks like this:

[(('out', 'to', 'the'), 2.7147650642313413e-05),
(('proud', 'of', 'you'), 2.7147650642313413e-05)]

So, if the input equals 'out to', I want to filter the list to return '…'

I tried:

matches = filter(scoredT[0:len(scoredT)][0:1]==input, scoredT)

but I get the following error: TypeError: 'bool' object is not callable

Answer 1

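scoredT[0:len(scoredT)] is just a copy of the whole list, and [0:1] slices that down to a one-element list holding the first (trigram, score) pair. Comparing that list with input using == gives a boolean. That boolean is then passed to filter, which expects its first argument to be a function that returns a boolean for each item, not a boolean itself; hence your error. The Pythonic way: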

matches = [(trigram, score) for (trigram, score) in scoredT if trigram[:2] == input]

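Also, you need to make sure that input is a tuple, for example ('out', 'to') rather than the string 'out to', because trigram[:2] is a tuple and a tuple never compares equal to a string.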
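
For completeness, here is a minimal end-to-end sketch of the above, assuming the user types the two words as a single space-separated string. The helper name filter_by_prefix and the variable names prefix and third_words are my own and only illustrative; the sample data is copied from the question so the snippet runs without NLTK. It also shows the filter() form with a proper callable as the first argument.

```python
# Sample data copied from the question; in practice scoredT comes from
# finderT.score_ngrams(trigram_measures.raw_freq).
scoredT = [
    (('out', 'to', 'the'), 2.7147650642313413e-05),
    (('proud', 'of', 'you'), 2.7147650642313413e-05),
]


def filter_by_prefix(scored, text):
    """Return the (trigram, score) pairs whose first two words match text.

    Hypothetical helper: text is a space-separated string such as 'out to'.
    """
    prefix = tuple(text.split())  # 'out to' -> ('out', 'to')
    return [(trigram, score) for trigram, score in scored
            if trigram[:2] == prefix]


matches = filter_by_prefix(scoredT, 'out to')

# Same result with filter(): the first argument must be a callable.
prefix = ('out', 'to')
matches_via_filter = list(filter(lambda item: item[0][:2] == prefix, scoredT))

# If only the third word of each matching trigram is wanted:
third_words = [trigram[2] for trigram, _ in matches]
```

Converting the input with tuple(text.split()) keeps the comparison tuple-to-tuple, which is what the list comprehension relies on.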
