Question

import nltk
from nltk.collocations import *

tokens = ['a','b','c','d','b','c','a','b','c']
tokens2 = [['a','b','c','d'],['b','c','a','b','c']]
bigrams = nltk.bigrams(tokens)

fdist = nltk.FreqDist(bigrams)
for i, j in fdist.items():
    print(i, j)  # i is a bigram tuple, j is its count

print(fdist.most_common(2))

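The code above works for input like tokens, but it throws an error when I use tokens2. Ultimately I want it to return the top two bigrams when given a set of tokens. Any help is much appreciated.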
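A likely cause of the error, shown as a small sketch: bigrams are built from adjacent items of whatever sequence you pass in, so with tokens2 each "token" is a whole inner list. The resulting pair is a tuple of two lists, and since nltk.FreqDist is a subclass of collections.Counter, it needs hashable keys, which lists are not. The sketch uses only the standard library and substitutes zip(seq, seq[1:]) for nltk.bigrams (an assumption made so it runs without NLTK installed).

```python
from collections import Counter

tokens2 = [['a', 'b', 'c', 'd'], ['b', 'c', 'a', 'b', 'c']]

# Pairing adjacent items of the OUTER list pairs whole sentences, not words.
# zip(seq, seq[1:]) is a stdlib stand-in for nltk.bigrams here.
pairs = list(zip(tokens2, tokens2[1:]))
print(pairs)  # [(['a', 'b', 'c', 'd'], ['b', 'c', 'a', 'b', 'c'])]

error_message = None
try:
    Counter(pairs)  # a tuple that contains lists cannot be hashed
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # unhashable type: 'list'
```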

Answer 1

If you have a list of token lists (like tokens2),

import collections
import nltk

cnt = collections.Counter()

# Count bigrams within each inner token list separately
for toks in tokens2:
    cnt.update(nltk.bigrams(toks))

print(cnt.most_common(2))

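will work. If what you have is something completely different, for example a single flat list like tokens, or the set you mentioned, then everything may change; but we can't read your mind, so you had better edit your question and explain exactly what you are after!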
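The same per-list counting idea can be wrapped into a small helper, sketched below under a couple of assumptions: the name top_bigrams is hypothetical, and zip(toks, toks[1:]) stands in for nltk.bigrams so the example runs with the standard library alone. Because bigrams are formed inside each inner list, no pair spans the boundary between two lists; a single flat list such as tokens can be handled by wrapping it in a one-element list.

```python
from collections import Counter

def top_bigrams(sentences, n=2):
    """Hypothetical helper: count bigrams within each token list and
    return the n most common as (bigram, count) pairs.

    zip(toks, toks[1:]) is used as a stdlib stand-in for nltk.bigrams.
    """
    counts = Counter()
    for toks in sentences:
        counts.update(zip(toks, toks[1:]))  # adjacent pairs in this list only
    return counts.most_common(n)

tokens = ['a', 'b', 'c', 'd', 'b', 'c', 'a', 'b', 'c']
tokens2 = [['a', 'b', 'c', 'd'], ['b', 'c', 'a', 'b', 'c']]

# A single flat list is treated as one "sentence" by wrapping it in a list.
print(top_bigrams([tokens]))  # [(('b', 'c'), 3), (('a', 'b'), 2)]
print(top_bigrams(tokens2))   # [(('b', 'c'), 3), (('a', 'b'), 2)]
```

Both inputs happen to share the same top two here; the difference is that the flat list also yields ('d', 'b'), a pair that crosses the sentence boundary and is never counted for tokens2.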

Python Top Bigrams
