I want to use bigrams and trigrams, because I don't want to use only unigrams.
import gensim
from gensim.models import Word2Vec

bigramer = gensim.models.Phrases(sentences)
model = Word2Vec(bigramer[sentences], workers=num_workers,
                 size=num_features, min_count=min_word_count,
                 window=context, sample=downsampling)
from nltk import bigrams
from nltk import trigrams
from gensim.models import Phrases
from gensim.models.phrases import Phraser
trigrams = Phrases(bigrams[sentence_stream])
However, I get this error:
NameError                                 Traceback (most recent call last)
<ipython-input-161-15b0101c13b1> in <module>()
----> 1 trigrams = Phrases(bigrams[sentence_stream])
NameError: name 'sentence_stream' is not defined
Answer 0 (score: 0)
I solved this problem by rewriting the code as:
bigram = Phrases(sentences, min_count=1, threshold=1)
print(list(bigram[sentences]))
trigram = Phrases(bigram[sentences], min_count=1, threshold=1)
print(list(trigram[bigram[sentences]]))
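To see why running the phrase detection twice produces trigrams, here is a rough pure-Python sketch of the idea. It is not gensim's actual implementation: `learn_pairs`, `merge_pairs`, and the toy corpus are hypothetical names and data made up for illustration, and the score is a simplified version of the collocation formula from the word2vec paper. The first pass joins frequent pairs into single tokens, so the second pass can join such a token with its neighbor.

```python
from collections import Counter


def learn_pairs(sentences, min_count=1, threshold=1.0):
    """Return adjacent token pairs whose collocation score exceeds threshold.

    Simplified score (an assumption, not gensim's exact code):
    (count(a, b) - min_count) * vocab_size / (count(a) * count(b))
    """
    unigrams = Counter(tok for sent in sentences for tok in sent)
    pairs = Counter(p for sent in sentences for p in zip(sent, sent[1:]))
    vocab_size = len(unigrams)
    good = set()
    for (a, b), n in pairs.items():
        score = (n - min_count) * vocab_size / (unigrams[a] * unigrams[b])
        if score > threshold:
            good.add((a, b))
    return good


def merge_pairs(sentence, good, sep="_"):
    """Greedily join accepted pairs from left to right."""
    out, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in good:
            out.append(sentence[i] + sep + sentence[i + 1])
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return out


# Toy corpus, made up for this example.
sentences = [
    ["new", "york", "city", "is", "big"],
    ["i", "love", "new", "york", "city"],
    ["new", "york", "city", "never", "sleeps"],
]

# Pass 1: learn pairs on the raw tokens and merge them (bigrams).
good_bigrams = learn_pairs(sentences, min_count=1, threshold=1.0)
bigram_sents = [merge_pairs(s, good_bigrams) for s in sentences]

# Pass 2: learn pairs on the merged output, so "new_york" + "city"
# can become a single three-word token (trigram).
good_trigrams = learn_pairs(bigram_sents, min_count=1, threshold=1.0)
trigram_sents = [merge_pairs(s, good_trigrams) for s in bigram_sents]

print(bigram_sents[0])   # ['new_york', 'city', 'is', 'big']
print(trigram_sents[0])  # ['new_york_city', 'is', 'big']
```

This mirrors the shape of the gensim code above: the second model is trained on the output of the first, and at lookup time the sentences go through both in order, as in `trigram[bigram[sentences]]`.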