Error when getting trigrams using gensim's Phrases

Asked: 2017-09-11 01:14:35

Tags: python nlp data-mining text-mining gensim

I want to extract all bigrams and trigrams from the given sentences.

from gensim.models import Phrases
documents = ["the mayor of new york was there", "Human Computer Interaction is a great and new subject", "machine learning can be useful sometimes","new york mayor was present", "I love machine learning because it is a new subject area", "human computer interaction helps people to get user friendly applications"]

sentence_stream = [doc.split(" ") for doc in documents]
bigram = Phrases(sentence_stream, min_count=1, threshold=2, delimiter=b' ')
trigram = Phrases(bigram(sentence_stream, min_count=1, threshold=2, delimiter=b' '))

for sent in sentence_stream:
    #print(sent)
    bigrams_ = bigram[sent]
    trigrams_ = trigram[bigrams_]

    print(bigrams_)
    print(trigrams_)

The code works for bigrams and captures 'new york' and 'machine learning' as bigrams.

However, when I try to build the trigram model, I get the following error.

TypeError: 'Phrases' object is not callable

Please let me know how to correct my code.

I am following the example in the gensim documentation.

1 Answer:

Answer 0: (score: 0)

According to the docs, you can do this:

from gensim.models import Phrases
from gensim.models.phrases import Phraser 

phrases = Phrases(sentence_stream)
bigram = Phraser(phrases)
trigram = Phrases(bigram[sentence_stream])
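Your bigram is a Phrases object, so it cannot be called like a function the way you did with bigram(sentence_stream, ...). Apply it to the sentences with square brackets, bigram[sentence_stream], and pass min_count, threshold and delimiter to the Phrases constructor of the trigram model rather than to bigram.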