python-3.x - How do I configure a bigram model in gensim to include custom bigrams?

Suppose I train a model on the data in sentence_stream:

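from gensim.models.phrases import Phrases, Phraser

phrases = Phrases(sentence_stream)  # learn bigram statistics from the corpus
bigram_model = Phraser(phrases)     # lighter wrapper used to apply the learned phrases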

Now, if I try my bigram_model on some test data and check the output:

sent = ['the', 'mayor', 'of', 'new', 'york', 'was', 'there']
print(bigram_model[sent])
['the', 'mayor', 'of', 'new_york', 'was', 'there']

Now suppose I want to add custom bigrams such as the_mayor to my bigram_model, so that the output becomes:

['the_mayor', 'of', 'new_york', 'was', 'there']

Any suggestions on how to configure bigram_model to do this?
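For illustration, one possible workaround is sketched below. It does not change bigram_model itself: it post-processes the model's output and joins any adjacent pair found in a hand-maintained set. The helper merge_custom_bigrams and the custom_bigrams set are hypothetical names, not part of gensim, and the model output is hard-coded from the example above so the sketch runs without gensim installed.

```python
def merge_custom_bigrams(tokens, custom_bigrams, delimiter="_"):
    """Join adjacent token pairs listed in custom_bigrams, scanning left to right.

    Hypothetical helper (not a gensim function).
    """
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in custom_bigrams:
            merged.append(delimiter.join(pair))
            i += 2  # skip both tokens of the matched pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Assumption: a hand-picked set of bigrams that should always be joined.
custom_bigrams = {("the", "mayor")}

# Stand-in for bigram_model[sent], copied from the output shown in the question.
model_output = ["the", "mayor", "of", "new_york", "was", "there"]

result = merge_custom_bigrams(model_output, custom_bigrams)
print(result)  # ['the_mayor', 'of', 'new_york', 'was', 'there']
```

With a trained model, the same call would presumably be merge_custom_bigrams(bigram_model[sent], custom_bigrams). Because the scan is greedy and left to right, overlapping custom pairs resolve in favor of the earlier one. Whether a post-processing step like this is preferable to retraining with different min_count or threshold values on Phrases depends on the corpus.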


0 answers: