Generating sentence vectors with gensim's Doc2Vec

Asked: 2015-08-17 17:11:15

Tags: python vector gensim

I'm trying to use Doc2Vec to read a file containing a list of sentences like this:

The elephant flaps its large ears to cool the blood in them and its body.

A house is a permanent building or structure for people or families to live in.

...

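What I want to do is generate two files: one containing the unique words from these sentences, and another with one corresponding vector per line (if no vector is produced, I want to output a vector of zeros).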

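My code runs fine, but I can't figure out how to print the individual sentence vectors. I looked through the documentation but didn't find much help. Here is my code so far.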

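import re
from gensim.models.doc2vec import Doc2Vec, LabeledSentence

# Label every line of the input file as SENT_0, SENT_1, ...
sentences = []
for uid, line in enumerate(open(filename)):
    sentences.append(LabeledSentence(words=line.split(), labels=['SENT_%s' % uid]))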

model = Doc2Vec(alpha=0.025, min_alpha=0.025)
model.build_vocab(sentences)
for epoch in range(10):
    model.train(sentences)
    model.alpha -= 0.002
    model.min_alpha = model.alpha
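# Match only sentence labels such as SENT_12; the original pattern '[SENT].*'
# was a character class and also matched any word containing S, E, N or T.
sent_reg = r'^SENT_\d+$'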
for item in model.vocab.keys():
    sent = re.search(sent_reg, item)
    if sent:
        continue
    else:
        print item

###I'm not sure how to produce the vectors from here and this doesn't work##   
sent_id = 0
for item in model:
    print model["SENT_"+str(sent_id)]
    sent_id += 1

1 Answer:

Answer 0 (score: 3)

With the latest gensim (0.12.1), you can try:

print model.docvecs["SENT_" + str(sent_id)]