Question

When running the following code (Python 3.6, the latest Gensim library, in Jupyter):

import re
strData = """HelloPleaseHelpMeUnderstand
And here not in
HereIn"""
listWords = re.findall(r"(([A-Z][a-z]+){2,})", strData)
result = [i[0] for i in listWords]
print(result)
# ['HelloPleaseHelpMeUnderstand', 'HereIn']

[1]: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-wikipedia.ipynb

Answer 1

The main problem is that 'Machine learning' is not a known token in the model. (Perhaps your model knows 'machine learning' or 'machine_learning' or something similar?)

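Because the error message the code produces in this case is poor, it is hard to tell that this is the real problem. This is a known issue in the gensim project: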

https://github.com/RaRe-Technologies/gensim/issues/1737
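Because the error message does not say "unknown token", it can help to check the vocabulary yourself before calling most_similar. The sketch below is illustrative only: `candidate_tokens` and `vocab_lookup` are hypothetical helpers, and the toy vocabulary stands in for a trained model. It tries the spellings suggested above (as typed, lowercased, underscore-joined, and the individual words) against any container that supports `in`. With Gensim 4.x you could pass `model.wv.key_to_index` for words or `model.dv.key_to_index` for document tags; adjust for your Gensim version.

```python
def candidate_tokens(phrase):
    """Hypothetical helper: spellings under which a phrase might be stored."""
    lowered = [w.lower() for w in phrase.split()]
    candidates = [
        phrase,              # exactly as typed, e.g. 'Machine learning'
        " ".join(lowered),   # lowercased, e.g. 'machine learning'
        "_".join(lowered),   # phrase-joined, e.g. 'machine_learning'
    ]
    candidates.extend(lowered)  # the individual lowercased words
    # Drop duplicates while keeping the order above.
    return list(dict.fromkeys(candidates))


def vocab_lookup(phrase, vocab):
    """Map each candidate spelling to whether it is present in `vocab`.

    `vocab` is any container supporting `in` (assumption: e.g. a Gensim 4.x
    key_to_index dict); nothing here calls Gensim itself.
    """
    return {token: token in vocab for token in candidate_tokens(phrase)}


# Toy vocabulary standing in for a trained model's vocabulary (made up).
toy_vocab = {"machine": 0, "learning": 1, "machine_learning": 2}
for token, present in vocab_lookup("Machine learning", toy_vocab).items():
    print(f"{token!r}: {'known' if present else 'NOT in vocabulary'}")
```

If none of the candidates is known, the query has to be turned into a vector some other way (for example with infer_vector) rather than looked up by name.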

Answer 2

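# model: a trained gensim Doc2Vec model
words = "machine learning".split()

doc_vector = model.infer_vector(words)  # map the raw words into the doc-vector space
out = model.docvecs.most_similar([doc_vector])  # in Gensim 4.x: model.dv.most_similar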

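Since a newer version is being used, I'm not 100% sure, but I think the problem is that the most_similar function expects the text already mapped into the feature space (that is, a vector), not a raw string.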
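To make the "vector, not raw string" point concrete, here is a toy sketch of what ranking by similarity to a query vector means. It is not Gensim's implementation: `cosine` and `rank_by_vector` are hypothetical names, the 2-D "document vectors" are made up, and `query` merely stands in for the output of an infer_vector-style step. The ranking only makes sense once the query lives in the same numeric space as the stored vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_vector(query_vector, doc_vectors, topn=10):
    """Return up to `topn` (tag, similarity) pairs, most similar first.

    The query must already be a vector in the same space as `doc_vectors`;
    a raw string such as "machine learning" cannot be scored this way.
    """
    scored = [(tag, cosine(query_vector, vec)) for tag, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy 2-D "document vectors" (made up for illustration).
toy_docs = {"doc_a": [1.0, 0.0], "doc_b": [0.6, 0.8], "doc_c": [0.0, 1.0]}
query = [0.9, 0.1]  # stands in for an inferred document vector
for tag, score in rank_by_vector(query, toy_docs):
    print(tag, round(score, 3))
```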

Gensim example: TypeError between str and int
