Question

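I am using gensim 0.12.4 on Ubuntu. My word2vec model is not consistent: every time I build the model from the same sentences with the same parameters, I still get different word representations.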

Here is the code (borrowed from the original post):

>>> from nltk.corpus import brown
>>> from gensim.models import Word2Vec
>>> sentences = brown.sents()[:100]
>>> model = Word2Vec(sentences, size=10, window=5, min_count=5, workers=4)
>>> model[sentences[0][0]]
array([ 0.04913874,  0.04574081, -0.07402877, -0.03270053,  0.06598952,
        0.04157289,  0.05075986,  0.01770534, -0.03796235,  0.04594197], dtype=float32)
>>> model = Word2Vec(sentences, size=10, window=5, min_count=5, workers=4)
>>> model[sentences[0][0]]
array([ 0.04907205,  0.04569579, -0.07379777, -0.03273782,  0.06579078,
        0.04167712,  0.05083019,  0.01780009, -0.0378389 ,  0.04578455], dtype=float32)
>>> model = Word2Vec(sentences, size=10, window=5, min_count=5, workers=4)
>>> model[sentences[0][0]]
array([ 0.04906179,  0.04569826, -0.07382379, -0.03274316,  0.06583244,
        0.04166647,  0.0508585 ,  0.01777468, -0.03784611,  0.04578935], dtype=float32)

I also tried setting the seed to a fixed int, but that did not seem to help. Reinstalling gensim did not help either.

Any idea how to stabilize my model?

Answer 1

Try setting the PYTHONHASHSEED environment variable, as described here: https://github.com/gojomo/gensim/blob/develop/gensim/models/doc2vec.py#L566
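To see why the hash seed can matter: on Python 3, the built-in hash() of a string is randomized per interpreter launch unless PYTHONHASHSEED is set, and as far as I understand gensim derives each word's initial vector from a hash of the word plus the seed. Below is a minimal sketch using only the standard library; it does not call gensim at all, and the helper name hash_in_fresh_interpreter is made up for this illustration. It launches fresh interpreters and checks that a fixed PYTHONHASHSEED makes the string hash repeatable.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(word, hashseed=None):
    """Return hash(word) as computed by a newly launched Python process.

    Hypothetical helper for illustration only; it is not part of gensim.
    """
    env = dict(os.environ)
    # Start from a clean state so an inherited setting does not leak in.
    env.pop("PYTHONHASHSEED", None)
    if hashseed is not None:
        env["PYTHONHASHSEED"] = str(hashseed)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", word],
        env=env,
    )
    return int(out.decode().strip())


# Two separate launches with the same fixed hash seed agree with each other.
fixed_a = hash_in_fresh_interpreter("The", hashseed=0)
fixed_b = hash_in_fresh_interpreter("The", hashseed=0)
print("fixed seed, hashes equal:", fixed_a == fixed_b)

# Without a fixed seed, Python 3 picks a random one per launch, so these
# two values will usually (not always) differ.
random_a = hash_in_fresh_interpreter("The")
random_b = hash_in_fresh_interpreter("The")
print("no seed set, hashes equal:", random_a == random_b)
```

In practice that means starting the training script with the variable already set, for example `PYTHONHASHSEED=0 python train.py` (train.py is a placeholder name). The gensim documentation also notes that a fully reproducible run needs a fixed `seed` and a single worker thread (`workers=1`), because with several workers the order of updates depends on OS thread scheduling. That may be the more relevant factor in the question, since all three models there were built inside one interpreter session with `workers=4`.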
