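I am using gensim 0.12.4 on Ubuntu. My word2vec model is not consistent: every time I build the model from the same sentences with the same parameters, I still get different word representations.
Here is the code (borrowed from the original post):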
>>> from nltk.corpus import brown
>>> from gensim.models import Word2Vec
>>> sentences = brown.sents()[:100]
>>> model = Word2Vec(sentences, size=10, window=5, min_count=5, workers=4)
>>> model[sentences[0][0]]
array([ 0.04913874, 0.04574081, -0.07402877, -0.03270053, 0.06598952,
0.04157289, 0.05075986, 0.01770534, -0.03796235, 0.04594197], dtype=float32)
>>> model = Word2Vec(sentences, size=10, window=5, min_count=5, workers=4)
>>> model[sentences[0][0]]
array([ 0.04907205, 0.04569579, -0.07379777, -0.03273782, 0.06579078,
0.04167712, 0.05083019, 0.01780009, -0.0378389 , 0.04578455], dtype=float32)
>>> model = Word2Vec(sentences, size=10, window=5, min_count=5, workers=4)
>>> model[sentences[0][0]]
array([ 0.04906179, 0.04569826, -0.07382379, -0.03274316, 0.06583244,
0.04166647, 0.0508585 , 0.01777468, -0.03784611, 0.04578935], dtype=float32)
I also tried setting the seed to a fixed int, but that did not seem to help. Reinstalling gensim did not help either.
Any idea how to stabilize my model?
Answer 0 (score: 0)
Try setting the PYTHONHASHSEED environment variable, as described here: https://github.com/gojomo/gensim/blob/develop/gensim/models/doc2vec.py#L566
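As a rough illustration of why PYTHONHASHSEED can matter: on Python 3, the built-in `hash()` of a string is randomized per interpreter launch unless that variable is set, and gensim's documentation describes the initial word vectors as being seeded from a hash of the word plus the seed. The sketch below uses only the standard library and does not touch gensim; the helper name `string_hash_in_fresh_interpreter` is made up for this example.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(word, hash_seed=None):
    """Return hash(word) as computed by a newly launched Python process.

    Hypothetical helper for illustration only; it is not part of gensim.
    """
    env = dict(os.environ)
    # Start from a clean state so an inherited setting does not skew the demo.
    env.pop("PYTHONHASHSEED", None)
    if hash_seed is not None:
        env["PYTHONHASHSEED"] = str(hash_seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", word],
        env=env,
    )
    return int(out.decode().strip())


# With a fixed PYTHONHASHSEED, two separate launches agree on the hash.
fixed_a = string_hash_in_fresh_interpreter("The", hash_seed=0)
fixed_b = string_hash_in_fresh_interpreter("The", hash_seed=0)
print("fixed seed, same hash:", fixed_a == fixed_b)

# Without it, Python 3 normally picks a new random hash seed per launch.
random_hashes = {string_hash_in_fresh_interpreter("The") for _ in range(3)}
print("distinct hashes without PYTHONHASHSEED:", len(random_hashes))
```

To apply this to the model in the question, one would presumably start the interpreter with something like `PYTHONHASHSEED=0 python`, pass a fixed `seed`, and also use `workers=1`: the gensim documentation notes that a fully deterministic run requires a single worker thread, because thread scheduling changes the order of updates. Since the three models in the question were built within one session, where the string hash does not change, `workers=4` may well be the main source of the small differences shown.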