Tracking loss and embeddings in a Gensim word2vec model

Time: 2019-01-29 14:03:16

Tags: gensim word2vec

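I am fairly new to Gensim and I am trying to train my first model using word2vec. All the parameters seem straightforward and easy to understand, but I don't know how to track the model's loss to see its progress. I would also like to get the embeddings after each epoch, so that I can show that the predictions become more sensible as training goes on. How can I do that?

Or is it better to train with iter=1 each time and save the loss and embeddings after each epoch? That does not sound very efficient.

There is not much code to show, but I am posting it below anyway: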

1 answer:

Answer 0: (score: 4)

Gensim allows us to use callbacks for this kind of purpose.

Example:

```python
from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec


class MonitorCallback(CallbackAny2Vec):
    def __init__(self, test_words):
        self._test_words = test_words

    def on_epoch_end(self, model):
        print("Model loss:", model.get_latest_training_loss())  # print loss
        for word in self._test_words:  # show wv logic changes
            print(model.wv.most_similar(word))


"""
prepare datasets etc.
...
...
"""

monitor = MonitorCallback(["word", "I", "less"])  # monitor with demo words
# Parameter names follow gensim 3.x; gensim 4 renamed iter to epochs and size to vector_size.
model = Word2Vec(sentences=trainset,
                 iter=5,  # epoch
                 min_count=10,
                 size=150,
                 workers=4,
                 sg=1,
                 hs=1,
                 negative=0,
                 window=9999,
                 compute_loss=True,  # without this the reported loss stays at 0
                 callbacks=[monitor])
```

  • There are some issues with get_latest_training_loss; it may be incorrect (bad luck, GitHub is down right now, so I can't check). I have tested this code and the loss increases, which looks weird.
  • Maybe you would prefer logging; gensim is well suited to this.
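A likely reason for the growing number is that get_latest_training_loss appears to report a running total accumulated over the epochs of a training run, not a per-epoch value. Under that assumption, subtracting the previous total gives the loss for each epoch. The sketch below does this and also keeps a copy of the word vectors after every epoch. The class name EpochTracker and its attributes are made up for illustration, and the import fallback is only there so the sketch can be tried without gensim installed.

```python
import copy

try:
    from gensim.models.callbacks import CallbackAny2Vec
except ImportError:
    # Fallback so the sketch can be exercised without gensim installed.
    CallbackAny2Vec = object


class EpochTracker(CallbackAny2Vec):
    """Hypothetical helper: records per-epoch loss and a copy of the vectors."""

    def __init__(self):
        self.previous_total = 0.0
        self.epoch_losses = []  # loss attributed to each epoch
        self.snapshots = []     # one copy of model.wv per epoch

    def on_epoch_end(self, model):
        # Assumption: the reported value is a running total for this training run.
        total = model.get_latest_training_loss()
        self.epoch_losses.append(total - self.previous_total)
        self.previous_total = total
        # Deep-copy so that later epochs do not overwrite this snapshot.
        self.snapshots.append(copy.deepcopy(model.wv))
```

To use it, pass callbacks=[tracker] together with compute_loss=True when constructing Word2Vec, then read tracker.epoch_losses and tracker.snapshots after training. Keeping every snapshot in memory can get expensive for a large vocabulary; saving model.wv to disk inside on_epoch_end is a possible alternative.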
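For the logging route, gensim reports training progress through Python's standard logging module, so enabling INFO-level output before training should be enough to see its messages. A minimal sketch follows; the format string is just one common choice, not something gensim requires.

```python
import logging

# Send log records to the console with a timestamp and level.
logging.basicConfig(
    format="%(asctime)s : %(levelname)s : %(message)s",
    level=logging.INFO,
)

# Make sure gensim's own loggers emit INFO-level progress messages.
logging.getLogger("gensim").setLevel(logging.INFO)
```

Run this once before building the model; the progress lines then appear while training runs.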