What are the correct settings to reproduce the GloVe Common Crawl (840B tokens) vectors?

Date: 2018-10-24 23:03:52

Tags: r word-embedding text2vec glove

I am using the text2vec R package to reproduce the Common Crawl vectors (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): glove.840B.300d.zip, available at https://nlp.stanford.edu/projects/glove/
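The release name encodes two different numbers: the corpus size (840 billion tokens) and the vector dimension (300). Only the second one describes the model itself; in text2vec 0.5.x it is the `word_vectors_size` argument. A minimal base-R sketch that pulls the two apart, assuming file names follow the `glove.<N>B.<D>d` pattern (`parse_glove_name` is a hypothetical helper, not part of text2vec):

```r
# Hypothetical helper: split a GloVe release file name such as
# "glove.840B.300d.zip" into its token count and vector dimension.
# Assumes the "glove.<N>B.<D>d" naming pattern used on the GloVe site.
parse_glove_name <- function(fname) {
  m <- regmatches(fname, regexec("glove\\.([0-9]+)B\\.([0-9]+)d", fname))[[1]]
  if (length(m) != 3) stop("unrecognised GloVe file name: ", fname)
  list(
    tokens_billions = as.numeric(m[2]),  # corpus size, not a model parameter
    dim             = as.integer(m[3])   # the vector dimension (word_vectors_size)
  )
}

info <- parse_glove_name("glove.840B.300d.zip")
info$tokens_billions
info$dim
```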

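I cannot find the settings needed to reproduce this vector set in either the text2vec tutorial or on the Stanford GloVe website.

What are the correct settings to successfully reproduce the Common Crawl vectors?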

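library(text2vec)
# `vocab` and `tcm` (term co-occurrence matrix) are built beforehand;
# word_vectors_size/vocabulary are the text2vec 0.5.x argument names
glove = GlobalVectors$new(word_vectors_size = 50, vocabulary = vocab, x_max = 10)
wv_main = glove$fit_transform(tcm, n_iter = 10, convergence_tol = 0.01)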
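The `x_max` argument caps the GloVe weighting function f(x) = (x / x_max)^alpha for x < x_max, and 1 otherwise, where x is a co-occurrence count. For reference, the GloVe paper reports x_max = 100 and alpha = 3/4 for its own experiments. A minimal base-R sketch of that formula (`glove_weight` is a hypothetical name; this restates the published formula and is not text2vec's internal code):

```r
# GloVe weighting function from the GloVe paper:
# f(x) = (x / x_max)^alpha if x < x_max, otherwise 1.
# Rare co-occurrences are down-weighted; frequent ones are capped at 1.
glove_weight <- function(x, x_max = 10, alpha = 0.75) {
  ifelse(x < x_max, (x / x_max)^alpha, 1)
}

counts <- c(1, 5, 10, 100)
glove_weight(counts, x_max = 10)   # cap used in the code above
glove_weight(counts, x_max = 100)  # value reported in the GloVe paper
```

A larger `x_max` keeps more mid-frequency pairs below full weight, which changes the trained vectors even when every other setting is identical.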

Here is what I tried:

glove = GlobalVectors$new(word_vectors_size = 840, vocabulary = vocab, x_max = 10)
wv_main = glove$fit_transform(tcm, n_iter = 10, convergence_tol = 0.01)
