I am using the text2vec R package to replicate the Common Crawl vectors (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): glove.840B.300d.zip, which can be found at
https://nlp.stanford.edu/projects/glove/
I cannot find the settings I need to reproduce this vector set in either the text2vec tutorial or on the Stanford GloVe website.
What are the correct settings to successfully reproduce the Common Crawl vectors?
glove = GlobalVectors$new(word_vectors_size = 50, vocabulary = vocab, x_max = 10)
wv_main = glove$fit_transform(tcm, n_iter = 10, convergence_tol = 0.01)
This is what I tried:
glove = GlobalVectors$new(word_vectors_size = 840, vocabulary = vocab, x_max = 10)
wv_main = glove$fit_transform(tcm, n_iter = 10, convergence_tol = 0.01)