from deepdist import DeepDist
from gensim.models.word2vec import Word2Vec
from pyspark import SparkConf, SparkContext
conf = SparkConf().setAppName("Word2Vec")
sc = SparkContext(conf=conf)
corpus = sc.textFile('AllText.txt').map(lambda s: s.split())
def gradient(model, sentences):  # runs on the workers
    syn0, syn1 = model.syn0.copy(), model.syn1.copy()  # previous weights
    model.train(sentences)
    return {'syn0': model.syn0 - syn01, 'syn1': model.syn1 - syn1}
def descent(model, update):  # runs on the master
    model.syn0 += update['syn0']
    model.syn1 += update['syn1']
with DeepDist(Word2Vec(corpus.collect())) as dd:
    dd.train(corpus, gradient, descent)
    dd.model.save("Model")
Please help me. I have 56 GB of text and want to build a word2vec model, but using gensim alone is very slow, so I tried DeepDist and their example code from the web. I'm just wondering whether anyone has seen this kind of error.
Output when running this script:
Answer (score: 0)
Note that the code you copied and pasted has a typo, which is corrected by this pull request: https://github.com/dirkneumann/deepdist/pull/1
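For reference, the typo is in the `gradient` function: the return statement references `syn01`, but the variable defined two lines above is `syn0`, so Python raises a NameError when the function runs. A minimal sketch of the corrected `gradient`/`descent` pair is below; `StubModel` is a hypothetical stand-in used only so the logic can be exercised without gensim or Spark, and its `train` method is a toy that nudges every weight by 1.0 (gensim's real `Word2Vec.train` behaves differently).

```python
import numpy as np

class StubModel:
    # Hypothetical stand-in for a gensim Word2Vec model: it only
    # exposes the syn0/syn1 weight matrices that DeepDist touches.
    def __init__(self):
        self.syn0 = np.zeros((2, 2))
        self.syn1 = np.zeros((2, 2))

    def train(self, sentences):
        # Toy training step: nudge every weight by 1.0 so the
        # delta computed in gradient() is easy to check.
        self.syn0 += 1.0
        self.syn1 += 1.0

def gradient(model, sentences):
    # Snapshot the weights *before* training on this partition.
    syn0, syn1 = model.syn0.copy(), model.syn1.copy()
    model.train(sentences)
    # Return the weight deltas; note `syn0`, not the misspelled `syn01`.
    return {'syn0': model.syn0 - syn0, 'syn1': model.syn1 - syn1}

def descent(model, update):
    # Apply the accumulated deltas to the master's copy of the model.
    model.syn0 += update['syn0']
    model.syn1 += update['syn1']

worker, master = StubModel(), StubModel()
update = gradient(worker, [["a", "b"]])
descent(master, update)
```

After `descent`, the master's weights reflect exactly the change the worker's training produced, which is the update cycle DeepDist's `dd.train(corpus, gradient, descent)` repeats across partitions.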