Question: Gensim worker threads get stuck

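I am training document embeddings (Doc2Vec) on about 20 million sentences in gensim, using parallel worker threads. I create and train my model with the following code: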

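from gensim.models.doc2vec import Doc2Vec, TaggedDocument


class read_corpus(object):

    def __init__(self, fname, n):
        self.fname = fname
        self.n = n  # maximum number of lines (documents) to read

    def __iter__(self):
        # The file is reopened on every call, so the corpus can be iterated
        # once for build_vocab() and again for every training epoch.
        with open(self.fname, 'r') as f:
            # A for loop ends cleanly at end of file; calling next(f) in a
            # while loop would raise StopIteration inside this generator.
            for num_notes, note in enumerate(f):
                if num_notes >= self.n:  # self.n, not the global n
                    break
                # split on the first tab only: "<sentence_id>\t<sentence>"
                sentence_id, sentence = note.split('\t', 1)

                # remove the trailing newline and split into words
                sentence = sentence.rstrip('\n').split(' ')

                # some processing

                yield TaggedDocument(sentence, [sentence_id])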
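A corpus reader that pulls lines with next(f) inside a while loop, and bounds it with a bare n rather than self.n, behaves differently from a for loop when the file runs out: if the file has fewer than n lines, next(f) raises StopIteration inside a generator, and since Python 3.7 (PEP 479) that surfaces as a RuntimeError instead of ending the iteration. The sketch reproduces this in isolation. It uses io.StringIO in place of the real file and plain tuples in place of TaggedDocument (assumptions, so it runs without gensim), and the function names read_with_next and read_with_for are hypothetical. It does not establish that this is what causes the hang.

```python
import io


def read_with_next(f, n):
    """Hypothetical: mirrors a next(f)-inside-while reading loop."""
    num_notes = 0
    while num_notes < n:
        note = next(f)  # raises StopIteration once the file is exhausted
        sentence_id, sentence = note.split('\t', 1)
        yield sentence_id, sentence.rstrip('\n').split(' ')
        num_notes += 1


def read_with_for(f, n):
    """Hypothetical: same logic, but the for loop stops cleanly at EOF."""
    for num_notes, note in enumerate(f):
        if num_notes >= n:
            break
        sentence_id, sentence = note.split('\t', 1)
        yield sentence_id, sentence.rstrip('\n').split(' ')


text = "s1\thello world\ns2\tfoo bar\n"  # only 2 lines, but we ask for 5

print(list(read_with_for(io.StringIO(text), 5)))
# → [('s1', ['hello', 'world']), ('s2', ['foo', 'bar'])]

try:
    list(read_with_next(io.StringIO(text), 5))
except RuntimeError as e:
    print("RuntimeError:", e)
# → RuntimeError: generator raised StopIteration
```

The for-loop form simply yields whatever lines exist, which is why it is the usual shape for a streamed, restartable corpus.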

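def model(fname, vector_size, min_count,
          n_epochs, model_name,
          n, prev_model_name=None):

    # streamed corpus: the file is re-read on each pass
    data = read_corpus(fname, n)

    if prev_model_name is not None:
        # continue training a previously saved model; its corpus_count
        # is the value recorded when build_vocab() was originally run
        model = Doc2Vec.load(prev_model_name)
    else:
        model = Doc2Vec(vector_size=vector_size,
                        min_count=min_count,
                        workers=4,
                        window=8,
                        alpha=0.1,
                        min_alpha=0.0001)

        # one full pass over the corpus to build the vocabulary
        model.build_vocab(data)

    model.train(data, total_examples=model.corpus_count, epochs=n_epochs)
    model.save(model_name)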
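model.train() is given total_examples=model.corpus_count, the number of documents counted by build_vocab() (or stored in the loaded model when training resumes). If the file has fewer lines than n, or some lines lack the tab that read_corpus splits on, the documents actually yielded can differ from that count, or the reader can raise a ValueError partway through an epoch. A hypothetical check_corpus helper, sketched here as a pre-flight sanity check and not as a known fix for the hang, counts the lines a reader limited to n would consume and lists the line numbers that have no tab.

```python
import io


def check_corpus(f, n):
    """Hypothetical helper: count the lines a reader capped at n would
    consume, and collect 1-based line numbers missing a tab separator."""
    count = 0
    bad_lines = []
    for lineno, line in enumerate(f, start=1):
        if count >= n:
            break
        if '\t' not in line:
            bad_lines.append(lineno)  # split('\t') would fail to unpack here
        count += 1
    return count, bad_lines


sample = io.StringIO("s1\thello world\nbroken line\ns3\tfoo\n")
print(check_corpus(sample, 10))  # → (3, [2])
```

On the real data it could be run as check_corpus(open(fname), n) and the count compared with model.corpus_count before calling train().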

After 6 to 8 epochs, the log shows training stuck waiting for worker threads. Note: the log says "EPOCH 1" because I am running the training in a for loop.

...
INFO : EPOCH 1 - PROGRESS: at 99.71% examples, 162493 words/s, in_qsize 8, out_qsize 0
INFO : EPOCH 1 - PROGRESS: at 99.81% examples, 162528 words/s, in_qsize 7, out_qsize 0
INFO : EPOCH 1 - PROGRESS: at 99.91% examples, 162560 words/s, in_qsize 7, out_qsize 0
INFO : worker thread finished; awaiting finish of 3 more threads
INFO : worker thread finished; awaiting finish of 2 more threads

It has been stuck here for several hours.

I saw similar output in a previous run, except that there the logging stopped at:

INFO : worker thread finished; awaiting finish of 3 more threads
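When threads stop making progress and nothing new appears in the log, Python's standard faulthandler module can print the current stack of every thread after a timeout, which would show where each gensim worker, and the thread that feeds them documents, is blocked. The 600-second timeout and the logging format string are assumptions (the question does not show its logging setup); the format is chosen only to produce lines shaped like the ones quoted above.

```python
import faulthandler
import logging
import sys

# Produces lines like "INFO : EPOCH 1 - PROGRESS: ..." (assumed format).
logging.basicConfig(format='%(levelname)s : %(message)s', level=logging.INFO)

# If still running after 600 s (an arbitrary choice: longer than a normal
# epoch), dump the stack of every thread to stderr, repeating at that interval.
faulthandler.dump_traceback_later(600, repeat=True, file=sys.stderr)

# ... build and train the model here ...

# Disarm the watchdog once training has returned normally.
faulthandler.cancel_dump_traceback_later()
```

If training hangs, the periodic dumps show which line each thread is waiting on without having to kill the process.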
