Gensim Mallet bug? Loading a saved model fails after the first time

Time: 2018-08-10 11:14:48

Tags: python gensim lda topic-modeling mallet

I am trying to load a saved gensim LDA Mallet model:

 import gensim
 # Train the Mallet-backed LDA model, then save the wrapper object to disk
 ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=n_topics, id2word=id2word)
 ldamallet.save('ldamallet')

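When I test it against a new query (using the original corpus and dictionary), everything seems to work the first time the model is loaded.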

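# Convert each tokenized query into a bag-of-words vector
ques_vec = [dictionary.doc2bow(words) for words in data_words_list]
for i, row in enumerate(lda[ques_vec]):
    # Sort the (topic_id, probability) pairs, most probable topic first
    row = sorted(row, key=lambda x: (x[1]), reverse=True)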

When I run the same code again afterwards, this error comes up:

  

java.io.FileNotFoundException: /tmp/9f371_corpus.mallet (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at cc.mallet.types.InstanceList.load(InstanceList.java:787)
    at cc.mallet.classify.tui.Csv2Vectors.main(Csv2Vectors.java:131)
Exception in thread "main" java.lang.IllegalArgumentException: Couldn't read InstanceList from file /tmp/9f371_corpus.mallet
    at cc.mallet.types.InstanceList.load(InstanceList.java:794)
    at cc.mallet.classify.tui.Csv2Vectors.main(Csv2Vectors.java:131)
Traceback (most recent call last):
  File "topic_modeling1.py", line 406, in <module>
    topic = get_label(text, id2word, first, ldamallet)
  File "topic_modeling1.py", line 237, in get_label
    for i, row in enumerate(lda[ques_vec]):
  File "/home/user/sjha/anaconda3/envs/conda_env/lib/python3.6/site-packages/gensim/models/wrappers/ldamallet.py", line 308, in __getitem__
    self.convert_input(bow, infer=True)
  File "/home/user/sjha/anaconda3/envs/conda_env/lib/python3.6/site-packages/gensim/models/wrappers/ldamallet.py", line 256, in convert_input
    check_output(args=cmd, shell=True)
  File "/home/user/sjha/anaconda3/envs/conda_env/lib/python3.6/site-packages/gensim/utils.py", line 1806, in check_output
    raise error
subprocess.CalledProcessError: Command '/home/user/sjha/projects/topic_modeling/mallet-2.0.8/bin/mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input /tmp/9f371_corpus.txt --output /tmp/9f371_corpus.mallet.infer --use-pipe-from /tmp/9f371_corpus.mallet' returned non-zero exit status 1.

Contents of my /tmp/ directory:

/tmp/9f371_corpus.txt  /tmp/9f371_doctopics.txt /tmp/9f371_doctopics.txt.infer  /tmp/9f371_inferencer.mallet  /tmp/9f371_state.mallet.gz  /tmp/9f371_topickeys.txt

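Also, the files /tmp/9f371_doctopics.txt.infer and /tmp/9f371_corpus.txt seem to be modified every time I load the model. What could be the source of the error? Or is this some kind of bug in gensim's Mallet wrapper?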
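The directory listing can be checked against the path named in the exception with a few lines of Python. This is only a rough sketch: the helper name `missing_mallet_files` is made up for illustration, and the suffix list is copied from the listing and traceback in this question rather than read from gensim itself.

```python
import os

# Suffixes seen in this question's /tmp/ listing and traceback.
# Assumption: this list is illustrative, not gensim's authoritative set.
EXPECTED_SUFFIXES = [
    "corpus.txt",
    "corpus.mallet",
    "doctopics.txt",
    "inferencer.mallet",
    "state.mallet.gz",
    "topickeys.txt",
]


def missing_mallet_files(prefix, suffixes=EXPECTED_SUFFIXES):
    """Return every path prefix + suffix that does not exist on disk."""
    return [prefix + s for s in suffixes if not os.path.exists(prefix + s)]
```

Calling `missing_mallet_files("/tmp/9f371_")` on a directory like the one listed above would be expected to report only the `corpus.mallet` file, which is the one the Java exception complains about.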

1 Answer:

Answer 0: (score: 0)

Deleting the Mallet-related contents of my /tmp/ directory solved the problem for me.
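If you would rather script that cleanup than do it by hand, something along these lines should work. It is a minimal sketch using only the standard library. The function name `remove_mallet_temp_files` is hypothetical, and the prefix (`9f371_` in the question) has to be supplied by you. It deletes files, so double-check the directory and prefix first.

```python
import glob
import os
import tempfile


def remove_mallet_temp_files(prefix, tmp_dir=None):
    """Delete regular files in tmp_dir whose names start with prefix.

    Returns the sorted list of paths that were removed.
    """
    # Fall back to the system temp directory (usually /tmp on Linux).
    tmp_dir = tmp_dir or tempfile.gettempdir()
    # Escape both parts so glob metacharacters in them are taken literally.
    pattern = os.path.join(glob.escape(tmp_dir), glob.escape(prefix) + "*")
    removed = []
    for path in glob.glob(pattern):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return sorted(removed)
```

For the files in the question, the call would be `remove_mallet_temp_files("9f371_")`. Files in the same directory that do not start with the prefix are left alone.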