gensim LdaMulticore does not work from the command prompt

Date: 2017-08-18 13:45:37

Tags: python nlp multicore gensim lda

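I am using gensim's LdaMulticore to extract topics. It works perfectly in a Jupyter/IPython notebook, but when I run the script from the command prompt it loops indefinitely: as soon as execution reaches the LdaMulticore call, the script starts again from the beginning. Please help, as I am new to this.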

if __name__ == '__main__': 
    model = models.LdaMulticore(corpus=corpus_train, id2word=dictionary, num_topics=20, chunksize=4000, passes=30, alpha=0.5, eta=0.05, decay=0.5, eval_every=10, workers=3, minimum_probability=0)

**RESULTS:-**
Moving to Topics Extraction Script---------------------------------
2017-08-18 18:59:36,448 : INFO : using serial LDA version on this node
2017-08-18 18:59:37,183 : INFO : running online LDA training, 20 topics, 1 passes over the supplied corpus of 400 documents, updating every 12000 documents, evaluating every ~400 documents, iterating 50x with a convergence threshold of 0.001000    
2017-08-18 18:59:37,183 : WARNING : too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy
2017-08-18 18:59:37,183 : INFO : training LDA model using 3 processes
2017-08-18 18:59:37,214 : INFO : PROGRESS: pass 0, dispatched chunk #0 = documents up to #400/400, outstanding queue size 1
Importing required Packages


1 Answer:

Answer 0 (score: 0)

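LdaMulticore spreads training across several CPU cores to speed it up. The parallelization is built on Python's multiprocessing, so your program has to be written in a way that supports multiprocessing. On Windows in particular, each worker process re-imports the main script, so any top-level code that is not inside the if __name__ == '__main__': guard runs again in every worker. That is consistent with "Importing required Packages" showing up again in your output right after the first chunk is dispatched.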
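As a rough illustration of what "supporting multiprocessing" means for the layout of a script, here is a minimal sketch that uses only the standard-library multiprocessing module rather than gensim itself. The names count_tokens and main and the sample documents are made up for the example. The point is only that worker functions are defined at module level and that everything meant to run once sits behind the guard.

```python
import multiprocessing


def count_tokens(doc):
    """Work performed inside a worker process.

    Defined at module level so that worker processes can locate it
    when the main module is imported again.
    """
    return len(doc.split())


def main():
    # Hypothetical stand-in data; a real script would load its corpus here.
    docs = [
        "topic models find themes",
        "gensim trains lda",
        "workers run in parallel",
    ]
    # The pool is created only when main() is called, never at import time.
    with multiprocessing.Pool(processes=3) as pool:
        return pool.map(count_tokens, docs)


if __name__ == "__main__":
    # Runs once in the parent process. Workers that re-import this file
    # see a different __name__ and skip this block.
    print("Importing required Packages")
    print(main())
```

Applied to the script in the question, and assuming the rest of that script currently sits at module level, this would mean moving the print statements, the corpus loading and the LdaMulticore call into a function called from the guard, not just the final line.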

I ran into the same problem, so I replaced LdaMulticore with LdaModel, and that worked fine.
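For reference, the swap could look roughly like the sketch below. The hyperparameters are copied from the question; LDA_PARAMS and train_lda are hypothetical names introduced for the example. Note that LdaModel has no workers argument, so it has to be dropped when switching classes.

```python
# Hyperparameters taken from the question, without `workers`,
# which only LdaMulticore accepts.
LDA_PARAMS = dict(
    num_topics=20,
    chunksize=4000,
    passes=30,
    alpha=0.5,
    eta=0.05,
    decay=0.5,
    eval_every=10,
    minimum_probability=0,
)


def train_lda(corpus, dictionary, workers=None):
    """Train single-process LdaModel by default, LdaMulticore if workers is set."""
    # Imported inside the function so this sketch can be loaded
    # even on a machine where gensim is not installed.
    from gensim import models

    if workers:
        return models.LdaMulticore(
            corpus=corpus, id2word=dictionary, workers=workers, **LDA_PARAMS
        )
    return models.LdaModel(corpus=corpus, id2word=dictionary, **LDA_PARAMS)
```

Calling train_lda(corpus_train, dictionary) would then train in a single process, which sidesteps the restart problem at the cost of slower training on large corpora.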
