I am trying to use Mallet LDA to build a model with 500 or 1000 topics on a dataset of 1M documents. After 60 iterations, I get an ArrayIndexOutOfBoundsException. The error message is as follows:
<60> LL/token: -7.64386
overflow on type 8
java.lang.ArrayIndexOutOfBoundsException: 500
at cc.mallet.topics.WorkerRunnable.buildLocalTypeTopicCounts(WorkerRunnable.java:208)
at cc.mallet.topics.WorkerRunnable.run(WorkerRunnable.java:280)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
overflow on type 8
The command I am running is:
bin/mallet train-topics \
  --input data.mallet \
  --output-model lda.model \
  --inferencer-filename topic-inferencer-model.mallet \
  --output-topic-keys topic-keys.txt \
  --topic-word-weights-file topic-word-weights.txt \
  --word-topic-counts-file word-topic-counts-file.txt \
  --output-doc-topics doc-topics.txt \
  --num-topics 500 \
  --num-threads 16 \
  --num-iterations 1500 \
  --use-symmetric-alpha FALSE
Any suggestions would be greatly appreciated.