我正在为我的数据训练mahout分类器, 我发出以下命令来创建mahout模型
./bin/mahout seqdirectory -i /tmp/mahout-work-root/MyData-all -o /tmp/mahout-work-root/MyData-seq
./bin/mahout seq2sparse -i /tmp/mahout-work-root/MyData-seq -o /tmp/mahout-work-root/MyData-vectors -lnorm -nv -wt tfidf
./bin/mahout split -i /tmp/mahout-work-root/MyData-vectors/tfidf-vectors --trainingOutput /tmp/mahout-work-root/MyData-train-vectors --testOutput /tmp/mahout-work-root/MyData-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
./bin/mahout trainnb -i /tmp/mahout-work-root/Mydata-train-vectors -el -o /tmp/mahout-work-root/model -li /tmp/mahout-work-root/labelindex -ow
当我尝试使用trainnb命令创建模型时,我得到以下异常:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.mahout.classifier.naivebayes.BayesUtils.writeLabelIndex(BayesUtils.java:119) at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.createLabelIndex(TrainNaiveBayesJob.java:152)
这可能是什么问题?
注意:提到的原始示例here工作正常。
答案 0 :(得分:0)
我认为这可能是您如何放置培训文件的问题。 文件应按如下方式组织:
迈德特-所有
\ CLASSA
-file1
-file2
-...
\ CLASSB
-filex
...