I'm trying to use the Naive Bayes classifier in Mahout to classify some product data.
I used Solr to convert my dataset into a Lucene index, then used the Mahout split command to create training and holdout sets. This seems to work fine.
Now I'm trying to train the Naive Bayes model with trainnb, but I get the following error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.classifier.naivebayes.BayesUtils.writeLabelIndex(BayesUtils.java:119)
at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.createLabelIndex(TrainNaiveBayesJob.java:152)
at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.run(TrainNaiveBayesJob.java:92)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.main(TrainNaiveBayesJob.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
Here is the command I ran:
$MAHOUT_HOME/bin/./mahout trainnb -i ~/training_output/Amazon_training_output/ -el -o ~/model/Amazon -li ~/labelindex/Amazon -ow -c
What does this error mean in this context, and how can I fix it?
Could my original index be to blame?
Answer 0 (score: 0)
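Maybe your keys are in the wrong format? Based on the code, the key is expected to look like /string/int: writeLabelIndex takes the label as element [1] of the key split on "/", so a key without that leading /label/ segment has no element [1], which would give exactly the ArrayIndexOutOfBoundsException: 1 in your trace.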
See lines 119 and 131: http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.mahout/mahout-core/0.7-cdh4.1.3/org/apache/mahout/classifier/naivebayes/BayesUtils.java?av=f
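To make the key format concrete, here is a minimal Java sketch, assuming (as the linked BayesUtils code suggests) that the label is taken as element [1] of the key split on "/". `extractLabel` is a simplified stand-in, not Mahout's actual source; `toTrainingKey` and the sample keys ("Books", "42") are hypothetical. It only shows the string shapes involved; how you rewrite the keys in your own training files depends on how they were produced from the Lucene index.

```java
import java.util.regex.Pattern;

/**
 * Sketch of the "/label/docId" key format. extractLabel is a simplified
 * stand-in for the label lookup in Mahout 0.7's BayesUtils.writeLabelIndex
 * (split on "/" and take element [1]); it is NOT Mahout's source.
 * toTrainingKey is a hypothetical helper, not a Mahout API.
 */
public class LabelKeyDemo {
    private static final Pattern SLASH = Pattern.compile("/");

    // "/Books/42" splits into ["", "Books", "42"], so element [1] is the label.
    // A key with no '/' splits into a single element, so [1] throws
    // ArrayIndexOutOfBoundsException, as in the trace above.
    static String extractLabel(String key) {
        return SLASH.split(key)[1];
    }

    // Hypothetical: builds a key in the "/label/docId" shape.
    // A '/' inside the label would shift the split positions, so reject it.
    static String toTrainingKey(String label, String docId) {
        if (label.isEmpty() || label.contains("/")) {
            throw new IllegalArgumentException("label must be non-empty and free of '/': " + label);
        }
        return "/" + label + "/" + docId;
    }

    public static void main(String[] args) {
        String good = toTrainingKey("Books", "42");
        System.out.println(good + " -> " + extractLabel(good)); // prints /Books/42 -> Books

        // Without the leading '/', element [1] is the document id, not the label.
        System.out.println(extractLabel("Books/42"));           // prints 42

        try {
            extractLabel("42"); // no '/' at all: only one element after splitting
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("malformed key: 42");
        }
    }
}
```

Note the middle case: a key like Books/42 does not crash but would silently yield the document id as the label. Dumping a few keys of the training input (for example with `mahout seqdumper`) before running trainnb is a cheap way to check which shape you actually have.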