Training error when running the Mahout Naive Bayes algorithm on EMR

Date: 2013-02-14 02:52:01

Tags: mahout

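I am trying to run the Naive Bayes algorithm on EMR with 1 master (small) and 1 slave (small) node. The seqdirectory, seq2sparse, and split steps completed successfully, but the training stage fails with an error. I use the following command to train the model: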

./elastic-mapreduce --jar s3n://<bucket name>/mahout/mahout-examples-0.7-job.jar \
    --main-class org.apache.mahout.driver.MahoutDriver \
    --logs \
    --arg trainnb \
    --arg -i --arg /<folder name>/mahout/review-train-vectors/ --arg -el \
    --arg -o --arg /<folder name>/mahout/model/ \
    --arg -li --arg /<folder name>/mahout/labelindex/ \
    --arg -ow \
    -j <job-name>

Here is the log from the job step:

java.lang.IllegalArgumentException
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201302130846_0035_m_000000_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201302130846_0035_m_000000_0: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201302130846_0035_m_000000_0: SLF4J: Found binding in [jar:file:/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201302130846_0035/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201302130846_0035_m_000000_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
java.lang.IllegalArgumentException
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201302130846_0035_m_000000_1: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201302130846_0035_m_000000_1: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201302130846_0035_m_000000_1: SLF4J: Found binding in [jar:file:/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201302130846_0035/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201302130846_0035_m_000000_1: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
java.lang.IllegalArgumentException
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201302130846_0035_m_000000_2: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201302130846_0035_m_000000_2: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201302130846_0035_m_000000_2: SLF4J: Found binding in [jar:file:/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201302130846_0035/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201302130846_0035_m_000000_2: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

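Has anyone tried this before? Please help me resolve this problem. I get the same error when I run the algorithm on my local system with Hadoop in pseudo-distributed mode. The algorithm only works when the MAHOUT_LOCAL=true environment variable is set.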

1 answer:

Answer 0 (score: 1)

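There is a problem with the command's arguments. It looks like you copied and pasted the command without adapting it to your environment: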

  --jar s3n://<bucket name>/mahout/mahout-examples-0.7-job.jar

What is the bucket name?

 --arg -i --arg /<folder name>/mahout/review-train-vectors/

<folder name> looks like a placeholder that you should replace with the value for your own setup.

-j <job-name>

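The same mistake. You do not seem to be an experienced Linux user, so note that the \ character at the end of each line should be left out if you enter the command on a single line. It was most likely added on the web page you took the command from to make it more readable. (Are you sure it is a single command, and not several commands spread over several lines? :))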
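
To illustrate the point about placeholders, here is a minimal sketch in bash that keeps the environment-specific values in variables and refuses to proceed while any argument still contains a `<...>` placeholder. The values `my-bucket`, `my-folder`, and `j-EXAMPLE`, and the helper names `has_placeholder` and `check_command`, are made up for illustration. The flags are copied unchanged from the command in the question, so this only guards against unsubstituted placeholders and is not guaranteed to fix the IllegalArgumentException itself.

```shell
#!/usr/bin/env bash
# Hypothetical values: replace them with your own bucket, folder, and job flow.
BUCKET="my-bucket"
FOLDER="my-folder"
JOB_FLOW="j-EXAMPLE"

# Succeeds (exit status 0) if the argument still contains a <...> placeholder.
has_placeholder() {
  [[ "$1" == *"<"*">"* ]]
}

# Fails on the first argument that still contains a placeholder.
check_command() {
  local arg
  for arg in "$@"; do
    if has_placeholder "$arg"; then
      echo "unsubstituted placeholder: $arg" >&2
      return 1
    fi
  done
  return 0
}

# Same flags as in the question, with the placeholders filled from variables.
# Inside an array no trailing backslashes are needed to span several lines.
cmd=(./elastic-mapreduce
  --jar "s3n://${BUCKET}/mahout/mahout-examples-0.7-job.jar"
  --main-class org.apache.mahout.driver.MahoutDriver
  --logs
  --arg trainnb
  --arg -i --arg "/${FOLDER}/mahout/review-train-vectors/" --arg -el
  --arg -o --arg "/${FOLDER}/mahout/model/"
  --arg -li --arg "/${FOLDER}/mahout/labelindex/"
  --arg -ow
  -j "${JOB_FLOW}")

if check_command "${cmd[@]}"; then
  # Print the command for review instead of running it.
  printf '%s ' "${cmd[@]}"
  echo
fi
```

Once the printed command looks right, it can be run with `"${cmd[@]}"`. Building the command as an array keeps every argument intact even if a value contains spaces.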