Running mahout spark-itemsimilarity gives an error?

Asked: 2015-04-27 13:53:09

Tags: apache-spark mahout mahout-recommender


I get the following stack trace when I run


./mahout spark-itemsimilarity --input input-file --output /output_dir --master spark://url_to_master --filter1 purchase --filter2 view --itemIDColumn 2 --rowIDColumn 0 --filterColumn 1

in a Linux terminal.
I cloned the project from the spark-1.2 branch of the Mahout GitHub repository, ran
mvn install
in the Mahout source directory, and then cd'd into mahout/bin/ to run the command above.
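For context, the setup was roughly the following (a minimal sketch of the steps described above; the repository URL and the -DskipTests flag are assumptions, not from the original post):

# clone the spark-1.2 branch, build, then run from bin/
git clone https://github.com/apache/mahout.git
cd mahout
git checkout spark-1.2
mvn install -DskipTests
cd bin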

java.lang.NoClassDefFoundError: com/google/common/collect/HashBiMap
    at org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator.registerClasses(MahoutKryoRegistrator.scala:39)
    at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$4.apply(KryoSerializer.scala:104)
    at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$4.apply(KryoSerializer.scala:104)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:104)
    at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:159)
    at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:121)
    at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:214)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:61)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 22 more

Please help! Thanks.

1 answer:

Answer 0 (score: 1)

Mahout 0.10.0 supports Spark 1.1.1 or below. If you build from source and change the Spark version number in the main pom (mahout/pom.xml), you can build for Spark 1.2, but you must then use the workaround described below. The jar with "dependency-reduced" in its name will end up in mahout/spark/target. A Spark 1.2 branch that will not need the fix above is under development; it may be ready to try in a week or so.
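For example, a minimal sketch of the version bump and rebuild, run from the Mahout source root (the <spark.version> property name and the 1.2.1 target version are assumptions; check the pom in your checkout):

# change the Spark version property in the main pom, then rebuild
sed -i 's|<spark.version>1.1.1</spark.version>|<spark.version>1.2.1</spark.version>|' pom.xml
mvn clean install -DskipTests
# the workaround jar should then appear here
ls spark/target/*dependency-reduced*.jar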

There is a bug in Spark 1.2; I'm not sure whether it has been fixed in 1.3.

See here: https://issues.apache.org/jira/browse/SPARK-6069

What worked for me was to put the jar that contains Guava (it will be called mahout-spark_2.10-0.11.0-SNAPSHOT-dependency-reduced.jar or something like that) on all workers and then pass its location to the Mahout job with:

spark-itemsimilarity -D:spark.executor.extraClassPath=/path/to/mahout/spark/target/mahout-spark_2.10-0.11-dependency-reduced.jar

The jar must exist at that path on every worker.
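A minimal sketch of getting the jar onto every worker (the hostnames worker1/worker2 and the destination directory are assumptions; substitute your cluster's values):

# copy the dependency-reduced jar to the same path on every worker
for host in worker1 worker2; do
  scp spark/target/mahout-spark_2.10-0.11.0-SNAPSHOT-dependency-reduced.jar \
      "$host":/path/to/mahout/spark/target/
done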

A code workaround will go into the spark-1.2 branch in the next week or so, which will make -D:spark.executor.extraClassPath=/path/to/mahout... unnecessary.