I am trying to run the FPGrowth algorithm example in Spark, but I am running into an error. Here is my code:
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.fpm.{FPGrowth, FPGrowthModel}
val transactions: RDD[Array[String]] = sc.textFile("path/transations.txt").map(_.split(" ")).cache()
val fpg = new FPGrowth().setMinSupport(0.2).setNumPartitions(10)
val model = fpg.run(transactions)
model.freqItemsets.collect().foreach { itemset => println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)}
The code works until the last line, where I get this error:
WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 16, ip-10-0-0-###.us-west-1.compute.internal):
com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Can not set
final scala.collection.mutable.ListBuffer field org.apache.spark.mllib.fpm.FPTree$Summary.nodes to scala.collection.mutable.ArrayBuffer
Serialization trace:
nodes (org.apache.spark.mllib.fpm.FPTree$Summary)
I even tried the solution proposed here: SPARK-7483
I had no luck with that either. Has anyone found a solution? Or does anyone know of a way to just view the results, or to save them to a text file?
Any help is greatly appreciated!
I also found the full source code of this algorithm: http://mail-archives.apache.org/mod_mbox/spark-commits/201502.mbox/%3C1cfe817dfdbf47e3bbb657ab343dcf82@git.apache.org%3E
Answer 0 (score: 2)
Kryo is a faster serializer than org.apache.spark.serializer.JavaSerializer. A possible workaround is to tell Spark not to use Kryo, at least until this bug is fixed. You could modify "spark-defaults.conf", but Kryo works fine with the other Spark libraries, so it is better to modify your context with:
val conf = new org.apache.spark.SparkConf()
  .setAppName("APP_NAME")
  .set("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
and then try running the MLlib code again:
model.freqItemsets.collect().foreach { itemset => println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)}
It should work now.
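If you would rather save the results to a text file than print them, as the question asks, here is a minimal sketch: format each frequent itemset as a string and write the RDD out. The output path is a placeholder.

// Sketch only: one formatted itemset per output line.
model.freqItemsets
  .map(itemset => itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
  .saveAsTextFile("path/freq-itemsets")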
Answer 1 (score: 1)
I got the same error: it was caused by the Spark version. This is fixed in Spark 1.5.2, but I was using 1.3. I fixed it by doing the following:
I switched from spark-shell to spark-submit and then changed the configuration for the Kryo serializer. Here is my code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.fpm.FPGrowth
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.ListBuffer

object fpgrowth {
  def main(args: Array[String]) {
    // Register the mutable collection classes used by FPGrowth's FPTree
    // so that Kryo serializes them correctly.
    val conf = new SparkConf().setAppName("Spark FPGrowth")
      .registerKryoClasses(
        Array(classOf[ArrayBuffer[String]], classOf[ListBuffer[String]])
      )

    val sc = new SparkContext(conf)

    // Each line of the input file is one transaction: items separated by spaces.
    val data = sc.textFile("<path to file.txt>")
    val transactions: RDD[Array[String]] = data.map(s => s.trim.split(' '))

    val fpg = new FPGrowth()
      .setMinSupport(0.2)
      .setNumPartitions(10)
    val model = fpg.run(transactions)

    model.freqItemsets.collect().foreach { itemset =>
      println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
    }
  }
}
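You would then package this and launch it with spark-submit; a sketch, where the jar name and master URL are placeholders:

spark-submit --class fpgrowth --master <master-url> fpgrowth.jar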
Answer 2 (score: 1)
Set the configuration below on the command line or in spark-defaults.conf:
--conf spark.kryo.classesToRegister=scala.collection.mutable.ArrayBuffer,scala.collection.mutable.ListBuffer
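For example, a sketch of both forms (the spark-shell invocation is an assumption; in spark-defaults.conf the property is set without the --conf prefix):

# On the command line:
spark-shell --conf spark.kryo.classesToRegister=scala.collection.mutable.ArrayBuffer,scala.collection.mutable.ListBuffer

# As a line in spark-defaults.conf:
spark.kryo.classesToRegister  scala.collection.mutable.ArrayBuffer,scala.collection.mutable.ListBuffer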