Inside foreachPartition I want to turn a list of Strings into an RDD, so I call ssc.sparkContext.parallelize, where ssc is a StreamingContext object. The code looks like this:
val ssc = new StreamingContext(sparkConf, Seconds(5))
...
words.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    var sourcelist = List[String]()
    partitionOfRecords.foreach(x => { sourcelist = sourcelist.+:("source" + x) })

    if (sourcelist.length > 0) {
      sourcelist.foreach { x => println(x) }
    } else {
      println("aa---------------------none")
    }

    // turn sourcelist into an RDD and convert it to a DStream
    val ssd = ssc.sparkContext.parallelize(sourcelist)
    val resultInputStream = ssc.queueStream(scala.collection.mutable.Queue(ssd))
    val results = resultInputStream.map(x => x)
    results.print()
  }
}
However, this code throws org.apache.spark.SparkException: Task not serializable, and I really don't know how to handle it. Any help would be appreciated!
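In case the surrounding setup matters, here is a self-contained sketch of the kind of job I am running. The socketTextStream source, the local[2] master, and the Repro object wrapper are placeholders I added for this post rather than my real configuration:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object Repro {
  def main(args: Array[String]): Unit = {
    // placeholder configuration; the real job runs elsewhere
    val sparkConf = new SparkConf().setAppName("repro").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // placeholder input; the real source is not a socket stream
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))

    words.foreachRDD { rdd =>
      rdd.foreachPartition { partitionOfRecords =>
        // collect this partition's records into a local list
        val sourcelist = partitionOfRecords.map("source" + _).toList
        // same idea as above: build an RDD from the list and wrap it in a DStream
        val ssd = ssc.sparkContext.parallelize(sourcelist)
        val resultInputStream = ssc.queueStream(scala.collection.mutable.Queue(ssd))
        resultInputStream.print()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}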