I'm having some problems with code I inherited:
for ((catAttribs, (dataIter, queryIter)) <- localCollection) {
  println("in the loop")
  val bCastData = dataIter
  val spQueries = sc.parallelize(queryIter.toSeq, numReducers)
  val a = spQueries.count
  println("the value of a is: ")
  println(a)
  val type3AboveResult = spQueries
    .mapPartitions(queryPartitionIter =>
      extractType3Results(bCastData, queryPartitionIter.toIterable, kLikelihood))
    .flatMap(x => x)
}
println("up to here now")
The error message I get is:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1622)
    at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:635)
    at org.dave.examples.NN$$anonfun$main$2.apply(NN.scala:374)
    at org.dave.examples.NN$$anonfun$main$2.apply(NN.scala:367)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.dave.examples.NN$.main(NN.scala:367)
    at org.dave.examples.NN.main(NN.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
- object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@5fe3ed23)
- field (class: org.dave.examples.NN$$anonfun$main$2, name: sc$1, type: class org.apache.spark.SparkContext)
- object (class org.dave.examples.NN$$anonfun$main$2, <function1>)
- field (class: org.dave.examples.NN$$anonfun$main$2$$anonfun$25, name: $outer, type: class org.dave.examples.NN$$anonfun$main$2)
- object (class org.dave.examples.NN$$anonfun$main$2$$anonfun$25, <function1>)
- field (class: org.apache.spark.rdd.RDD$$anonfun$14, name: f$3, type: interface scala.Function1)
- object (class org.apache.spark.rdd.RDD$$anonfun$14, <function3>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
... 20 more
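
Reading the serialization stack: the mapPartitions lambda (NN$$anonfun$main$2$$anonfun$25) holds its enclosing closure through its $outer field, and that enclosing closure (NN$$anonfun$main$2, the body of the for loop) holds the SparkContext in its sc$1 field, presumably because it calls sc.parallelize. So serializing the task drags in sc itself. As a sanity check I put together a stand-alone sketch of the same kind of capture; the names (Helper, suffix, makeData, process) are made up, but as far as I can tell it dies with the same "Task not serializable" error:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical repro, not my actual code: the lambda passed to map reads
// the field `suffix`, so it captures `this`. `this` is Serializable, but
// it carries `sc` as a field (because makeData uses it), so serializing
// the closure fails on the non-serializable SparkContext.
class Helper(sc: SparkContext) extends Serializable {
  val suffix = "!"

  def makeData(): RDD[String] = sc.parallelize(Seq("a", "b")) // forces `sc` into a field

  def process(rdd: RDD[String]): RDD[String] =
    rdd.map(s => s + suffix) // captures `this`, and therefore `sc`
}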
The code runs if I exclude the following line:
val type3AboveResult = spQueries.mapPartitions(queryPartitionIter => extractType3Results(bCastData, queryPartitionIter.toIterable, kLikelihood)).flatMap(x => x)
This works in local mode but not in cluster mode. Any help is appreciated. I've added some code below that shows the structure of the program.
object NN {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Nearest Neighbours")
    val sc = new SparkContext(conf)
    ... <excluded non-relevant code>

    class Person(numerAttribs: Array[Double], tfn: String, targetAttribs: Array[Double]) extends Serializable {
      ... <excluded non-relevant code>

      def extractType3Results(dataIter: Iterable[Person], queryIter: Iterable[Person], kInner: Int): Iterator[List[List[Person]]] = {
        val dataA = dataIter.map(person => (person.toTupleNumeric, person)).toArray
        val kdMap = KDTreeMap.fromSeq(dataA)
        var resultList = List[List[Person]]()
        val qIter = queryIter.iterator
        while (qIter.hasNext) {
          val curQuery = qIter.next
          // we are looking for kInner + 1 since the data index may contain the query object
          val kNNresult = kdMap.findNearest(curQuery.toTupleNumeric, kInner + 1)
          val filteredResult = kNNresult.filter { case (attribs, person) => person.TFN != curQuery.TFN }
          if (filteredResult.size == kInner) {
            resultList = (curQuery :: filteredResult.toList.map { case (attribs, person) => person }) :: resultList
          } else {
            resultList = (curQuery :: filteredResult.toList
              .map { case (attribs, person) => person }
              .sortWith((a, b) => a.distComp(b, curQuery))
              .take(kInner)) :: resultList
          }
        }
        Iterator.single(resultList)
      } // end extractType3Results
    } // end class Person

    val rawData = sc.textFile("hdfs://......W_org.csv", 150).map(_.split(" ")).cache()
    val dataRDD = rawData.map(a => (getCategoricalAttributes(a, indexes(0)),
      new Person(getStdDoubleAttributes(a, indexes(1), attsMean, attsStdev), a(0), getDoubleAttributes(a, indexes(2))))).coalesce(150)
    val queryRDD = dataRDD.cache()
    val coGrouped = dataRDD.cogroup(queryRDD, numMappers).cache

    val type1ResultCogroups = coGrouped.filter { case (catAttribs, (dataIter, queryIter)) =>
      dataIter.count(x => true) >= peerCodeThreshold && dataIter.count(x => true) <= kLikelihood }
    val type2ResultCogroups = coGrouped.filter { case (catAttribs, (dataIter, queryIter)) =>
      dataIter.count(x => true) > kLikelihood && dataIter.count(x => true) <= linearSearchThreshold }
    val type3BelowResultCogroups = coGrouped.filter { case (catAttribs, (dataIter, queryIter)) =>
      dataIter.count(x => true) > Math.max(kLikelihood, linearSearchThreshold) && queryIter.count(x => true) <= querySplitThreshold }
    val type3AboveResultCogroups = coGrouped.filter { case (catAttribs, (dataIter, queryIter)) =>
      dataIter.count(x => true) > Math.max(kLikelihood, linearSearchThreshold) && queryIter.count(x => true) > querySplitThreshold }

    val type1Result = type1ResultCogroups.flatMap { case (catAttribs, (dataIter, queryIter)) =>
      extractType1Results(dataIter, queryIter).flatten }
    val type2Result = type2ResultCogroups.flatMap { case (catAttribs, (dataIter, queryIter)) =>
      extractType2Results(dataIter, queryIter, kLikelihood).flatten }
    val type3BelowResult = type3BelowResultCogroups.flatMap { case (catAttribs, (dataIter, queryIter)) =>
      extractType3Results(dataIter, queryIter, kLikelihood).flatten }

    var finalResult = type3BelowResult.union(type2Result).union(type1Result).coalesce(numReducers)

    val localCollection = type3AboveResultCogroups.collect
    for ((catAttribs, (dataIter, queryIter)) <- localCollection) {
      val bCastData = dataIter // sc.broadcast(dataIter)
      val spQueries = sc.parallelize(queryIter.toSeq, numReducers)
      val type3AboveResult = spQueries
        .mapPartitions(queryPartitionIter =>
          extractType3Results(bCastData, queryPartitionIter.toIterable, kLikelihood))
        .flatMap(x => x)
      finalResult = finalResult.union(type3AboveResult)
    }
    val finalResult1 = finalResult.cache()
  } // end main
} // end object
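
If the capture analysis above is right, the fix I'm considering is to move the worker-side logic out of main (and out of Person, which is itself defined inside main) into a top-level object, so the function passed to mapPartitions only needs its explicit arguments. A minimal sketch of the pattern, with made-up names (Workers, shout, Fixed), not my real code:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch of the restructuring: the per-partition function
// lives in a top-level object, so referencing it does not capture main's
// scope (which is what holds the non-serializable SparkContext).
object Workers {
  def shout(it: Iterator[String]): Iterator[String] = it.map(_.toUpperCase)
}

object Fixed {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("fixed"))
    val rdd = sc.parallelize(Seq("a", "b"), 2)
    // Workers.shout is resolved through the module, not through an
    // enclosing closure, so the serialized task carries no reference to sc.
    println(rdd.mapPartitions(Workers.shout).collect().mkString(","))
    sc.stop()
  }
}

Does moving extractType3Results (and Person) to the top level like this look like the right direction?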