The goal is to repartition an RDD[((Int, Double), Int)] and sort within each partition by the second element of the key (i.e. the Double field). What I tried is:
implicit val ordering: Ordering[(Int, Double)] = Ordering.by(fk => (fk._1, fk._2 * -1))
But I get a java.lang.NullPointerException.
Here is the stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 (TID 2089, x01tbipapp3a): java.lang.NullPointerException
at scala.math.Ordering$$anonfun$by$1.apply(Ordering.scala:219)
at scala.math.Ordering$$anonfun$by$1.apply(Ordering.scala:219)
at scala.math.Ordering$$anon$9.compare(Ordering.scala:200)
at org.apache.spark.util.collection.WritablePartitionedPairCollection$$anon$3.compare(WritablePartitionedPairCollection.scala:86)
at org.apache.spark.util.collection.WritablePartitionedPairCollection$$anon$3.compare(WritablePartitionedPairCollection.scala:80)
at org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:252)
at org.apache.spark.util.collection.TimSort.sort(TimSort.java:110)
at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
at org.apache.spark.util.collection.PartitionedPairBuffer.partitionedDestructiveSortedIterator(PartitionedPairBuffer.scala:78)
at org.apache.spark.util.collection.ExternalSorter.partitionedIterator(ExternalSorter.scala:643)
at org.apache.spark.util.collection.ExternalSorter.iterator(ExternalSorter.scala:654)
at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:107)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I did find this implicit:
implicit val ordering: Ordering[(Int, Double)] = Ordering.by(fk => fk._2 * -1)
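For reference, the sort order each of the two definitions is meant to produce can be checked on a local collection, without Spark. This is only a minimal sketch: the object name, the val names, and the sample keys are hypothetical. The orderings are declared as plain (non-implicit) vals here, on the assumption that this lets Ordering.by pick up the standard tuple and Double orderings instead of any implicit Ordering[(Int, Double)] already in scope.

```scala
object OrderingSketch {
  // First attempt: ascending by the Int, then descending by the Double.
  val byIntThenDoubleDesc: Ordering[(Int, Double)] =
    Ordering.by(fk => (fk._1, fk._2 * -1))

  // Second attempt: descending by the Double only.
  val byDoubleDesc: Ordering[(Int, Double)] =
    Ordering.by(fk => fk._2 * -1)

  // Hypothetical sample keys, standing in for the keys of the RDD.
  val keys: Seq[(Int, Double)] = Seq((2, 0.5), (1, 0.1), (1, 0.9), (2, 0.7))

  def main(args: Array[String]): Unit = {
    // Pass each ordering explicitly instead of relying on implicit lookup.
    println(keys.sorted(byIntThenDoubleDesc))
    println(keys.sorted(byDoubleDesc))
  }
}
```

Passing the ordering explicitly to sorted keeps it clear which definition is being exercised in each call.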