How do I define an implicit Ordering for a tuple key type in repartitionAndSortWithinPartitions?

Asked: 2016-09-30 08:38:23

Tags: scala apache-spark

The goal is to repartition an RDD[((Int, Double), Int)] and sort within each partition by the second element of the key, i.e. the Double field. This is what I tried:

implicit val ordering: Ordering[(Int, Double)] = Ordering.by(fk => (fk._1, fk._2 * -1))

But I got a java.lang.NullPointerException.

Here is the stack trace:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 (TID 2089, x01tbipapp3a): java.lang.NullPointerException
        at scala.math.Ordering$$anonfun$by$1.apply(Ordering.scala:219)
        at scala.math.Ordering$$anonfun$by$1.apply(Ordering.scala:219)
        at scala.math.Ordering$$anon$9.compare(Ordering.scala:200)
        at org.apache.spark.util.collection.WritablePartitionedPairCollection$$anon$3.compare(WritablePartitionedPairCollection.scala:86)
        at org.apache.spark.util.collection.WritablePartitionedPairCollection$$anon$3.compare(WritablePartitionedPairCollection.scala:80)
        at org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:252)
        at org.apache.spark.util.collection.TimSort.sort(TimSort.java:110)
        at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
        at org.apache.spark.util.collection.PartitionedPairBuffer.partitionedDestructiveSortedIterator(PartitionedPairBuffer.scala:78)
        at org.apache.spark.util.collection.ExternalSorter.partitionedIterator(ExternalSorter.scala:643)
        at org.apache.spark.util.collection.ExternalSorter.iterator(ExternalSorter.scala:654)
        at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:107)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I did also find this implicit definition:

implicit val ordering: Ordering[(Int, Double)] = Ordering.by(fk => fk._2 * -1)
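For reference, the ordering described in the goal (largest Double first) can be sketched in plain Scala, without Spark. This is only a sketch under assumptions: the names KeyOrderings, byDoubleDesc and Demo are hypothetical, and breaking ties on the Int field in ascending order is my own choice, not something the goal requires. It writes compare explicitly instead of negating the Double, and keeps the Ordering in a standalone top-level object. Whether that avoids the NullPointerException above is untested.

```scala
// Hypothetical holder object; a top-level object keeps the Ordering
// independent of any enclosing class or application instance.
object KeyOrderings {
  // Sort keys by the Double field, descending.
  // Ties are broken by the Int field, ascending (an assumption for illustration).
  val byDoubleDesc: Ordering[(Int, Double)] = new Ordering[(Int, Double)] {
    def compare(a: (Int, Double), b: (Int, Double)): Int = {
      val byDouble = java.lang.Double.compare(b._2, a._2) // arguments swapped => descending
      if (byDouble != 0) byDouble
      else java.lang.Integer.compare(a._1, b._1)
    }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val keys = Seq((1, 0.5), (2, 3.0), (3, -1.0), (4, 3.0))
    // Largest Double first: (2,3.0), (4,3.0), (1,0.5), (3,-1.0)
    println(keys.sorted(KeyOrderings.byDoubleDesc))

    // With Spark on the classpath, the same Ordering would be put in implicit
    // scope before the call (rdd and the partition count 4 are hypothetical):
    //   implicit val ord: Ordering[(Int, Double)] = KeyOrderings.byDoubleDesc
    //   rdd.repartitionAndSortWithinPartitions(new org.apache.spark.HashPartitioner(4))
  }
}
```

Checking the ordering on a local Seq first makes it easier to tell a mistake in the comparison logic apart from a problem that only shows up once the Ordering is used on the executors.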

0 answers:

No answers yet