How do you reverse the sort order of RDD.takeOrdered()?

Asked: 2014-10-15 16:43:42

Tags: apache-spark rdd

What is the syntax to reverse the ordering used by the takeOrdered() method of an RDD in Spark?

For bonus points, what is the syntax for a custom ordering on an RDD in Spark?

3 answers:

Answer 0 (score: 27)

Reverse ordering

val seq = Seq(3,9,2,3,5,4)
val rdd = sc.parallelize(seq,2)
rdd.takeOrdered(2)(Ordering[Int].reverse)

The result will be Array(9, 5).
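The semantics here can be sketched without Spark: takeOrdered(n) returns the n smallest elements under the given Ordering, so a reversed Ordering yields the n largest. A plain-Python analogy (not Spark code) using heapq:

```python
import heapq

seq = [3, 9, 2, 3, 5, 4]

# takeOrdered(2) with a reversed Ordering keeps the 2 largest elements;
# heapq.nlargest is the plain-Python equivalent of that behavior.
top2 = heapq.nlargest(2, seq)
print(top2)  # [9, 5]
```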

Custom ordering

Suppose we want to sort people by age.

case class Person(name:String, age:Int)
val people = Array(Person("bob", 30), Person("ann", 32), Person("carl", 19))
val rdd = sc.parallelize(people,2)
rdd.takeOrdered(1)(Ordering[Int].reverse.on(x=>x.age))

The result will be Array(Person(ann,32)).
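The `.on(x => x.age)` projection corresponds to ordering by a key function. A plain-Python sketch of the same idea (the Person type here is a stand-in, not Spark code):

```python
import heapq
from collections import namedtuple

Person = namedtuple("Person", ["name", "age"])
people = [Person("bob", 30), Person("ann", 32), Person("carl", 19)]

# Ordering[Int].reverse.on(x => x.age) means "largest age first";
# heapq.nlargest with a key function mirrors that.
oldest = heapq.nlargest(1, people, key=lambda p: p.age)
print(oldest)  # [Person(name='ann', age=32)]
```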

Answer 1 (score: 8)

val rdd1 = sc.parallelize(List(("Hadoop PIG Hive"), ("Hive PIG PIG Hadoop"), ("Hadoop Hadoop Hadoop")))

val rdd2 = rdd1.flatMap(x => x.split(" ")).map(x => (x,1))

val rdd3 = rdd2.reduceByKey((x,y) => (x+y))

// Reverse (descending) order

rdd3.takeOrdered(3)(Ordering[Int].reverse.on(x=>x._2))

Output:

res0: Array[(String, Int)] = Array((Hadoop,5), (PIG,3), (Hive,2))

// Ascending order

rdd3.takeOrdered(3)(Ordering[Int].on(x=>x._2))

Output:

res1: Array[(String, Int)] = Array((Hive,2), (PIG,3), (Hadoop,5))
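The whole wordcount pipeline above can be checked without a cluster. This plain-Python sketch mirrors the flatMap/map/reduceByKey steps with Counter and the two orderings with heapq (an analogy, not Spark code):

```python
import heapq
from collections import Counter

lines = ["Hadoop PIG Hive", "Hive PIG PIG Hadoop", "Hadoop Hadoop Hadoop"]

# flatMap(split) + map(word -> 1) + reduceByKey(+) collapses to a Counter.
counts = Counter(w for line in lines for w in line.split(" "))

# Descending: reversed ordering on the count field (x._2 in the Scala code).
desc = heapq.nlargest(3, counts.items(), key=lambda kv: kv[1])
# Ascending: natural ordering on the count field.
asc = heapq.nsmallest(3, counts.items(), key=lambda kv: kv[1])

print(desc)  # [('Hadoop', 5), ('PIG', 3), ('Hive', 2)]
print(asc)   # [('Hive', 2), ('PIG', 3), ('Hadoop', 5)]
```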

Answer 2 (score: 0)

For a wordcount-type problem on (K, V) pairs, if you want to take the 10 entries with the highest counts from the ordered result:

sc.parallelize(wordCounts.takeOrdered(10, key=lambda pair: -pair[1]))
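PySpark's takeOrdered(num, key) keeps the num elements that are smallest under key, so negating the count returns the highest counts first. A pure-Python sketch of that semantics (word_counts here is made-up sample data, not taken from the question):

```python
import heapq

# Hypothetical (K, V) wordcount pairs for illustration.
word_counts = [("the", 23), ("spark", 7), ("rdd", 11), ("hadoop", 2)]

# Smallest under key=-count is the same as largest by count,
# which is exactly what takeOrdered(10, key=lambda pair: -pair[1]) does.
top3 = heapq.nsmallest(3, word_counts, key=lambda pair: -pair[1])
print(top3)  # [('the', 23), ('rdd', 11), ('spark', 7)]
```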