I want to get the set of minimum values from an RDD, but the minimum occurs more than once. How can I do this?

Time: 2017-05-23 05:46:56

Tags: scala apache-spark rdd

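My RDD contains records with a numeric field, and the smallest value of that field can occur more than once.

I want to get every record that has the minimum value, not just one of them.

How can I do this?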

2 answers:

Answer 0: (score: 0)

case class Item(c: Char, i: Int)
val items = Array[Item](new Item('a', 2), new Item('a', 2), new Item('a', 3), new Item('a', 3), new Item('a', 3), new Item('a', 4), new Item('a', 6), new Item('a', 5))
val rdd = sc.makeRDD(items)
// First pass: find the smallest value of the numeric field.
val minValue = rdd.map(_.i).min()
// Second pass: keep every item whose field equals that minimum.
val result = rdd.filter(item => item.i == minValue)
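The answer above scans the RDD twice, once for `min()` and once for `filter`. If a single pass is preferred, one possible sketch (not part of the original answer) is to carry "all items tied for the smallest value so far" as an accumulator. The names `MinSketch`, `seqOp` and `combOp` are hypothetical, and the two local sequences only stand in for partitions so that the snippet runs without a cluster:

```scala
// Same record type as in the answer above.
case class Item(c: Char, i: Int)

object MinSketch {
  // Fold one item into the accumulator of items tied for the minimum.
  def seqOp(acc: List[Item], item: Item): List[Item] = acc match {
    case Nil                           => List(item)
    case head :: _ if item.i < head.i  => List(item)   // new, smaller minimum
    case head :: _ if item.i == head.i => item :: acc  // ties with the minimum
    case _                             => acc          // larger value, ignore
  }

  // Merge the accumulators produced by two partitions.
  def combOp(a: List[Item], b: List[Item]): List[Item] = (a, b) match {
    case (Nil, _) => b
    case (_, Nil) => a
    case (x :: _, y :: _) =>
      if (x.i < y.i) a else if (y.i < x.i) b else a ++ b
  }

  // Local stand-in for two partitions. On a real RDD the assumed call is:
  //   rdd.aggregate(List.empty[Item])(seqOp, combOp)
  val partitions: Seq[Seq[Item]] = Seq(
    Seq(Item('a', 2), Item('a', 3), Item('a', 4)),
    Seq(Item('a', 2), Item('a', 6), Item('a', 5))
  )

  val mins: List[Item] =
    partitions.map(_.foldLeft(List.empty[Item])(seqOp)).reduce(combOp)
}
```

Note that the accumulator holds every tied item, so with `aggregate` they all end up on the driver. When the minimum is shared by a very large number of records, the two-pass `filter` version, which keeps the result as an RDD, is probably the safer choice.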

Answer 1: (score: 0)

You can count the occurrences of each element:

val rdd = sc.parallelize(Seq(("a", 2), ("a", 2), ("a", 3), ("a", 3), ("a", 4)))

val counts = rdd.map((_, 1)).reduceByKey(_ + _)

and then reduce:

val min = counts.reduce((x, y) => if (x._1._2 <= y._1._2) x else y)

or use min:

import scala.math.Ordering

val min = counts.min()(Ordering.by[((String, Int), Int), Int](_._1._2)) 

Optionally, follow this with a replication step:

min match {
  case (x, n) => Seq.fill(n)(x)
}
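To see what the count, min and replicate steps produce end to end, here is a minimal sketch that mimics them with plain Scala collections instead of an RDD. The object name `CountAndReplicate` is hypothetical, and `groupBy` is used only as a local stand-in for `map((_, 1)).reduceByKey(_ + _)`:

```scala
object CountAndReplicate {
  val data: Seq[(String, Int)] =
    Seq(("a", 2), ("a", 2), ("a", 3), ("a", 3), ("a", 4))

  // Local equivalent of the counting step:
  // each distinct element paired with how often it occurs.
  val counts: Map[(String, Int), Int] =
    data.groupBy(identity).map { case (elem, occurrences) => (elem, occurrences.size) }

  // The entry whose element has the smallest numeric field.
  val min: ((String, Int), Int) =
    counts.minBy { case ((_, value), _) => value }

  // Replicate the minimal element as many times as it was counted.
  val result: Seq[(String, Int)] = min match {
    case (x, n) => Seq.fill(n)(x)
  }
}
```

One caveat that follows from how `reduce` and `min` work: they return a single entry, so if two distinct elements such as `("a", 2)` and `("b", 2)` share the smallest value, only one of them is replicated.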

If the counts don't matter, just use min directly:

rdd.min()(Ordering.by[(String, Int), Int](_._2))