Each Iterable in the following RDD can contain one, two, or (at most) three tuples:
org.apache.spark.rdd.RDD[Iterable[(String, String, String, String, Long)]] = MappedRDD[17] at map at <console>:75
The second element of each tuple can have any of the following values: A, B, C. Each of these values can occur (at most) once.
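What I want to do is sort the tuples according to the order (B, A, C) and then build a string by concatenating their 3rd elements. If a tag is missing, an empty string takes its place, so the comma separators are still emitted. For example: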
This:
CompactBuffer((blah,A,val1,blah,blah), (blah,B,val2,blah,blah), (blah,C,val3,blah,blah))
should result in:
val2,val1,val3
This:
CompactBuffer((blah,A,val1,blah,blah), (blah,C,val3,blah,blah))
should result in:
,val1,val3
This:
CompactBuffer((blah,A,val1,blah,blah), (blah,B,val2,blah,blah))
should result in:
val2,val1,
This:
CompactBuffer((blah,B,val2,blah,blah))
should result in:
val2,,
and so on.
Answer 0 (score: 3)
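If A, B and C each occur at most once, you can put the corresponding values into a temporary map and then retrieve them from the map in the right order.
If we use getOrElse to get the values from the map, we can specify the empty string as the default value. That way we still get the correct result even if our Iterable does not contain tuples for all of A, B and C.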
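type YourTuple = (String, String, String, String, Long)
def orderTuples(order: List[String])(iter: Iterable[YourTuple]): String = {
  // map each tag (2nd element) to its value (3rd element)
  val orderMap = iter.map { case (_, key, value, _, _) => key -> value }.toMap
  // look the tags up in the requested order; a missing tag becomes ""
  order.map(s => orderMap.getOrElse(s, "")).mkString(",")
}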
We can use this function as follows:
val a = ("blah","A","val1","blah",1L)
val b = ("blah","B","val2","blah",2L)
val c = ("blah","C","val3","blah",3L)
val order = List("B", "A", "C")
val bacOrder = orderTuples(order) _
bacOrder(Iterable(a, b, c)) // String = val2,val1,val3
bacOrder(Iterable(a, c)) // String = ,val1,val3
bacOrder(Iterable(a, b)) // String = val2,val1,
bacOrder(Iterable(b)) // String = val2,,
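In the question the Iterables live in an RDD[Iterable[(String, String, String, String, Long)]], so the partially applied function should be usable directly with map, e.g. rdd.map(bacOrder). The sketch below assumes that shape but uses a plain Scala List of Iterables as a stand-in for the RDD so it runs without Spark; rows and OrderTuplesDemo are hypothetical names introduced only for illustration.

```scala
object OrderTuplesDemo {
  type YourTuple = (String, String, String, String, Long)

  // Same function as in the answer above
  def orderTuples(order: List[String])(iter: Iterable[YourTuple]): String = {
    val orderMap = iter.map { case (_, key, value, _, _) => key -> value }.toMap
    order.map(s => orderMap.getOrElse(s, "")).mkString(",")
  }

  def main(args: Array[String]): Unit = {
    val a = ("blah", "A", "val1", "blah", 1L)
    val b = ("blah", "B", "val2", "blah", 2L)
    val c = ("blah", "C", "val3", "blah", 3L)

    // Hypothetical stand-in for the RDD's elements; with Spark this would be rdd.map(bacOrder)
    val rows: List[Iterable[YourTuple]] =
      List(Iterable(a, b, c), Iterable(a, c), Iterable(a, b), Iterable(b))

    val bacOrder = orderTuples(List("B", "A", "C")) _
    // Apply the same per-Iterable function to every element of the collection
    rows.map(bacOrder).foreach(println)
  }
}
```

Running main prints the four expected strings from the question, one per line: val2,val1,val3 then ,val1,val3 then val2,val1, then val2,,.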
Answer 1 (score: 0)
def orderTuples(xs: Iterable[(String, String, String, String, Long)],
                order: (String, String, String) = ("B", "A", "C")): String = {
  type T = List[(String, String, String, String, Long)]
  type KV = List[(String, String)]
  val ord = List(order._1, order._2, order._3)
  // collect (tag, value) pairs while tracking which tags have not been seen yet
  def loop(xs: T, acc: KV, vs: List[String] = ord): KV = xs match {
    case Nil if vs.isEmpty => acc
    // missing tags get an empty value so mkString still emits their separator
    case Nil => vs.map((_, "")) ++: acc
    case x :: rest => loop(rest, (x._2, x._3) :: acc, vs.filterNot(_ == x._2))
  }
  // position of each tag in the requested order
  val rank = ord.zipWithIndex.toMap
  // toList so the :: / Nil patterns also work for a CompactBuffer
  loop(xs.toList, Nil).sortBy(kv => rank(kv._1)).map(_._2).mkString(",")
}
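Both answers build an intermediate structure (a Map, or a list of pairs that is then sorted). As an alternative sketch that is not taken from either answer, each tag in the desired order can instead be looked up directly with collectFirst, falling back to the empty string when the tag is absent. It relies on the question's guarantee that each tag appears at most once; if a tag did repeat, this version would silently keep its first occurrence. OrderByTag and orderByTag are hypothetical names.

```scala
object OrderByTag {
  type Row = (String, String, String, String, Long)

  // For each tag in `order`, take the 3rd field of the first tuple carrying that tag
  // (or "" if no tuple has it), then join the results with commas.
  def orderByTag(rows: Iterable[Row], order: Seq[String] = Seq("B", "A", "C")): String =
    order.map { tag =>
      // `tag` in backticks matches the existing value instead of binding a new variable
      rows.collectFirst { case (_, `tag`, value, _, _) => value }.getOrElse("")
    }.mkString(",")

  def main(args: Array[String]): Unit = {
    val a = ("blah", "A", "val1", "blah", 1L)
    val b = ("blah", "B", "val2", "blah", 2L)
    val c = ("blah", "C", "val3", "blah", 3L)
    println(orderByTag(Iterable(a, b, c))) // val2,val1,val3
    println(orderByTag(Iterable(a, c)))    // ,val1,val3
    println(orderByTag(Iterable(b)))       // val2,,
  }
}
```

This scans the Iterable once per tag, which is negligible with at most three tuples and three tags, and it avoids both the temporary map and the explicit recursion.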