Scala - Merging a list into a Map

Asked: 2016-03-15 22:04:21

Tags: scala rdd

I need to merge a list into a set of key/value pairs for each record of an RDD, but I am struggling to do this in Scala:

var accounts = Set("name" -> "", "id" -> 0, ....)

//Split the RDD into lines and split each line by `|` to get the values
stream.foreachRDD {_.map(_._2).flatMap(_.split("\\|")).foreach(f => /*merge here ?*/)}

How can I associate the values with my accounts set?

For example, suppose the RDD is loaded from a CSV like this (I made this data up):

 Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
 Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
 Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
 Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
 Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
 Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
 Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
 ...

The RDD has up to 300 columns/fields.

My main goal is to convert it to JSON, but first I need to associate each value with a key, by loading it into a Map or a class.

var election = Map("firstname" -> "Donald",
"lastname" -> "Trump",
"country" -> "US",
"event" -> "Election",
"period" -> "March",
"var1" -> "Spring",
 ....
"varN" -> "...")

2 Answers:

Answer 0 (score: 1)

I'm not sure I understood correctly, but does this help?

val data = List(
  "Donald|Trump|US|Election|March",
  "John|Smith|UK|Election|February"
)

val mapKeys = List("firstname", "lastname", "country", "event", "period")

val election = data.map { row =>
  (mapKeys zip row.split("\\|").toList).map {
    case (key, value) => key -> value
  }.toMap
}

As a result you get a list of Maps: for each row of your data you get a Map of the key/value pairs you described.
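With up to 300 columns, as mentioned in the question, writing every key by hand gets tedious. Below is a minimal sketch, assuming the trailing columns can simply be named var1..varN as in the question. `ZipRows`, `keysFor` and `toRecord` are hypothetical names introduced for illustration, not part of any library. Note that `zip` stops at the shorter of the two collections, so a row with fewer fields than keys just yields a smaller Map.

```scala
object ZipRows {
  // The named leading columns, taken from the example above.
  val namedKeys: List[String] =
    List("firstname", "lastname", "country", "event", "period")

  // Hypothetical helper: extends the named keys with var1..varN
  // so that there are at most `width` keys in total.
  def keysFor(width: Int): List[String] =
    (namedKeys ++ (1 to (width - namedKeys.size)).map(i => s"var$i")).take(width)

  // Splits one row on a literal '|' and pairs each field with its key.
  // The -1 limit keeps trailing empty fields instead of dropping them.
  def toRecord(keys: List[String], row: String): Map[String, String] =
    keys.zip(row.split("\\|", -1)).toMap
}
```

For a full-width row this could be called as `ZipRows.toRecord(ZipRows.keysFor(300), line)`. Building the key list once, outside the per-row function, avoids regenerating it for every line.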

Answer 1 (score: 0)

A slightly cleaner version of @slouc's answer:
stream.foreachRDD {_.map(_._2).map(l => (mapKeys zip l.split("\\|")).toMap).saveToEs(conf)}
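Since the stated goal is JSON, here is a minimal sketch of rendering one such Map as a JSON object using only the standard library. `JsonSketch`, `escapeChar`, `escape` and `toJson` are hypothetical helpers written for illustration. A dedicated JSON library would likely be the better choice in real code. Every value is emitted as a JSON string.

```scala
object JsonSketch {
  // Escapes a single character so it is safe inside a JSON string literal.
  def escapeChar(c: Char): String = c match {
    case '"'          => "\\\""
    case '\\'         => "\\\\"
    case '\n'         => "\\n"
    case '\r'         => "\\r"
    case '\t'         => "\\t"
    case c if c < ' ' => "\\u" + "%04x".format(c.toInt) // other control characters
    case c            => c.toString
  }

  def escape(s: String): String = s.map(c => escapeChar(c)).mkString

  // Renders the record as a flat JSON object with string values only.
  // Key order follows the Map's iteration order, which is not
  // guaranteed to match insertion order for larger Maps.
  def toJson(record: Map[String, String]): String =
    record
      .map { case (k, v) => "\"" + escape(k) + "\":\"" + escape(v) + "\"" }
      .mkString("{", ",", "}")
}
```

Applied per row, this would slot in after the `toMap` step, for example `.map(m => JsonSketch.toJson(m))`.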