I need to merge a list into the set of an RDD, but I'm having trouble doing this in Scala:
var accounts = Set("name" -> "", "id" -> 0, ....)
// Split the RDD into lines and split each line on a literal `|` to get the values
// (split takes a regex, so the pipe has to be escaped)
stream.foreachRDD {_.map(_._2).flatMap(_.split("\\|")).foreach(f => /*merge here ?*/)}
How can I associate the values with my accounts set?
For example, suppose the RDD is loaded from a CSV (I made this data up):
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
Donald|Trump|US|Election|March|Spring|Rubio|Ted Cruz|Ben Carson|Primary|Winner|...
...
The RDD has up to 300 columns/fields.
My main goal is to convert this to JSON, but I need to associate each value with a key by loading it into a map or a class.
var election = Map("firstname" -> "Donald",
                   "lastname" -> "Trump",
                   "country" -> "US",
                   "event" -> "Election",
                   "period" -> "March",
                   "var1" -> "Spring",
                   ....
                   "varN" -> "...")
Answer 0 (score: 1)
I'm not sure I understood you correctly, but does this help?
val data = List(
  "Donald|Trump|US|Election|March",
  "John|Smith|UK|Election|February"
)
val mapKeys = List("firstname", "lastname", "country", "event", "period")
// Split each row on a literal '|' and pair every value with its key by position
val election = data.map { row =>
  (mapKeys zip row.split("\\|").toList).toMap
}
As a result, you get a list of maps: for each row of your data, you get a map holding the key/value pairs you described.
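Two details of `split` are worth checking against real rows: it takes a regular expression, so the pipe must be escaped, and by default it drops trailing empty fields, after which `zip` silently truncates to the shorter side. The following is only a minimal sketch on made-up rows; the `SplitSketch` object and `parseRow` helper are hypothetical names, not part of the answer. Passing `-1` as the limit keeps trailing empty fields.

```scala
object SplitSketch {
  val mapKeys = List("firstname", "lastname", "country", "event", "period")

  // Hypothetical helper: split on a literal '|' and keep trailing
  // empty fields (limit = -1), then pair values with keys by position.
  def parseRow(row: String): Map[String, String] =
    mapKeys.zip(row.split("\\|", -1)).toMap

  def main(args: Array[String]): Unit = {
    val row = "Donald|Trump|US||" // last two fields are empty

    // Default split drops the trailing empty fields, so zip would
    // produce a map with fewer keys than expected.
    println(row.split("\\|").length)

    // With limit = -1 every column is kept, including empty ones.
    println(row.split("\\|", -1).length)

    println(parseRow(row))
  }
}
```

If rows can be shorter or longer than the key list, it may be safer to compare the two lengths explicitly before zipping rather than rely on the truncation.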
Answer 1 (score: 0)
A slightly cleaner version of @slouc's answer:
stream.foreachRDD {_.map(_._2).map(l => (mapKeys zip l.split("\\|")).toMap).saveToEs(conf)}
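Since the question mentions up to 300 fields and JSON as the end goal, here is a rough sketch of how the key list could be generated and a row rendered as a JSON string. It assumes the extra columns are named `var1`..`varN` as in the question; `RowToJson`, `keysFor`, `escape` and `toJson` are hypothetical names, and the hand-rolled serializer only covers flat objects of string values. A JSON library would normally be the better choice.

```scala
object RowToJson {
  // The five named columns from the question, followed by generated
  // var1..varN names (naming assumed from the question's example).
  val namedKeys = List("firstname", "lastname", "country", "event", "period")

  def keysFor(totalColumns: Int): List[String] =
    namedKeys ++ (1 to (totalColumns - namedKeys.size)).map(i => s"var$i")

  // Escape the characters that must be escaped inside a JSON string.
  def escape(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case c if c < ' ' => sb.append("\\u" + "%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.toString
  }

  // Render one pipe-delimited row as a flat JSON object, in column order.
  // Zipping the sequences directly (instead of going through a Map)
  // keeps the fields in their original order.
  def toJson(keys: Seq[String], row: String): String =
    keys.zip(row.split("\\|", -1))
      .map { case (k, v) => "\"" + escape(k) + "\":\"" + escape(v) + "\"" }
      .mkString("{", ",", "}")

  def main(args: Array[String]): Unit = {
    val keys = keysFor(6)
    println(toJson(keys, "Donald|Trump|US|Election|March|Spring"))
  }
}
```

In the streaming code, a function like `toJson(keys, _)` could presumably be passed to `map` in place of the `zip ... toMap` step when strings are wanted instead of maps.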