我的RDD类型为RDD [(String,List [String])]。
示例:
(FRUIT, List(Apple,Banana,Mango))
(VEGETABLE, List(Potato,Tomato))
我想将上面的输出转换为json对象,如下所示。
{
"categories": [
{
"name": "FRUIT",
"nodes": [
{
"name": "Apple",
"isInTopList": false
},
{
"name": "Banana",
"isInTopList": false
},
{
"name": "Mango",
"isInTopList": false
}
]
},
{
"name": "VEGETABLE",
"nodes": [
{
"name": "POTATO",
"isInTopList": false
},
{
"name": "TOMATO",
"isInTopList": false
},
]
}
]
}
请建议最好的方法。
注意:"isInTopList": false
始终是常量,并且必须与jsonobject中的每个项目一起存在。
答案 0 :(得分:5)
首先,我使用以下代码重现您提到的场景:
val sampleArray = Array(
("FRUIT", List("Apple", "Banana", "Mango")),
("VEGETABLE", List("Potato", "Tomato")))
val sampleRdd = sc.parallelize(sampleArray)
sampleRdd.foreach(println) // Printing the result
现在,我正在使用json4s Scala库将此RDD转换为您请求的JSON结构:
import org.json4s.native.JsonMethods._
import org.json4s.JsonDSL.WithDouble._
val json = "categories" -> sampleRdd.collect().toList.map{
case (name, nodes) =>
("name", name) ~
("nodes", nodes.map{
name => ("name", name)
})
}
println(compact(render(json))) // Printing the rendered JSON
结果是:
{"categories":[{"name":"FRUIT","nodes":[{"name":"Apple"},{"name":"Banana"},{"name":"Mango"}]},{"name":"VEGETABLE","nodes":[{"name":"Potato"},{"name":"Tomato"}]}]}
答案 1 :(得分:0)
由于您需要为整个RDD提供单个JSON,我首先要做Rdd.collect
。请注意您的设备适合内存,因为这会将数据移回驱动程序。
要获取json,只需使用库来遍历对象。我喜欢Json4s,因为它内部结构简单,操作简单实用。以下是他们网站上的一个示例,它展示了如何遍历嵌套结构(特别是列表):
object JsonExample extends App {
import org.json4s._
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._
case class Winner(id: Long, numbers: List[Int])
case class Lotto(id: Long, winningNumbers: List[Int], winners: List[Winner], drawDate: Option[java.util.Date])
val winners = List(Winner(23, List(2, 45, 34, 23, 3, 5)), Winner(54, List(52, 3, 12, 11, 18, 22)))
val lotto = Lotto(5, List(2, 45, 34, 23, 7, 5, 3), winners, None)
val json =
("lotto" ->
("lotto-id" -> lotto.id) ~
("winning-numbers" -> lotto.winningNumbers) ~
("draw-date" -> lotto.drawDate.map(_.toString)) ~
("winners" ->
lotto.winners.map { w =>
(("winner-id" -> w.id) ~
("numbers" -> w.numbers))}))
println(compact(render(json)))
}