How can I parse and flatten nested JSON stored in a Hive / HBase column using Spark Scala?
Example:
A Hive table has a column "c1" containing the following JSON:
{
"fruit": "Apple",
"size": "Large",
"color": "Red",
"Lines": [{
"LineNumber": 1,
"Text": "ABC"
},
{
"LineNumber": 2,
"Text": "123"
}
]
}
I want to parse this JSON and create a DataFrame with columns and values like this:
+------+------+-------+------------+------+
|fruit | size | color | LineNumber | Text |
+------+------+-------+------------+------+
|Apple | Large| Red | 1 | ABC |
|Apple | Large| Red | 2 | 123 |
+------+------+-------+------------+------+
Any ideas are appreciated. Thanks!
Answer 0 (score: 0)
Convert the JSON to a String using mkString, then use the following code:
val otherFruitRddRDD = spark.sparkContext.makeRDD("""{"Fruitname": "Jack", "fruitDetails": {"fruit": "Apple", "size": "Large"}}""" :: Nil)
val otherFruit = spark.read.json(otherFruitRddRDD)
otherFruit.show()
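Once the nested JSON has been read, the struct fields inside fruitDetails can be promoted to top-level columns with select. A minimal sketch, continuing from the otherFruit DataFrame above (column names are taken from the JSON in that example):

```scala
import org.apache.spark.sql.functions.col

// Flatten the nested fruitDetails struct into top-level columns
val flattened = otherFruit.select(
  col("Fruitname"),
  col("fruitDetails.fruit").as("fruit"),
  col("fruitDetails.size").as("size")
)
flattened.show()
```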
Answer 1 (score: 0)
val df = spark.read.json("example.json")
You can find a detailed example at the following link.
Answer 2 (score: 0)
I think you need an approach like this:
df.select(from_json($"c1", schema))
The schema will be a StructType describing the fields the JSON contains: a. fruit, b. size, c. color.
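A fuller sketch of that approach, assuming the Hive table has been loaded into a DataFrame named df with a string column "c1" (the schema and the desired output follow the question's example; this is one possible implementation, not the only one):

```scala
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

// Schema matching the JSON stored in column "c1"
val schema = StructType(Seq(
  StructField("fruit", StringType),
  StructField("size", StringType),
  StructField("color", StringType),
  StructField("Lines", ArrayType(StructType(Seq(
    StructField("LineNumber", LongType),
    StructField("Text", StringType)
  ))))
))

// Parse the JSON string into a struct column
val parsed = df.select(from_json(col("c1"), schema).as("j"))

// explode repeats the top-level fields once per element of Lines,
// which produces one row per line, as in the desired output table
val flat = parsed
  .select(
    col("j.fruit"),
    col("j.size"),
    col("j.color"),
    explode(col("j.Lines")).as("line")
  )
  .select(
    col("fruit"),
    col("size"),
    col("color"),
    col("line.LineNumber").as("LineNumber"),
    col("line.Text").as("Text")
  )

flat.show()
```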