How to parse nested JSON stored in a Hive / HBase column using Spark Scala

Asked: 2019-04-17 04:49:22

Tags: json scala apache-spark

How can I parse and flatten nested JSON stored in a Hive / HBase column using Spark Scala?

Example:

A Hive table has a column "c1" containing the following JSON:

{
    "fruit": "Apple",
    "size": "Large",
    "color": "Red",
    "Lines": [{
            "LineNumber": 1,
            "Text": "ABC"
        },
        {
            "LineNumber": 2,
            "Text": "123"
        }
     ]
}

I want to parse this JSON and create a DataFrame containing columns and values like this:
+------+------+-------+------------+------+
|fruit | size | color | LineNumber | Text |
+------+------+-------+------------+------+
|Apple | Large| Red   | 1          | ABC  |
|Apple | Large| Red   | 2          | 123  |
+------+------+-------+------------+------+

Any ideas are appreciated. Thanks!

3 Answers:

Answer 0 (score: 0)

Convert the JSON to a String using mkString, then use the following code:

val otherFruitRddRDD = spark.sparkContext.makeRDD(
  """{"Fruitname":"Jack","fruitDetails":{"fruit":"Apple","size":"Large"}}""" :: Nil)

val otherFruit = spark.read.json(otherFruitRddRDD)

otherFruit.show()
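Adapting this idea to the question's own JSON, a minimal sketch (assuming a local SparkSession; note that `spark.read.json` on an RDD of strings is deprecated in recent Spark versions, so a `Dataset[String]` is used here instead). Spark infers a nested schema, including `Lines` as an array of structs, which still needs flattening afterwards:

```scala
import org.apache.spark.sql.SparkSession

object InferJsonSchema {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]")
      .appName("infer-json-schema").getOrCreate()
    import spark.implicits._

    // The question's JSON as a one-element Dataset of strings.
    val json =
      """{"fruit":"Apple","size":"Large","color":"Red",
        |"Lines":[{"LineNumber":1,"Text":"ABC"},{"LineNumber":2,"Text":"123"}]}""".stripMargin.replace("\n", "")
    val df = spark.read.json(Seq(json).toDS)

    // Inferred schema: Lines is array<struct<LineNumber:bigint,Text:string>>,
    // alongside the top-level string fields color, fruit, size.
    df.printSchema()
    spark.stop()
  }
}
```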

Answer 1 (score: 0)

 val df = spark.read.json("example.json")

You can find a detailed example at the following link

Answer 2 (score: 0)

I think you need an approach like this:

df.select(from_json($"c1", schema))

The schema will be a StructType and will contain the JSON fields: a. fruit, b. size, c. color
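Putting this answer together into a runnable sketch (assuming a local SparkSession; `FlattenJsonColumn` and its sample row are hypothetical names standing in for the real Hive table): `from_json` parses the string column against an explicit schema that also covers the `Lines` array, and `explode` then emits one output row per array element, giving exactly the flat table the question asks for:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

object FlattenJsonColumn {
  // Explicit schema for the JSON held in column "c1".
  val schema: StructType = StructType(Seq(
    StructField("fruit", StringType),
    StructField("size", StringType),
    StructField("color", StringType),
    StructField("Lines", ArrayType(StructType(Seq(
      StructField("LineNumber", LongType),
      StructField("Text", StringType)))))))

  // Parse the string column, then explode the Lines array:
  // one output row per array element, with the scalar fields repeated.
  def flatten(df: DataFrame): DataFrame =
    df.select(from_json(col("c1"), schema).as("j"))
      .select(col("j.fruit"), col("j.size"), col("j.color"),
        explode(col("j.Lines")).as("line"))
      .select(col("fruit"), col("size"), col("color"),
        col("line.LineNumber"), col("line.Text"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]")
      .appName("flatten-json-column").getOrCreate()
    import spark.implicits._

    // Stand-in for the Hive table: one row holding the JSON in column "c1".
    val json = """{"fruit":"Apple","size":"Large","color":"Red","Lines":[{"LineNumber":1,"Text":"ABC"},{"LineNumber":2,"Text":"123"}]}"""
    flatten(Seq(json).toDF("c1")).show()
    spark.stop()
  }
}
```

Against a real Hive table, `Seq(json).toDF("c1")` would be replaced by `spark.table("your_table")` (table name assumed), and the same `flatten` call applies.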