How to parse nested JSON stored in a Hive / HBase column using Spark Scala

Asked: 2019-04-17 04:49:22

Tags: json scala apache-spark

How can I parse and flatten nested JSON stored in a Hive / HBase column using Spark Scala?

Example:

A Hive table has a column "c1" containing the following JSON:

{
    "fruit": "Apple",
    "size": "Large",
    "color": "Red",
    "Lines": [{
            "LineNumber": 1,
            "Text": "ABC"
        },
        {
            "LineNumber": 2,
            "Text": "123"
        }
     ]
}

I want to parse this JSON and create a DataFrame containing columns and values like this:
+------+------+-------+------------+------+
|fruit | size | color | LineNumber | Text |
+------+------+-------+------------+------+
|Apple | Large| Red   | 1          | ABC  |
|Apple | Large| Red   | 2          | 123  |
+------+------+-------+------------+------+

Any ideas are appreciated. Thanks!

3 Answers:

Answer 0 (score: 0)

Convert the JSON to a String using mkString, then use the following code:

val otherFruitRddRDD = spark.sparkContext.makeRDD(
  """{"Fruitname":"Jack","fruitDetails":{"fruit":"Apple","size":"Large"}}""" :: Nil)

val otherFruit = spark.read.json(otherFruitRddRDD)

otherFruit.show()
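Adapting this idea to the question's own JSON, a minimal sketch (assuming a local SparkSession; note that `spark.read.json` on an RDD of strings is deprecated in recent Spark versions, so a `Dataset[String]` is used here instead). Spark infers a nested schema, including `Lines` as an array of structs, which still needs flattening afterwards:

```scala
import org.apache.spark.sql.SparkSession

object InferJsonSchema {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]")
      .appName("infer-json-schema").getOrCreate()
    import spark.implicits._

    // The question's JSON as a one-element Dataset of strings.
    val json =
      """{"fruit":"Apple","size":"Large","color":"Red",
        |"Lines":[{"LineNumber":1,"Text":"ABC"},{"LineNumber":2,"Text":"123"}]}""".stripMargin.replace("\n", "")
    val df = spark.read.json(Seq(json).toDS)

    // Inferred schema: Lines is array<struct<LineNumber:bigint,Text:string>>,
    // alongside the top-level string fields color, fruit, size.
    df.printSchema()
    spark.stop()
  }
}
```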

Answer 1 (score: 0)

 val df = spark.read.json("example.json")

You can find a detailed example at the following link

Answer 2 (score: 0)

I think you need an approach like this:

df.select(from_json($"c1", schema))

The schema will be a StructType and will contain the JSON fields: a. fruit, b. size, c. color
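Putting this answer together into a runnable sketch (assuming a local SparkSession; `FlattenJsonColumn` and its sample row are hypothetical names standing in for the real Hive table): `from_json` parses the string column against an explicit schema that also covers the `Lines` array, and `explode` then emits one output row per array element, giving exactly the flat table the question asks for:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

object FlattenJsonColumn {
  // Explicit schema for the JSON held in column "c1".
  val schema: StructType = StructType(Seq(
    StructField("fruit", StringType),
    StructField("size", StringType),
    StructField("color", StringType),
    StructField("Lines", ArrayType(StructType(Seq(
      StructField("LineNumber", LongType),
      StructField("Text", StringType)))))))

  // Parse the string column, then explode the Lines array:
  // one output row per array element, with the scalar fields repeated.
  def flatten(df: DataFrame): DataFrame =
    df.select(from_json(col("c1"), schema).as("j"))
      .select(col("j.fruit"), col("j.size"), col("j.color"),
        explode(col("j.Lines")).as("line"))
      .select(col("fruit"), col("size"), col("color"),
        col("line.LineNumber"), col("line.Text"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]")
      .appName("flatten-json-column").getOrCreate()
    import spark.implicits._

    // Stand-in for the Hive table: one row holding the JSON in column "c1".
    val json = """{"fruit":"Apple","size":"Large","color":"Red","Lines":[{"LineNumber":1,"Text":"ABC"},{"LineNumber":2,"Text":"123"}]}"""
    flatten(Seq(json).toDF("c1")).show()
    spark.stop()
  }
}
```

Against a real Hive table, `Seq(json).toDF("c1")` would be replaced by `spark.table("your_table")` (table name assumed), and the same `flatten` call applies.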