Getting values from complex nested JSON

Time: 2019-02-13 05:41:17

Tags: apache-spark-sql

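I am new to Spark SQL. Can anyone help me with this problem?

"E1EDP01" is an array whose entries each have a "POSEX" field, and each of those entries also contains an "E1EDP02" array. I want to get the "QUALF" value from "E1EDP02", i.e. the path: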

E1EDP01.E1EDP02.QUALF

"E1EDP01": [
            {
                "@SEGMENT": "1",
                "POSEX": "000010",
                "MENGE": "4.000",
                "MENEE": "EA",
                "E1EDP02": [
                    {
                        "@SEGMENT": "1",
                        "QUALF": "016",
                        "BELNR": "0080001425",
                        "ZEILE": "000010"
                    }
                ]
            },
            {
                "@SEGMENT": "1",
                "POSEX": "000020",
                "MENGE": "2.000",
                "MENEE": "EA",
                "E1EDP02": [
                    {
                        "@SEGMENT": "1",
                        "QUALF": "002",
                        "BELNR": "7000000986",
                        "ZEILE": "000020"
                    }
                ]
            },
            {
                "@SEGMENT": "1",
                "POSEX": "000030",
                "MENGE": "2.000",
                "MENEE": "EA",
                "E1EDP02": [
                    {
                        "@SEGMENT": "1",
                        "QUALF": "002",
                        "BELNR": "7000000986",
                        "ZEILE": "000020"
                    }
                ]
            }
]

1 Answer:

Answer 0: (score: 0)

You can use the Spark SQL function get_json_object() to extract the nested fields, as shown below:

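from pyspark.sql import SparkSession

# Reuse the active session (in the pyspark shell, `spark` is already defined).
spark = SparkSession.builder.getOrCreate()

# One-row DataFrame holding the JSON document as a plain string column.
df = spark.createDataFrame(
  [['{"E1EDP01": [{"POSEX": "000010", "E1EDP02": [{"QUALF": "016"}]}, {"POSEX": "000020", "E1EDP02": [{"QUALF": "002"}]}]}']],
  ['json_string']
)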

df.selectExpr(
  "get_json_object(json_string, '$.E1EDP01[*].E1EDP02[*].QUALF') as values"
).show()

# +-----------------+
# |           values|
# +-----------------+
# |[["016"],["002"]]|
# +-----------------+
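
Note that get_json_object() returns a JSON string, not an array column, so the `values` cell above still needs one more parsing step before the individual QUALF codes can be used. The following is only a minimal sketch in plain Python (no Spark required) of what the path selects and how the returned string could be flattened. The helper names `qualf_by_posex` and `flatten_values` are hypothetical and not part of any Spark API; the sample document is the one from the answer above.

```python
import json

# Same sample document as in the answer above.
json_string = (
    '{"E1EDP01": ['
    '{"POSEX": "000010", "E1EDP02": [{"QUALF": "016"}]}, '
    '{"POSEX": "000020", "E1EDP02": [{"QUALF": "002"}]}'
    ']}'
)


def qualf_by_posex(doc_text):
    """Hypothetical helper: return one (POSEX, QUALF) pair per E1EDP02 entry."""
    doc = json.loads(doc_text)
    pairs = []
    # Outer loop mirrors E1EDP01[*], inner loop mirrors E1EDP02[*].
    for item in doc.get("E1EDP01", []):
        for sub in item.get("E1EDP02", []):
            pairs.append((item.get("POSEX"), sub.get("QUALF")))
    return pairs


def flatten_values(values_text):
    """Hypothetical helper: flatten the nested JSON string from the `values` column."""
    nested = json.loads(values_text)
    return [qualf for group in nested for qualf in group]


pairs = qualf_by_posex(json_string)
flat = flatten_values('[["016"],["002"]]')

print(pairs)  # [('000010', '016'), ('000020', '002')]
print(flat)   # ['016', '002']
```

Keeping the POSEX next to each QUALF, as `qualf_by_posex` does, may be closer to what the question asks for, since the wildcard path on its own drops the information about which item each QUALF came from.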