Reading multi-level JSON (nested JSON) into a Spark DataFrame

Time: 2019-10-29 17:05:28

Tags: json dataframe apache-spark nested multi-level

How can I select data from nested JSON using a DataFrame in Spark?

From the sample JSON below, I want to select data from an array nested inside another array:

"xyz01": [
  {
    "@SEGMENT": "1",
    "POS": "00001",
    "MEN": "10.000",
    "xyz05": [
      {
        "CHL": "DIRECT",
        "BET": "54545"
      },
      {
        "@SEGMENT": "1",
        "CHL": "INDIRECT",
        "TRG": "778787"
      }
    ]
  },
  {
    "@SEGMENT": "1",
    "POS": "00002",
    "MENGE": "4354354",
    "xyz05": [
      {
        "@SEGMENT": "1",
        "ALCKZ": "+",
        "CHL": "DIRECT"
      },
      {
        "@SEGMENT": "1",
        "CHL": "INDIRECT",
        "TRG": "3434343"
      }
    ]
  }
]

Required output:

POS     CHL
000001  DIRECT
000001  INDIRECT
000002  DIRECT
000002  INDIRECT

I am trying the code below, but I get duplicate values in the output. Code:

// DF is the DataFrame obtained by reading the JSON file
DF.withColumn("LineItem", explode(col("XYZ01.POS")))
  .withColumn("TypeCode", explode(col("XYZ01.XYZ05")))
  .select(explode(col("TypeCode.CHL")).as("TypeCodeOutPut"), col("LineItem"))

Output of the code above:

000001  DIRECT
000001  INDIRECT
000001  DIRECT
000001  INDIRECT
000002  DIRECT
000002  INDIRECT
000002  DIRECT
000002  INDIRECT
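The doubled rows are consistent with how the attempt is written: `XYZ01.POS` and `XYZ01.XYZ05` are exploded independently, so every POS value gets paired with every inner array rather than only with its own. One possible way around this, given here only as a minimal sketch, is to explode the outer array once and then explode the inner array of that same element. The sketch assumes the fragment is wrapped in an outer JSON object, uses a trimmed copy of the sample as inline data, and introduces the column names `item` and `sub` purely for illustration. It is written script-style, for example for spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder().master("local[*]").appName("nested-json-sketch").getOrCreate()
import spark.implicits._

// Trimmed copy of the sample, wrapped in an outer object (an assumption about the real file).
val sample = """{"xyz01":[
  {"POS":"00001","xyz05":[{"CHL":"DIRECT","BET":"54545"},{"CHL":"INDIRECT","TRG":"778787"}]},
  {"POS":"00002","xyz05":[{"CHL":"DIRECT","ALCKZ":"+"},{"CHL":"INDIRECT","TRG":"3434343"}]}
]}"""

// Each element of the Dataset[String] is parsed as one JSON document.
val df = spark.read.json(Seq(sample).toDS())

val result = df
  .withColumn("item", explode(col("xyz01")))      // one row per element of xyz01
  .withColumn("sub", explode(col("item.xyz05")))  // one row per xyz05 entry of that same element
  .select(col("item.POS").as("POS"), col("sub.CHL").as("CHL"))

result.show()
```

Because the second `explode` reads `item.xyz05` rather than `XYZ01.XYZ05`, each inner entry stays attached to the POS of its own parent element. When reading from a file whose JSON document spans several lines, the `multiLine` read option would probably be needed as well.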

0 answers:

No answers