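How do I select data from nested JSON using a Spark DataFrame?
From the sample JSON below, I want to select data from an array that is nested inside another array.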
"xyz01": [
  {
    "@SEGMENT": "1",
    "POS": "00001",
    "MEN": "10.000",
    "xyz05": [
      {
        "CHL": "DIRECT",
        "BET": "54545"
      },
      {
        "@SEGMENT": "1",
        "CHL": "INDIRECT",
        "TRG": "778787"
      }
    ]
  },
  {
    "@SEGMENT": "1",
    "POS": "00002",
    "MENGE": "4354354",
    "xyz05": [
      {
        "@SEGMENT": "1",
        "ALCKZ": "+",
        "CHL": "DIRECT"
      },
      {
        "@SEGMENT": "1",
        "CHL": "INDIRECT",
        "TRG": "3434343"
      }
    ]
  }
]
Required output:
POS    CHL
000001 DIRECT
000001 INDIRECT
000002 DIRECT
000002 INDIRECT
I am trying the code below, but I get duplicate values in the output.
Code:
import org.apache.spark.sql.functions.{col, explode}

// DF is the DataFrame obtained by reading the JSON file
DF.withColumn("LineItem", explode(col("XYZ01.POS")))
  .withColumn("TypeCode", explode(col("XYZ01.XYZ05")))
  .select(explode(col("TypeCode.CHL")).as("TypeCodeOutPut"), col("LineItem"))
Output of the code above:
000001 DIRECT
000001 INDIRECT
000001 DIRECT
000001 INDIRECT
000002 DIRECT
000002 INDIRECT
000002 DIRECT
000002 INDIRECT
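The duplicates most likely come from exploding XYZ01.POS and XYZ01.XYZ05 independently: each explode multiplies the existing rows, so every POS value ends up paired with every inner array instead of only its own. A minimal sketch of one way around this, assuming the DataFrame has a top-level column xyz01 that is an array of structs shaped like the sample: explode the outer array once, then explode the inner array of that same element. The SparkSession setup, the inline copy of the sample data, and the names item and inner are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object NestedExplodeSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption)
    val spark = SparkSession.builder()
      .appName("nested-explode-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Inline copy of the sample, trimmed to the fields used here (assumption)
    val json = Seq(
      """{"xyz01":[
        |{"POS":"00001","xyz05":[{"CHL":"DIRECT"},{"CHL":"INDIRECT"}]},
        |{"POS":"00002","xyz05":[{"CHL":"DIRECT"},{"CHL":"INDIRECT"}]}
        |]}""".stripMargin.replace("\n", "")
    )
    val df = spark.read.json(json.toDS())

    val result = df
      // One row per element of the outer array; POS and xyz05 stay together
      .select(explode(col("xyz01")).as("item"))
      // One row per element of that item's own inner array
      .select(col("item.POS").as("POS"), explode(col("item.xyz05")).as("inner"))
      // Pull the channel out of the inner struct
      .select(col("POS"), col("inner.CHL").as("CHL"))

    result.show(false)
    spark.stop()
  }
}
```

Because POS and xyz05 are taken from the same exploded element, each POS should be paired only with the CHL values of its own xyz05 array, which is the shape of the required output.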