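I'm new to Spark SQL. Can anyone help me with this problem?
Each entry in "E1EDP01" has a "POSEX" field, and each of those entries also contains an "E1EDP02" array. I want to get the "QUALF" value from "E1EDP02", i.e. the path: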
E1EDP01.E1EDP02.QUALF
"E1EDP01": [
  {
    "@SEGMENT": "1",
    "POSEX": "000010",
    "MENGE": "4.000",
    "MENEE": "EA",
    "E1EDP02": [
      {
        "@SEGMENT": "1",
        "QUALF": "016",
        "BELNR": "0080001425",
        "ZEILE": "000010"
      }
    ]
  },
  {
    "@SEGMENT": "1",
    "POSEX": "000020",
    "MENGE": "2.000",
    "MENEE": "EA",
    "E1EDP02": [
      {
        "@SEGMENT": "1",
        "QUALF": "002",
        "BELNR": "7000000986",
        "ZEILE": "000020"
      }
    ]
  },
  {
    "@SEGMENT": "1",
    "POSEX": "000030",
    "MENGE": "2.000",
    "MENEE": "EA",
    "E1EDP02": [
      {
        "@SEGMENT": "1",
        "QUALF": "002",
        "BELNR": "7000000986",
        "ZEILE": "000020"
      }
    ]
  }
]
Answer 0 (score: 0)
You can use the Spark SQL function get_json_object() to extract the nested field, as shown below:
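from pyspark.sql import SparkSession

# Reuse the active session (e.g. in the pyspark shell) or create one.
spark = SparkSession.builder.getOrCreate()

# One row holding the whole document as a JSON string.
df = spark.createDataFrame(
    [['{"E1EDP01": [{"POSEX": "000010", "E1EDP02": [{"QUALF": "016"}]}, {"POSEX": "000020", "E1EDP02": [{"QUALF": "002"}]}]}']],
    ['json_string']
)
# [*] matches every array element; the result is returned as a JSON string.
df.selectExpr(
    "get_json_object(json_string, '$.E1EDP01[*].E1EDP02[*].QUALF') as values"
).show()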
# +-----------------+
# | values|
# +-----------------+
# |[["016"],["002"]]|
# +-----------------+