I am using pandas json_normalize to convert JSON into a pandas DataFrame. This is the JSON that is giving me trouble:
{
"P48PrecTer":{
"@xmlns":"http://sujetos.esios.ree.es/schemas/2010/05/31/P48PrecTer-esios-MP/",
"IdSesion":"16",
"SeriesTemporales":{
"IdentificacionSeriesTemporales":"STP0",
"TipoNegocio":"A10",
"UnidadPrecio":"EUR:MWH",
"Periodo":[
{
"IntervaloTiempo":"2020-07-12T00:00Z/2020-07-12T01:00Z",
"Resolucion":"PT60M",
"Intervalo":{
"Pos":"1",
"PrecioBaj":"25.20"
}
},
{
"IntervaloTiempo":"2020-07-12T03:00Z/2020-07-12T11:00Z",
"Resolucion":"PT60M",
"Intervalo":[
{
"Pos":"1",
"PrecioSub":"27.36"
},
{
"Pos":"2",
"PrecioBaj":"23.50"
},
{
"Pos":"8",
"PrecioBaj":"16.90"
}
]
},
{
"IntervaloTiempo":"2020-07-12T12:00Z/2020-07-12T16:00Z",
"Resolucion":"PT60M",
"Intervalo":[
{
"Pos":"1",
"PrecioSub":"29.90"
},
{
"Pos":"4",
"PrecioBaj":"15.75"
}
]
}
]
}
}
}
I am using this code:
prueba=pd.json_normalize(body,record_path=['P48PrecTer','SeriesTemporales','Periodo','Intervalo'], meta=[['P48PrecTer','IdSesion'], ['P48PrecTer','SeriesTemporales','Periodo','IntervaloTiempo']])
I expected a flat, CSV-like table. However, I got:
0|P48PrecTer.IdSesion|P48PrecTer.SeriesTemporales.Periodo.IntervaloTiempo
Pos|06|2020-07-12T22:00Z/2020-07-12T23:00Z
PrecioBaj|06|2020-07-12T22:00Z/2020-07-12T23:00Z
PrecioSub|06|2020-07-12T22:00Z/2020-07-12T23:00Z
{'Pos': '1', 'PrecioBaj': '4.41', 'PrecioSub': 'null'}|06|2020-07-13T00:00Z/2020-07-13T03:00Z
{'Pos': '2', 'PrecioBaj': '9.00', 'PrecioSub': 'null'}|06|2020-07-13T00:00Z/2020-07-13T03:00Z
{'Pos': '3', 'PrecioBaj': '10.10', 'PrecioSub': 'null'}|06|2020-07-13T00:00Z/2020-07-13T03:00Z
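My guess is that the problem lies in the list associated with the "Periodo" key, because I have processed other JSON files with a similar structure without any issues. Is there any way to achieve what I want? Thanks.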
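Answer 0: (score: 2)
Yes, the problem is the list nested inside the dictionary: "Intervalo" is a single dict in one period and a list of dicts in the others. We can work around it like this: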
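import pandas as pd

# data is assumed to be the JSON above, already parsed into a dict
df = pd.json_normalize(data['P48PrecTer']['SeriesTemporales'], ['Periodo'], errors='ignore')
# one row per element wherever 'Intervalo' is a list
df = df.explode('Intervalo')
# expand each 'Intervalo' dict into columns; NaN rows produce a stray column 0
df = pd.concat([df, df['Intervalo'].apply(pd.Series)], axis=1).drop(columns=[0, 'Intervalo'])
# the period whose 'Intervalo' was a single dict was flattened into 'Intervalo.<key>' columns
for i in ['Pos', 'PrecioBaj']:
    df.loc[df[i].isna(), i] = df.loc[df[i].isna(), 'Intervalo.' + i]
    df = df.drop(columns=['Intervalo.' + i])
print(df)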
Output
IntervaloTiempo Resolucion Pos PrecioBaj PrecioSub
0 2020-07-12T00:00Z/2020-07-12T01:00Z PT60M 1 25.20 NaN
1 2020-07-12T03:00Z/2020-07-12T11:00Z PT60M 1 NaN 27.36
1 2020-07-12T03:00Z/2020-07-12T11:00Z PT60M 2 23.50 NaN
1 2020-07-12T03:00Z/2020-07-12T11:00Z PT60M 8 16.90 NaN
2 2020-07-12T12:00Z/2020-07-12T16:00Z PT60M 1 NaN 29.90
2 2020-07-12T12:00Z/2020-07-12T16:00Z PT60M 4 15.75 NaN
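An alternative that avoids the clean-up loop is to make the input uniform first: wrap every "Intervalo" that is a single dict in a one-element list, after which the call from the question should work unchanged. The snippet below is only a sketch under that assumption; it uses a trimmed copy of the JSON from the question, and the variable names (series, periodo) are mine, not part of the original data.

```python
import pandas as pd

# Trimmed copy of the JSON from the question, already parsed into a dict.
data = {
    "P48PrecTer": {
        "IdSesion": "16",
        "SeriesTemporales": {
            "Periodo": [
                {
                    "IntervaloTiempo": "2020-07-12T00:00Z/2020-07-12T01:00Z",
                    "Resolucion": "PT60M",
                    "Intervalo": {"Pos": "1", "PrecioBaj": "25.20"},
                },
                {
                    "IntervaloTiempo": "2020-07-12T03:00Z/2020-07-12T11:00Z",
                    "Resolucion": "PT60M",
                    "Intervalo": [
                        {"Pos": "1", "PrecioSub": "27.36"},
                        {"Pos": "2", "PrecioBaj": "23.50"},
                    ],
                },
            ],
        },
    }
}

# Make "Intervalo" a list in every period so that record_path can walk it.
series = data["P48PrecTer"]["SeriesTemporales"]
for periodo in series["Periodo"]:
    if isinstance(periodo["Intervalo"], dict):
        periodo["Intervalo"] = [periodo["Intervalo"]]

# Same call as in the question, now applied to uniform data.
df = pd.json_normalize(
    data,
    record_path=["P48PrecTer", "SeriesTemporales", "Periodo", "Intervalo"],
    meta=[
        ["P48PrecTer", "IdSesion"],
        ["P48PrecTer", "SeriesTemporales", "Periodo", "IntervaloTiempo"],
    ],
)
print(df)
```

Note that this modifies data in place; work on a copy.deepcopy of it if the original structure has to stay untouched.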
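Answer 1: (score: 0)
The problem is that your JSON has an array inside an array, so it cannot be handled with a single call to json_normalize. The call has to go in a loop; you then need to concatenate the resulting DataFrames and merge them with the outer DataFrame.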
for i in data['P48PrecTer']['SeriesTemporales']['Periodo']:
    df = pd.json_normalize(i, record_path=['Intervalo'], meta=[['IntervaloTiempo'], ['Resolucion']])
    print(df)
0 IntervaloTiempo Resolucion
0 Pos 2020-07-12T00:00Z/2020-07-12T01:00Z PT60M
1 PrecioBaj 2020-07-12T00:00Z/2020-07-12T01:00Z PT60M
Pos PrecioSub PrecioBaj IntervaloTiempo Resolucion
0 1 27.36 NaN 2020-07-12T03:00Z/2020-07-12T11:00Z PT60M
1 2 NaN 23.50 2020-07-12T03:00Z/2020-07-12T11:00Z PT60M
2 8 NaN 16.90 2020-07-12T03:00Z/2020-07-12T11:00Z PT60M
Pos PrecioSub PrecioBaj IntervaloTiempo Resolucion
0 1 29.90 NaN 2020-07-12T12:00Z/2020-07-12T16:00Z PT60M
1 4 NaN 15.75 2020-07-12T12:00Z/2020-07-12T16:00Z PT60M
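The loop above only prints each frame, and the first one comes out wrong because that period's "Intervalo" is a dict rather than a list. A minimal sketch of the concatenation step could look like the following. It assumes the periods have been pulled out into a list (called periodos here, a name I made up), wraps a lone dict in a list before normalizing, and leaves out the final merge with the outer DataFrame.

```python
import pandas as pd

# Two periods copied from the question: one with a dict, one with a list.
periodos = [
    {
        "IntervaloTiempo": "2020-07-12T00:00Z/2020-07-12T01:00Z",
        "Resolucion": "PT60M",
        "Intervalo": {"Pos": "1", "PrecioBaj": "25.20"},
    },
    {
        "IntervaloTiempo": "2020-07-12T03:00Z/2020-07-12T11:00Z",
        "Resolucion": "PT60M",
        "Intervalo": [
            {"Pos": "1", "PrecioSub": "27.36"},
            {"Pos": "2", "PrecioBaj": "23.50"},
        ],
    },
]

frames = []
for periodo in periodos:
    intervalo = periodo["Intervalo"]
    # Build a copy of the period in which "Intervalo" is always a list.
    record = {
        **periodo,
        "Intervalo": intervalo if isinstance(intervalo, list) else [intervalo],
    }
    frames.append(
        pd.json_normalize(
            record,
            record_path=["Intervalo"],
            meta=["IntervaloTiempo", "Resolucion"],
        )
    )

# Stack the per-period frames; columns missing from a frame are filled with NaN.
result = pd.concat(frames, ignore_index=True)
print(result)
```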
A better approach is to use flatten_json