pandas json_normalize not working properly

Time: 2020-07-14 12:40:11

Tags: python json pandas csv

I am using pandas json_normalize to convert JSON into a pandas DataFrame. This is the JSON that is giving me trouble:

    {
   "P48PrecTer":{
      "@xmlns":"http://sujetos.esios.ree.es/schemas/2010/05/31/P48PrecTer-esios-MP/",
      "IdSesion":"16",
      "SeriesTemporales":{
         "IdentificacionSeriesTemporales":"STP0",
         "TipoNegocio":"A10",
         "UnidadPrecio":"EUR:MWH",
         "Periodo":[
            {
               "IntervaloTiempo":"2020-07-12T00:00Z/2020-07-12T01:00Z",
               "Resolucion":"PT60M",
               "Intervalo":{
                  "Pos":"1",
                  "PrecioBaj":"25.20"
               }
            },
            {
               "IntervaloTiempo":"2020-07-12T03:00Z/2020-07-12T11:00Z",
               "Resolucion":"PT60M",
               "Intervalo":[
                  {
                     "Pos":"1",
                     "PrecioSub":"27.36"
                  },
                  {
                     "Pos":"2",
                     "PrecioBaj":"23.50"
                  },
                  {
                     "Pos":"8",
                     "PrecioBaj":"16.90"
                  }
               ]
            },
            {
               "IntervaloTiempo":"2020-07-12T12:00Z/2020-07-12T16:00Z",
               "Resolucion":"PT60M",
               "Intervalo":[
                  {
                     "Pos":"1",
                     "PrecioSub":"29.90"
                  },
                  {
                     "Pos":"4",
                     "PrecioBaj":"15.75"
                  }
               ]
            }
         ]
      }
   }
}

I am using this code:

prueba=pd.json_normalize(body,record_path=['P48PrecTer','SeriesTemporales','Periodo','Intervalo'], meta=[['P48PrecTer','IdSesion'], ['P48PrecTer','SeriesTemporales','Periodo','IntervaloTiempo']])

I expected a result with a CSV-like structure. However, I got:

0|P48PrecTer.IdSesion|P48PrecTer.SeriesTemporales.Periodo.IntervaloTiempo
Pos|06|2020-07-12T22:00Z/2020-07-12T23:00Z
PrecioBaj|06|2020-07-12T22:00Z/2020-07-12T23:00Z
PrecioSub|06|2020-07-12T22:00Z/2020-07-12T23:00Z
{'Pos': '1', 'PrecioBaj': '4.41', 'PrecioSub': 'null'}|06|2020-07-13T00:00Z/2020-07-13T03:00Z
{'Pos': '2', 'PrecioBaj': '9.00', 'PrecioSub': 'null'}|06|2020-07-13T00:00Z/2020-07-13T03:00Z
{'Pos': '3', 'PrecioBaj': '10.10', 'PrecioSub': 'null'}|06|2020-07-13T00:00Z/2020-07-13T03:00Z

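I suspect the problem lies in the list associated with the "Periodo" key, because I have used other JSON documents with a similar structure without any problems. Is there a way to achieve what I want? Thanks.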

2 answers:

Answer 0: (score: 2)

Yes, the problem is the list nested inside the dictionary. However, we can work around it as follows:

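import pandas as pd

# `data` is the question's JSON, already parsed into a dict.
# One row per Periodo; a dict-valued Intervalo is flattened into 'Intervalo.*' columns.
df = pd.json_normalize(data['P48PrecTer']['SeriesTemporales'], ['Periodo'], errors='ignore')
# One row per element of each list-valued Intervalo.
df = df.explode('Intervalo')
# Expand the remaining dicts into columns; column 0 comes from the NaN rows.
df = pd.concat([df, df['Intervalo'].apply(pd.Series)], axis=1).drop(columns=[0, 'Intervalo'])

# Fill gaps from the flattened 'Intervalo.*' columns, then drop those columns.
for i in ['Pos', 'PrecioBaj']:
    df.loc[df[i].isna(), i] = df.loc[df[i].isna(), 'Intervalo.' + i]
    df = df.drop(columns=['Intervalo.' + i])

print(df)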

Output

                       IntervaloTiempo Resolucion Pos PrecioBaj PrecioSub
0  2020-07-12T00:00Z/2020-07-12T01:00Z      PT60M   1     25.20       NaN
1  2020-07-12T03:00Z/2020-07-12T11:00Z      PT60M   1       NaN     27.36
1  2020-07-12T03:00Z/2020-07-12T11:00Z      PT60M   2     23.50       NaN
1  2020-07-12T03:00Z/2020-07-12T11:00Z      PT60M   8     16.90       NaN
2  2020-07-12T12:00Z/2020-07-12T16:00Z      PT60M   1       NaN     29.90
2  2020-07-12T12:00Z/2020-07-12T16:00Z      PT60M   4     15.75       NaN
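
As an alternative sketch (not part of the answer above), one could first make every "Intervalo" value a list, so that a single json_normalize call with record_path can handle all periods. The helper name as_list and the trimmed sample data are assumptions made for illustration; only pd.json_normalize with its record_path and meta parameters comes from pandas itself.

```python
import pandas as pd

# Trimmed copy of the question's JSON, already parsed into a dict.
data = {
    "P48PrecTer": {
        "IdSesion": "16",
        "SeriesTemporales": {
            "Periodo": [
                {
                    "IntervaloTiempo": "2020-07-12T00:00Z/2020-07-12T01:00Z",
                    "Resolucion": "PT60M",
                    # A single interval is delivered as a dict, not a list.
                    "Intervalo": {"Pos": "1", "PrecioBaj": "25.20"},
                },
                {
                    "IntervaloTiempo": "2020-07-12T03:00Z/2020-07-12T11:00Z",
                    "Resolucion": "PT60M",
                    "Intervalo": [
                        {"Pos": "1", "PrecioSub": "27.36"},
                        {"Pos": "2", "PrecioBaj": "23.50"},
                        {"Pos": "8", "PrecioBaj": "16.90"},
                    ],
                },
            ]
        },
    }
}


def as_list(value):
    """Hypothetical helper: wrap a lone dict in a list, leave lists unchanged."""
    return value if isinstance(value, list) else [value]


# Build copies of each period in which "Intervalo" is always a list.
periodos = [
    {**p, "Intervalo": as_list(p["Intervalo"])}
    for p in data["P48PrecTer"]["SeriesTemporales"]["Periodo"]
]

# One row per interval, with the period-level fields repeated as metadata.
df = pd.json_normalize(
    periodos,
    record_path="Intervalo",
    meta=["IntervaloTiempo", "Resolucion"],
)
print(df)
```

Because the input is normalized up front, no explode or column back-filling is needed afterwards; missing prices simply appear as NaN.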

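Answer 1: (score: 0)

The problem is that your JSON has an array nested inside an array, so a single call to json_normalize cannot handle it. It has to be called in a loop; you then need to concatenate the resulting DataFrames and merge them with the outer DataFrame.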

for i in data['P48PrecTer']['SeriesTemporales']['Periodo']:
    df = pd.json_normalize(i, record_path=['Intervalo'], meta=[['IntervaloTiempo'], ['Resolucion']])
    print(df)

           0                      IntervaloTiempo Resolucion
0        Pos  2020-07-12T00:00Z/2020-07-12T01:00Z      PT60M
1  PrecioBaj  2020-07-12T00:00Z/2020-07-12T01:00Z      PT60M
  Pos PrecioSub PrecioBaj                      IntervaloTiempo Resolucion
0   1     27.36       NaN  2020-07-12T03:00Z/2020-07-12T11:00Z      PT60M
1   2       NaN     23.50  2020-07-12T03:00Z/2020-07-12T11:00Z      PT60M
2   8       NaN     16.90  2020-07-12T03:00Z/2020-07-12T11:00Z      PT60M
  Pos PrecioSub PrecioBaj                      IntervaloTiempo Resolucion
0   1     29.90       NaN  2020-07-12T12:00Z/2020-07-12T16:00Z      PT60M
1   4       NaN     15.75  2020-07-12T12:00Z/2020-07-12T16:00Z      PT60M
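
The first frame printed above comes out wrong because that period's "Intervalo" is a single dict rather than a list. A minimal sketch of how the loop might be completed follows, assuming it is acceptable to wrap such a dict in a list and to attach the outer fields as constant columns instead of doing a real merge (there is only one outer record here). The variable names frames and result and the trimmed sample data are illustrative assumptions.

```python
import pandas as pd

# Trimmed copy of the question's JSON, already parsed into a dict.
data = {
    "P48PrecTer": {
        "IdSesion": "16",
        "SeriesTemporales": {
            "UnidadPrecio": "EUR:MWH",
            "Periodo": [
                {
                    "IntervaloTiempo": "2020-07-12T00:00Z/2020-07-12T01:00Z",
                    "Resolucion": "PT60M",
                    "Intervalo": {"Pos": "1", "PrecioBaj": "25.20"},
                },
                {
                    "IntervaloTiempo": "2020-07-12T03:00Z/2020-07-12T11:00Z",
                    "Resolucion": "PT60M",
                    "Intervalo": [
                        {"Pos": "1", "PrecioSub": "27.36"},
                        {"Pos": "2", "PrecioBaj": "23.50"},
                        {"Pos": "8", "PrecioBaj": "16.90"},
                    ],
                },
                {
                    "IntervaloTiempo": "2020-07-12T12:00Z/2020-07-12T16:00Z",
                    "Resolucion": "PT60M",
                    "Intervalo": [
                        {"Pos": "1", "PrecioSub": "29.90"},
                        {"Pos": "4", "PrecioBaj": "15.75"},
                    ],
                },
            ],
        },
    }
}

series = data["P48PrecTer"]["SeriesTemporales"]

frames = []
for periodo in series["Periodo"]:
    intervalo = periodo["Intervalo"]
    # A single interval arrives as a dict; wrap it so record_path sees a list.
    if isinstance(intervalo, dict):
        periodo = {**periodo, "Intervalo": [intervalo]}
    frames.append(
        pd.json_normalize(
            periodo,
            record_path=["Intervalo"],
            meta=["IntervaloTiempo", "Resolucion"],
        )
    )

# Stack the per-period frames, then attach the outer-level fields.
result = pd.concat(frames, ignore_index=True)
result["IdSesion"] = data["P48PrecTer"]["IdSesion"]
result["UnidadPrecio"] = series["UnidadPrecio"]
print(result)
```

With several outer records, a proper merge on a shared key would be needed instead of the constant-column assignment used here.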

A better approach is to use flatten_json