Pandas: group a DataFrame and create nested JSON

Date: 2019-08-23 20:19:56

Tags: python json pandas

I have the following DataFrame (simplified):

                  Column0 Column1             Type
Asset Code
R0083TX3P3PATX999    0.00    0.00  variable_name_1
R0084TX3P3WTXNM99   55.74   55.74  variable_name_1
R0087KY2P2KY99999  265.35  265.35  variable_name_1
T7001OK2P2OK99999    0.00    0.00  variable_name_2
T7029LA3P3SLA9999    0.00    0.00  variable_name_2
T7032CA5P5SW99999    0.00    0.00  variable_name_2
T7001OK2P2OK99999    0.00    0.00  variable_name_3
T7029LA3P3SLA9999    9.00    9.00  variable_name_3
T7032CA5P5SW99999   14.00   14.00  variable_name_3

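This is actually three different DataFrames concatenated together. Before concatenating, I added a "Type" column to each one so I can tell which source every row came from.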
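Here is a minimal sketch of that concatenation step. It assumes each source frame is already indexed by "Asset Code" and has the same value columns. The names `df1`, `df2`, `df3` and their mapping to `variable_name_1` to `variable_name_3` are hypothetical. The rows are a small subset of the table above.

```python
import pandas as pd

# Hypothetical stand-ins for the three source DataFrames, each indexed by
# 'Asset Code' and sharing the same value columns.
df1 = pd.DataFrame({'Column1': [0.00, 55.74], 'Column2': [0.00, 55.74]},
                   index=pd.Index(['R0083TX3P3PATX999', 'R0084TX3P3WTXNM99'],
                                  name='Asset Code'))
df2 = pd.DataFrame({'Column1': [0.00], 'Column2': [0.00]},
                   index=pd.Index(['T7001OK2P2OK99999'], name='Asset Code'))
df3 = pd.DataFrame({'Column1': [9.00], 'Column2': [9.00]},
                   index=pd.Index(['T7029LA3P3SLA9999'], name='Asset Code'))

# Tag every frame with a label for its source before stacking, so each row
# still records where it came from after the concatenation.
parts = {'variable_name_1': df1, 'variable_name_2': df2, 'variable_name_3': df3}
combined = pd.concat([frame.assign(Type=name) for name, frame in parts.items()])
print(combined)
```

If you want "Asset Code" as an ordinary column instead of the index, call `combined.reset_index()`. The answer's `df` uses that layout.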

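My end goal is nested JSON with three levels. The top-level keys are the column names. The second-level keys are the "Asset Code" values. Each asset code holds a third nested object containing only its associated variables and their values.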

The point is that the JSON can then be accessed as data['Column0']['Asset-Code'], which returns the values for that asset.

The output should be a JSON object in the following format:

{
  "Column1": {
    "R0083TX3P3PATX999": {
      "variable_name_1": 0
    },
    "R0084TX3P3WTXNM99": {
      "variable_name_1": 55.74
    },
    "R0087KY2P2KY99999": {
      "variable_name_1": 265.35
    },
    "T7001OK2P2OK99999": {
      "variable_name_2": 0,
      "variable_name_3": 0
    },
    "T7029LA3P3SLA9999": {
      "variable_name_2": 0,
      "variable_name_3": 9.0
    },
    "T7032CA5P5SW99999": {
      "variable_name_2": 0,
      "variable_name_3": 14
    }
  },
  "Column2": {
    "R0083TX3P3PATX999": {
      "variable_name_1": 0
    },
    "R0084TX3P3WTXNM99": {
      "variable_name_1": 55.74
    },
    "R0087KY2P2KY99999": {
      "variable_name_1": 265.35
    },
    "T7001OK2P2OK99999": {
      "variable_name_2": 2,
      "variable_name_3": 3
    },
    "T7029LA3P3SLA9999": {
      "variable_name_2": 2,
      "variable_name_3": 9.0
    },
    "T7032CA5P5SW99999": {
      "variable_name_2": 0,
      "variable_name_3": 14
    }
  }
}

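I'm not sure how to do this. Does the new master DataFrame containing everything need to be re-indexed (or given a MultiIndex)? I have also looked at the groupby functions, but I'm not sure how to apply them, especially because of the third nested object. Originally this was very easy: I exported with to_json and orient='columns', and it worked well, but only for two levels of data.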
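The groupby route mentioned above can produce this three-level shape without any re-indexing. The sketch below is an illustration, not the accepted answer's method. It assumes the long-format layout with "Asset Code" as an ordinary column (call `reset_index()` first if it is the index). The five rows are a subset of the question's data.

```python
import pandas as pd

# A subset of the question's rows, with 'Asset Code' as an ordinary column.
df = pd.DataFrame([
    ['R0083TX3P3PATX999', 0.00, 0.00, 'variable_name_1'],
    ['T7001OK2P2OK99999', 0.00, 0.00, 'variable_name_2'],
    ['T7029LA3P3SLA9999', 0.00, 0.00, 'variable_name_2'],
    ['T7001OK2P2OK99999', 0.00, 0.00, 'variable_name_3'],
    ['T7029LA3P3SLA9999', 9.00, 9.00, 'variable_name_3'],
], columns=['Asset Code', 'Column1', 'Column2', 'Type'])

value_cols = ['Column1', 'Column2']

# Level 1: one key per value column.
# Level 2: one key per asset code (groupby on the long-format frame).
# Level 3: Type -> value, built only from rows that exist for that asset,
#          so variables an asset lacks are simply absent.
nested = {
    col: {
        code: dict(zip(group['Type'], group[col].tolist()))
        for code, group in df.groupby('Asset Code')
    }
    for col in value_cols
}

print(nested['Column1']['T7029LA3P3SLA9999'])
# {'variable_name_2': 0.0, 'variable_name_3': 9.0}
```

`.tolist()` converts the NumPy scalars to plain Python floats, so the result passes straight to `json.dumps`.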

1 answer:

Answer 0 (score: 2)

You can do this by looping over the value columns, pivoting each one, and converting the result to a dict with orient='index':

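import numpy as np

res = {col: df.pivot(index='Asset Code', columns='Type', values=col)  # keyword-only in pandas 2.x
              .replace({np.nan: None})  # missing combinations -> None (JSON null)
              .to_dict(orient='index')
       for col in ['Column1', 'Column2']}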

Output:

{'Column1': {'R0083TX3P3PATX999': {'variable_name_1': 0.0,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0084TX3P3WTXNM99': {'variable_name_1': 55.74,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0087KY2P2KY99999': {'variable_name_1': 265.35,
   'variable_name_2': None,
   'variable_name_3': None},
  'T7001OK2P2OK99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 0.0},
  'T7029LA3P3SLA9999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 9.0},
  'T7032CA5P5SW99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 14.0}},
 'Column2': {'R0083TX3P3PATX999': {'variable_name_1': 0.0,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0084TX3P3WTXNM99': {'variable_name_1': 55.74,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0087KY2P2KY99999': {'variable_name_1': 265.35,
   'variable_name_2': None,
   'variable_name_3': None},
  'T7001OK2P2OK99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 0.0},
  'T7029LA3P3SLA9999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 9.0},
  'T7032CA5P5SW99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 14.0}}}
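This output includes every Type for every asset, with None where a combination does not exist. The question's target format omits those keys instead. Below is a minimal sketch of one way to drop them. It is a variation on the answer's code that skips the `replace` step and filters NaN cells directly. The helper name `nested_without_gaps` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([
    ['R0083TX3P3PATX999', 0.00, 0.00, 'variable_name_1'],
    ['R0084TX3P3WTXNM99', 55.74, 55.74, 'variable_name_1'],
    ['T7029LA3P3SLA9999', 0.00, 0.00, 'variable_name_2'],
    ['T7029LA3P3SLA9999', 9.00, 9.00, 'variable_name_3'],
], columns=['Asset Code', 'Column1', 'Column2', 'Type'])


def nested_without_gaps(frame, value_cols):
    """Pivot each value column as in the answer, but drop missing cells."""
    out = {}
    for col in value_cols:
        # Rows = asset codes, columns = Type; absent combinations become NaN.
        wide = frame.pivot(index='Asset Code', columns='Type', values=col)
        # Keep only real cells, so an asset with just variable_name_1
        # gets a one-key dict instead of None placeholders.
        out[col] = {
            code: {var: val for var, val in row.items() if pd.notna(val)}
            for code, row in wide.to_dict(orient='index').items()
        }
    return out


res = nested_without_gaps(df, ['Column1', 'Column2'])
print(res['Column1'])
```

There are two caveats. A genuine NaN in the source data would also be dropped. Also, `pivot` raises a `ValueError` if the same (Asset Code, Type) pair appears twice; in that case `pivot_table` with an explicit `aggfunc` is the usual substitute.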

Below is the code I used to create df:

import pandas as pd

df = pd.DataFrame([
    ['R0083TX3P3PATX999',   0.00,   0.00, 'variable_name_1'],
    ['R0084TX3P3WTXNM99',  55.74,  55.74, 'variable_name_1'],
    ['R0087KY2P2KY99999', 265.35, 265.35, 'variable_name_1'],
    ['T7001OK2P2OK99999',   0.00,   0.00, 'variable_name_2'],
    ['T7029LA3P3SLA9999',   0.00,   0.00, 'variable_name_2'],
    ['T7032CA5P5SW99999',   0.00,   0.00, 'variable_name_2'],
    ['T7001OK2P2OK99999',   0.00,   0.00, 'variable_name_3'],
    ['T7029LA3P3SLA9999',   9.00,   9.00, 'variable_name_3'],
    ['T7032CA5P5SW99999',  14.00,  14.00, 'variable_name_3'],
], columns=['Asset Code', 'Column1', 'Column2', 'Type'])
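To get an actual JSON string rather than a Python dict, `res` can be passed to the standard `json` module. Below is a minimal sketch, assuming pandas 2.x, where `pivot` accepts keyword arguments only. It rebuilds `df` so it runs on its own. It also shows why the answer maps NaN to None: `json.dumps` writes None as `null`, but by default it writes NaN as the non-standard token `NaN`.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['R0083TX3P3PATX999',   0.00,   0.00, 'variable_name_1'],
    ['R0084TX3P3WTXNM99',  55.74,  55.74, 'variable_name_1'],
    ['R0087KY2P2KY99999', 265.35, 265.35, 'variable_name_1'],
    ['T7001OK2P2OK99999',   0.00,   0.00, 'variable_name_2'],
    ['T7029LA3P3SLA9999',   0.00,   0.00, 'variable_name_2'],
    ['T7032CA5P5SW99999',   0.00,   0.00, 'variable_name_2'],
    ['T7001OK2P2OK99999',   0.00,   0.00, 'variable_name_3'],
    ['T7029LA3P3SLA9999',   9.00,   9.00, 'variable_name_3'],
    ['T7032CA5P5SW99999',  14.00,  14.00, 'variable_name_3'],
], columns=['Asset Code', 'Column1', 'Column2', 'Type'])

# The answer's comprehension, with keyword arguments for pivot.
res = {col: df.pivot(index='Asset Code', columns='Type', values=col)
              .replace({np.nan: None})
              .to_dict(orient='index')
       for col in ['Column1', 'Column2']}

# Serialize; Python None is written as JSON null.
text = json.dumps(res, indent=2)

# Round-trip to show the access pattern the question asks for:
# data[<column name>][<asset code>] -> {Type: value, ...}
data = json.loads(text)
print(data['Column1']['T7029LA3P3SLA9999'])
```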