Convert a pandas DataFrame to an aggregated, nested JSON structure

Time: 2018-03-29 03:23:30

Tags: python pandas dataframe data-structures

  

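Goal: convert a pandas DataFrame into an aggregated, JSON-like object.

The "JSON-like" object should contain, as the weight, the aggregate (sum) of the values for each group and for each category.

Current state: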

import numpy as np
import pandas as pd

df = pd.DataFrame({'group': ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3", "Group 3"],
                   'category': ["Category 1.1", "Category 1.2", "Category 2.1", "Category 3.1", "Category 3.2", "Category 3.3"],
                   'value': [2, 4, 5, 1, 4, 5]
                   })

Structure:

>>> df[['group','category','value']]
     group      category  value
0  Group 1  Category 1.1      2
1  Group 1  Category 1.2      4
2  Group 2  Category 2.1      5
3  Group 3  Category 3.1      1
4  Group 3  Category 3.2      4
5  Group 3  Category 3.3      5

Desired output:

{"groups": [
    {"label": "Group 1",
      "weight": 6,
      "groups": [
        {"label": "Category 1.1",
          "weight": 2,
          "groups": [] },
        {"label": "Category 1.2",
          "weight": 4,
          "groups": [] }
      ] },
    {"label": "Group 2",
      "weight": 5,
      "groups": [{
          "label": "Category 2.1",
          "weight": 5,
          "groups": []
        } ] },
    {"label": "Group 3",
      "weight": 10,
      "groups": [{
          "label": "Category 3.1",
          "weight": 1,
          "groups": []
        },
        {"label": "Category 3.2",
          "weight": 4,
          "groups": []
        },
        {"label": "Category 3.3",
          "weight": 5,
          "groups": []
        } ]
    } ]
}

What I have tried so far:

pd.pivot_table(df, index=['group'],columns=['category'], values=['value'],aggfunc=np.sum, margins=True).stack('category')

Pivot output:

                      value
group   category           
Group 1 All             6.0
        Category 1.1    2.0
        Category 1.2    4.0
Group 2 All             5.0
        Category 2.1    5.0
Group 3 All            10.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0
All     All            21.0
        Category 1.1    2.0
        Category 1.2    4.0
        Category 2.1    5.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0

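From there I'm stuck. The "All" summary seems like it belongs in a separate column, and I don't want it to appear as a "group". I tried to_json() with various orient arguments (records, values, split), but I can't figure out how to produce the desired output.

I also tried df.groupby(['group','category']).agg({'value':'sum'}), but that does not give me the rolled-up sum for each group.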
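The two levels of sums described above can be computed separately, without the "All" margin rows that pivot_table adds. The following is only a minimal sketch using the sample data from this question; the names per_category and per_group are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3", "Group 3"],
    'category': ["Category 1.1", "Category 1.2", "Category 2.1",
                 "Category 3.1", "Category 3.2", "Category 3.3"],
    'value': [2, 4, 5, 1, 4, 5],
})

# Inner level: one sum per (group, category) pair.
# sort=False keeps the groups in order of first appearance.
per_category = df.groupby(['group', 'category'], sort=False)['value'].sum()

# Outer level: one rolled-up sum per group (the "weight" of each group).
per_group = df.groupby('group', sort=False)['value'].sum()

print(per_category)
print(per_group)
```

Neither result is nested yet; they only provide the numbers that the nested structure needs at each level.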

Similar questions, but not the structure I'm looking for:

1 Answer:

Answer 0: (score: 1)

I think the following may work for you. I can't say it's pretty...

import numpy as np
import pandas as pd
from itertools import chain
import json

df_grouped = df.groupby(['group', 'category'])['value'].sum().reset_index()
df_grouped = df_grouped.rename(columns={'value': 'weight', 'category': 'label'})

output_object = \
    [{'label': k, 
      'weight': df_grouped.loc[v, 'weight'].sum(),
      'groups': [dict({'groups': ()}.items() | x.items()) for x in 
                 chain.from_iterable(df_grouped.iloc[v, :].groupby('label')[['label', 'weight']].\
                  apply(lambda x: x.to_dict(orient='records')).tolist())]}
      for (k, v) in df_grouped.groupby('group')[['label', 'weight']].groups.items()]

output_dict = {'groups': output_object}
  

print(output_dict)

{'groups': [{'groups': [{'groups': (), 'label': 'Category 2.1', 'weight': 5}],
   'label': 'Group 2',
   'weight': 5},
  {'groups': [{'groups': (), 'label': 'Category 1.1', 'weight': 2},
    {'groups': (), 'label': 'Category 1.2', 'weight': 4}],
   'label': 'Group 1',
   'weight': 6},
  {'groups': [{'groups': (), 'label': 'Category 3.1', 'weight': 1},
    {'groups': (), 'label': 'Category 3.2', 'weight': 4},
    {'groups': (), 'label': 'Category 3.3', 'weight': 5}],
   'label': 'Group 3',
   'weight': 10}]}

To actually get it as JSON, I used the solution from this answer:

def default(o):
    if isinstance(o, np.integer): return int(o)
    raise TypeError

output_json = json.dumps(output_dict, default=default)
  

print(output_json)

'{"groups": [{"groups": [{"groups": [], "weight": 5, "label": "Category 2.1"}], "weight": 5, "label": "Group 2"}, {"groups": [{"groups": [], "weight": 2, "label": "Category 1.1"}, {"groups": [], "weight": 4, "label": "Category 1.2"}], "weight": 6, "label": "Group 1"}, {"groups": [{"groups": [], "weight": 1, "label": "Category 3.1"}, {"groups": [], "weight": 4, "label": "Category 3.2"}, {"groups": [], "weight": 5, "label": "Category 3.3"}], "weight": 10, "label": "Group 3"}]}'
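For comparison, the same nested structure can probably be built with two plain groupby loops, which may be easier to read than the single comprehension above. This is a minimal sketch, not the answer's method: build_groups is a hypothetical helper name, and the int() casts assume the values are whole numbers (they also make the result serializable without a default hook).

```python
import json
import pandas as pd

df = pd.DataFrame({
    'group': ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3", "Group 3"],
    'category': ["Category 1.1", "Category 1.2", "Category 2.1",
                 "Category 3.1", "Category 3.2", "Category 3.3"],
    'value': [2, 4, 5, 1, 4, 5],
})

def build_groups(frame):
    """Hypothetical helper: return the nested 'groups' list for frame."""
    groups = []
    # sort=False keeps groups in order of first appearance.
    for group_label, group_rows in frame.groupby('group', sort=False):
        # Sum the values of each category inside this group.
        category_sums = group_rows.groupby('category', sort=False)['value'].sum()
        children = [
            # int() turns NumPy integers into plain Python ints (assumes integer data).
            {'label': label, 'weight': int(weight), 'groups': []}
            for label, weight in category_sums.items()
        ]
        groups.append({
            'label': group_label,
            'weight': int(group_rows['value'].sum()),
            'groups': children,
        })
    return groups

output_dict = {'groups': build_groups(df)}
output_json = json.dumps(output_dict, indent=2)
print(output_json)
```

Because the weights are converted while the structure is built, json.dumps needs no custom default function here. If the values were floats, the int() casts would have to become float().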