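Goal: convert a pandas DataFrame into an aggregated JSON-like object.
The JSON-like object contains, for each Group and each Category, the aggregated (summed) value as its weight.
Current state: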
df = pd.DataFrame({'group': ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3", "Group 3"],
'category': ["Category 1.1", "Category 1.2", "Category 2.1", "Category 3.1", "Category 3.2", "Category 3.3"],
'value': [2, 4, 5, 1, 4, 5]
})
Structure:
>>> df[['group','category','value']]
     group      category  value
0  Group 1  Category 1.1      2
1  Group 1  Category 1.2      4
2  Group 2  Category 2.1      5
3  Group 3  Category 3.1      1
4  Group 3  Category 3.2      4
5  Group 3  Category 3.3      5
Desired output:
{"groups": [
{"label": "Group 1",
"weight": 6,
"groups": [
{"label": "Category 1.1",
"weight": 2,
"groups": [] },
{"label": "Category 1.2",
"weight": 4,
"groups": [] }
] },
{"label": "Group 2",
"weight": 5,
"groups": [{
"label": "Category 2.1",
"weight": 5,
"groups": []
} ] },
{"label": "Group 3",
"weight": 10,
"groups": [{
"label": "Category 3.1",
"weight": 1,
"groups": []
},
{"label": "Category 3.2",
"weight": 4,
"groups": []
},
{"label": "Category 3.3",
"weight": 5,
"groups": []
} ]
} ]
}
What I have tried so far:
pd.pivot_table(df, index=['group'],columns=['category'], values=['value'],aggfunc=np.sum, margins=True).stack('category')
Pivot output:
                      value
group   category
Group 1 All             6.0
        Category 1.1    2.0
        Category 1.2    4.0
Group 2 All             5.0
        Category 2.1    5.0
Group 3 All            10.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0
All     All            21.0
        Category 1.1    2.0
        Category 1.2    4.0
        Category 2.1    5.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0
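This is where I am stuck. The "All" totals look like they belong in a separate column, and I do not want "All" to appear as a group of its own. I tried to_json() with several orient values (records, values, split), but I could not work out how to produce the desired output.
I also tried df.groupby(['group','category']).agg({'value':'sum'}), but that gives only the per-category sums, not the per-group totals.
Similar question, but not the structure I am looking for: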
Answer 0 (score: 1)
I think the following may work for you. I can't say it's pretty...
import numpy as np
import pandas as pd
from itertools import chain
import json
df_grouped = df.groupby(['group', 'category'])['value'].sum().reset_index()
df_grouped = df_grouped.rename(columns={'value': 'weight', 'category': 'label'})
output_object = [
    {'label': k,
     'weight': df_grouped.loc[v, 'weight'].sum(),
     # merge an empty 'groups' entry into each category record
     'groups': [dict({'groups': ()}.items() | x.items()) for x in
                chain.from_iterable(
                    df_grouped.iloc[v, :].groupby('label')[['label', 'weight']]
                    .apply(lambda x: x.to_dict(orient='records')).tolist())]}
    # group by a scalar key so that each k is the group label itself
    for (k, v) in df_grouped.groupby('group')[['label', 'weight']].groups.items()]
output_dict = {'groups': output_object}
print(output_dict)
{'groups': [{'groups': [{'groups': (), 'label': 'Category 2.1', 'weight': 5}],
'label': 'Group 2',
'weight': 5},
{'groups': [{'groups': (), 'label': 'Category 1.1', 'weight': 2},
{'groups': (), 'label': 'Category 1.2', 'weight': 4}],
'label': 'Group 1',
'weight': 6},
{'groups': [{'groups': (), 'label': 'Category 3.1', 'weight': 1},
{'groups': (), 'label': 'Category 3.2', 'weight': 4},
{'groups': (), 'label': 'Category 3.3', 'weight': 5}],
'label': 'Group 3',
'weight': 10}]}
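If the comprehension above is hard to follow, the same nesting can probably be built with two plain groupby loops. This is only a sketch under the assumptions of the question's example data: build_groups is a hypothetical helper name, not part of pandas, and the weights are converted with int() so that the result contains only built-in Python types.

```python
import json
import pandas as pd

df = pd.DataFrame({
    'group': ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3", "Group 3"],
    'category': ["Category 1.1", "Category 1.2", "Category 2.1",
                 "Category 3.1", "Category 3.2", "Category 3.3"],
    'value': [2, 4, 5, 1, 4, 5],
})


def build_groups(frame):
    """Hypothetical helper: nest per-category sums under each group."""
    result = []
    # groupby sorts by key by default, so groups come out in label order
    for group_label, group_frame in frame.groupby('group'):
        # sum the values of each category inside this group
        category_sums = group_frame.groupby('category')['value'].sum()
        children = [
            {'label': category, 'weight': int(weight), 'groups': []}
            for category, weight in category_sums.items()
        ]
        result.append({
            'label': group_label,
            # the group weight is the total of its category weights
            'weight': int(category_sums.sum()),
            'groups': children,
        })
    return {'groups': result}


output_dict = build_groups(df)
# only built-in ints, strings and lists remain, so no default hook is needed
output_json = json.dumps(output_dict)
```

Because every weight is already a plain int here, json.dumps should accept the dictionary directly.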
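To actually get it as JSON, I took the solution from this answer:
def default(o):
    # convert NumPy integers, which json cannot serialize, to built-in ints
    if isinstance(o, np.integer): return int(o)
    raise TypeError
output_json = json.dumps(output_dict, default=default)
print(output_json)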
'{"groups": [{"groups": [{"groups": [], "weight": 5, "label": "Category 2.1"}], "weight": 5, "label": "Group 2"}, {"groups": [{"groups": [], "weight": 2, "label": "Category 1.1"}, {"groups": [], "weight": 4, "label": "Category 1.2"}], "weight": 6, "label": "Group 1"}, {"groups": [{"groups": [], "weight": 1, "label": "Category 3.1"}, {"groups": [], "weight": 4, "label": "Category 3.2"}, {"groups": [], "weight": 5, "label": "Category 3.3"}], "weight": 10, "label": "Group 3"}]}'
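To illustrate why the default hook is needed, here is a small sketch using a hand-made record (sample is an assumed stand-in for one entry of output_dict, not taken from the real output). Without the hook, json.dumps rejects the NumPy integer; with it, the value is converted, and the empty tuple is written as an empty JSON array.

```python
import json
import numpy as np


def default(o):
    # json.dumps calls this hook only for objects it cannot serialize itself
    if isinstance(o, np.integer):
        return int(o)
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


# assumed stand-in for one record: a NumPy integer weight and a tuple for 'groups'
sample = {'label': 'Group 2', 'weight': np.int64(5), 'groups': ()}

try:
    json.dumps(sample)
    failed_without_hook = False
except TypeError:
    # np.int64 is not a subclass of the built-in int, so the encoder refuses it
    failed_without_hook = True

encoded = json.dumps(sample, default=default)
print(failed_without_hook)
print(encoded)
```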