Question

I have a pandas DataFrame in the following format:

# df name: cust_sim_data_product_agg:
yearmo  region  products    revenue
0   201711  CN  ['Auto', 'Flood', 'Home', 'Liability', 'Life',...   690
1   201711  CN  ['Auto', 'Flood', 'Home', 'Liability', 'Life']  610
2   201711  CN  ['Auto', 'Flood', 'Home', 'Liability']  560
3   201711  CN  ['Auto', 'Flood', 'Home', 'Life', 'Liability',...   690
4   201711  CN  ['Auto', 'Flood', 'Home', 'Life', 'Mortgage', ...   690

I want to roll it up into nested JSON of the following form:

[
  {
    yearmo: '201711',
    data: [
      {
        name: 'SE',
        value: 18090, # sum of all the values in the level below
        children: [
          {
            name: "['Auto', 'Flood', 'Home',...]", # this is products from the dataframe
            value: 690 # this is the revenue value
          },
          {
            name: "['Flood', 'Home', 'Life'...]",
            value: 690
          },
          ...
        ]
      },
      {
        name: 'NE',
        value: 16500, # sum of all the values in the level below
        children: [
          {
            name: "['Auto', 'Home',...]",
            value: 210
          },
          {
            name: "['Life'...]",
            value: 450
          },
          ...
        ]
      }
    ]
  },
  {
    yearmo: '201712',
    data: [
      {
        name: 'SE',
        value: 24050,
        children: [ ... ] # same format as above
      },
      {
        name: 'NE',
        value: 22400,
        children: [ ... ] # same format as above
      }
    ]
  }
]

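So each yearmo gets one element at the top level of the JSON. Within data, there is one entry per region, whose value is the sum of the values in the level directly below it. children is a list of dicts, each mapping products -> name and revenue -> value from the row-level data in the pandas DataFrame.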

My best attempt so far is this:

def roll_yearmo_rev(d):
    x1 = [{'name': n, 'value': v}  for n,v in zip(d.products, d.revenue)]
    x2 = {'children': x1, 'value': sum(d.revenue)}
    return x2

def roll_yearmo(d):
    x1 = [{'name': n, 'children': c} for n,c in zip(d.region, d.children)]
    x2 = {'children': x1, 'value': sum(d.value)}
    return x2

cust_sim_data_product_agg_dict = cust_sim_data_product_agg.groupby(['yearmo', 'region'])\
    .apply(roll_yearmo_rev)
cust_sim_data_product_agg_dict = cust_sim_data_product_agg_dict.reset_index()
cust_sim_data_product_agg_dict.columns = ['yearmo' , 'region', 'children']


cust_sim_data_product_agg_dict = cust_sim_data_product_agg_dict.groupby(['yearmo'])\
    .apply(roll_yearmo)
cust_sim_data_product_agg_dict = cust_sim_data_product_agg_dict.reset_index()

This fails, because the last rollup raises the following error:

AttributeError: 'DataFrame' object has no attribute 'value'
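
A likely cause, sketched under assumptions: after the first rollup and the column rename, the intermediate frame only has the columns yearmo, region and children, and the per-region sum sits inside each dict in children rather than in a column named value, so d.value has nothing to find. The sample data below is hypothetical and only mirrors the column names from the question.

```python
import pandas as pd

# Hypothetical sample data with the same columns as in the question.
df = pd.DataFrame({
    "yearmo": [201711, 201711, 201711],
    "region": ["SE", "SE", "NE"],
    "products": [["Auto", "Flood"], ["Home"], ["Life"]],
    "revenue": [690, 610, 450],
})

def roll_yearmo_rev(d):
    # Same idea as in the question: one dict per (yearmo, region) group.
    x1 = [{"name": n, "value": v} for n, v in zip(d.products, d.revenue)]
    return {"children": x1, "value": int(sum(d.revenue))}

step1 = df.groupby(["yearmo", "region"]).apply(roll_yearmo_rev).reset_index()
step1.columns = ["yearmo", "region", "children"]

# There is no 'value' column at this point, which is why d.value fails.
has_value_before = "value" in step1.columns

# The sum is nested inside each dict, so it has to be pulled out explicitly.
step1["value"] = step1["children"].map(lambda c: c["value"])
print(step1[["yearmo", "region", "value"]])
```

With a value column in place, the second rollup would at least be able to sum it; the dicts in children would still need unpacking to reach the shape described above.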

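The whole thing feels messy to me. I read up on split-apply-combine, which inspired the use of groupby() and apply(), but I could really use a second opinion on the approach, because I'm pretty sure there is a better way. Any advice would be greatly appreciated.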
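
One possible approach, offered as a sketch rather than a definitive answer: iterate over the groups directly and build plain Python dicts, instead of chaining two apply() calls. The function name to_nested and the sample data are hypothetical. The sketch assumes the top level is a list with one object per yearmo, and that each products list is stored as its string representation in name.

```python
import json
import pandas as pd

# Hypothetical sample data with the same columns as in the question.
df = pd.DataFrame({
    "yearmo": [201711, 201711, 201711, 201712],
    "region": ["SE", "SE", "NE", "SE"],
    "products": [["Auto", "Flood"], ["Home"], ["Life"], ["Auto"]],
    "revenue": [690, 610, 450, 560],
})

def to_nested(frame):
    """Return a list with one {'yearmo', 'data'} dict per yearmo."""
    result = []
    for yearmo, month_df in frame.groupby("yearmo", sort=True):
        regions = []
        for region, region_df in month_df.groupby("region", sort=True):
            # One child per row: products -> name, revenue -> value.
            children = [
                {"name": str(p), "value": int(v)}
                for p, v in zip(region_df["products"], region_df["revenue"])
            ]
            regions.append({
                "name": region,
                # Region value is the sum of the level directly below it.
                "value": sum(c["value"] for c in children),
                "children": children,
            })
        result.append({"yearmo": str(yearmo), "data": regions})
    return result

nested = to_nested(df)
print(json.dumps(nested, indent=2))
```

Converting revenue with int() keeps the output serializable by the standard json module, which does not accept NumPy integer types.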

Convert Pandas DF to nested JSON
