How do I convert a pandas Series to the desired JSON format?

Time: 2016-05-23 17:35:55

Tags: json python-2.7 pandas data-cleansing to-json

I have the following data, and I need to group it and then apply an aggregate function.

My data (data.csv) looks like this:

id,category,sub_category,count
0,x,sub1,10
1,x,sub2,20
2,x,sub2,10
3,y,sub3,30
4,y,sub3,5
5,y,sub4,15
6,z,sub5,20

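Here I am trying to compute the total count for each sub-category. After that, I need to store the result in JSON format. The following code (test.py) does that: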

import pandas as pd
df = pd.read_csv('data.csv')
sub_category_total = df['count'].groupby([df['category'], df['sub_category']]).sum()
print sub_category_total.reset_index().to_json(orient = "records")

The code above produces output in the following format.

[{"category":"x","sub_category":"sub1","count":10},{"category":"x","sub_category":"sub2","count":30},{"category":"y","sub_category":"sub3","count":35},{"category":"y","sub_category":"sub4","count":15},{"category":"z","sub_category":"sub5","count":20}]

However, the format I want is this:

{
"x":[{
     "sub_category":"sub1",
     "count":10
     },
     {
     "sub_category":"sub2",
      "count":30}],
"y":[{
     "sub_category":"sub3",
     "count":35
     },
     {
     "sub_category":"sub4",
     "count":15}],
"z":[{
     "sub_category":"sub5",
      "count":20}]
}

Following the discussion in "How to convert pandas DataFrame result to user defined json format", I replaced the last two lines of test.py with:

g = df.groupby('category')[["sub_category","count"]].apply(lambda x: x.to_dict(orient='records'))
print g.to_json()

That gives me the following output.

{"x":[{"count":10,"sub_category":"sub1"},{"count":20,"sub_category":"sub2"},{"count":10,"sub_category":"sub2"}],"y":[{"count":30,"sub_category":"sub3"},{"count":5,"sub_category":"sub3"},{"count":15,"sub_category":"sub4"}],"z":[{"count":20,"sub_category":"sub5"}]}

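Although this result is close to the format I want, I cannot apply any aggregate function here, because doing so throws the error "'numpy.int64' object has no attribute 'to_dict'". As a result, I end up with every row of the data file instead of one summed row per sub-category.

Can someone help me produce the JSON format shown above?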
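The failing aggregation attempt is not shown in the question, so the following is only a guess at how such an error can arise. It is a minimal sketch in Python 3 syntax that assumes the sum was taken inside `apply` and `to_dict` was then called on the result. The inline `CSV_TEXT` stands in for data.csv, and the lambda is hypothetical.

```python
import io

import pandas as pd

# Inline copy of data.csv so the sketch is self-contained.
CSV_TEXT = """id,category,sub_category,count
0,x,sub1,10
1,x,sub2,20
2,x,sub2,10
3,y,sub3,30
4,y,sub3,5
5,y,sub4,15
6,z,sub5,20
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Hypothetical failing attempt: summing a single column yields a NumPy
# scalar for each group, and a scalar has no to_dict method.
error_message = None
try:
    df.groupby('category')['count'].apply(lambda s: s.sum().to_dict())
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The point of the sketch is that `to_dict` exists on a DataFrame or Series but not on the scalar that `sum` returns, so the aggregation has to happen before the per-category conversion rather than inside it.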

1 Answer:

Answer 0: (score: 5)

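I think you can aggregate with sum first. Passing as_index=False to groupby makes the output a DataFrame, df1, instead of a Series. Then group df1 by category and convert each group to a list of records: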

df1 = (df.groupby(['category','sub_category'], as_index=False)['count'].sum())
print (df1)
  category sub_category  count
0        x         sub1     10
1        x         sub2     30
2        y         sub3     35
3        y         sub4     15
4        z         sub5     20
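To illustrate what `as_index=False` changes, here is a minimal sketch in Python 3 syntax that compares the two forms of the same aggregation. The inline `CSV_TEXT` and the names `series_result` and `frame_result` are introduced here for illustration only.

```python
import io

import pandas as pd

# Inline copy of data.csv so the sketch is self-contained.
CSV_TEXT = """id,category,sub_category,count
0,x,sub1,10
1,x,sub2,20
2,x,sub2,10
3,y,sub3,30
4,y,sub3,5
5,y,sub4,15
6,z,sub5,20
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
keys = ['category', 'sub_category']

# Default: the group keys become a MultiIndex and the result is a Series.
series_result = df.groupby(keys)['count'].sum()

# as_index=False: the group keys stay as ordinary columns of a DataFrame.
frame_result = df.groupby(keys, as_index=False)['count'].sum()

print(type(series_result).__name__, type(series_result.index).__name__)
print(type(frame_result).__name__, list(frame_result.columns))
```

Keeping `category` and `sub_category` as columns is what lets the next step group by `category` again and still find `sub_category` in each group.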

# Wrap the chained call in parentheses so it can continue on the next line.
g = (df1.groupby('category')[["sub_category","count"]]
        .apply(lambda x: x.to_dict(orient='records')))

print (g.to_json())
{
    "x": [{
        "sub_category": "sub1",
        "count": 10
    }, {
        "sub_category": "sub2",
        "count": 30
    }],
    "y": [{
        "sub_category": "sub3",
        "count": 35
    }, {
        "sub_category": "sub4",
        "count": 15
    }],
    "z": [{
        "sub_category": "sub5",
        "count": 20
    }]
}
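The two steps above can be combined into one self-contained script. This is a sketch in Python 3 syntax, not the original test.py: the inline `CSV_TEXT` replaces the file read, and `json_text` and `result` are names introduced here. Note that `to_json` returns a compact single-line string, so the indented layout shown above has to be produced separately, for example with `json.dumps(..., indent=4)`.

```python
import io
import json

import pandas as pd

# Inline copy of data.csv so the sketch is self-contained.
CSV_TEXT = """id,category,sub_category,count
0,x,sub1,10
1,x,sub2,20
2,x,sub2,10
3,y,sub3,30
4,y,sub3,5
5,y,sub4,15
6,z,sub5,20
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Step 1: sum the counts per (category, sub_category), keeping keys as columns.
df1 = df.groupby(['category', 'sub_category'], as_index=False)['count'].sum()

# Step 2: for each category, turn its rows into a list of record dicts.
g = (df1.groupby('category')[['sub_category', 'count']]
        .apply(lambda x: x.to_dict(orient='records')))

# Step 3: serialize; the Series index (category) becomes the JSON keys.
json_text = g.to_json()

# Parse the string back only to pretty-print and inspect it.
result = json.loads(json_text)
print(json.dumps(result, indent=4))
```

Parsing the string back with `json.loads` is optional. It is used here only to pretty-print the result and to make the structure easy to check.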