I have the following data in CSV format.
id,category,sub_category,sub_category_type,count
0,fruits,citrus,lemon,30
1,fruits,citrus,lemon,40
2,fruits,citrus,lemon,50
3,fruits,citrus,grapefruit,20
4,fruits,citrus,orange,40
5,fruits,citrus,orange,10
6,fruits,berries,blueberry,20
7,fruits,berries,strawberry,50
8,fruits,berries,strawberry,90
9,fruits,berries,cranberry,70
10,fruits,berries,raspberry,16
11,fruits,berries,raspberry,80
12,fruits,dried fruit,raisins,10
13,fruits,dried fruit,dates,15
14,fruits,dried fruit,dates,10
15,vegetables,legumes,beans,12
16,vegetables,legumes,beans,15
17,vegetables,legumes,chickpea,12
18,vegetables,green leaf,spinach,18
19,vegetables,green leaf,cress,19
I want to convert the CSV above into nested JSON, because pandas.DataFrame.to_json() does not give me a nested JSON structure.
Is there a solution?
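To make the limitation concrete, here is a minimal sketch of what to_json() returns for this kind of data. It assumes a handful of rows from the sample above, inlined as a string (the name csv_text is only for illustration), and uses orient="records"; each row comes back as one flat object rather than a hierarchy.

```python
import io
import json

import pandas as pd

# A few rows from the sample data, inlined so the snippet is self-contained.
# (csv_text is an illustrative name, not part of the original post.)
csv_text = """id,category,sub_category,sub_category_type,count
0,fruits,citrus,lemon,30
1,fruits,citrus,lemon,40
6,fruits,berries,blueberry,20
15,vegetables,legumes,beans,12
"""
df = pd.read_csv(io.StringIO(csv_text))

# orient="records" serializes every row as its own flat JSON object.
records = json.loads(df.to_json(orient="records"))
print(records[0])
```

Every record carries all five columns side by side, so the category / sub_category / sub_category_type hierarchy has to be built by hand.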
PS: I am posting this in Q&A style to share knowledge. I would be glad to hear of any better solution.
Answer 0 (score: 1)
The following code was inspired by this GitHub link. It converts the CSV into a three-level nested JSON.
import json

import pandas as pd

df = pd.read_csv('data.csv')
# choose the columns to keep, in the desired hierarchical order of the nested JSON
df = df[["category", "sub_category", "sub_category_type", "count"]]
# the order of the groupby keys matters: it determines the JSON nesting.
# groupby produces a pandas Series keyed by "category", "sub_category" and
# "sub_category_type", summing the numeric column 'count' within each group
df1 = df.groupby(["category", "sub_category", "sub_category_type"])['count'].sum()
df1 = df1.reset_index()
print(df1)
d = {"name": "stock", "children": []}
for line in df1.values:
    category = line[0]
    sub_category = line[1]
    sub_category_type = line[2]
    count = int(line[3])  # plain int, so json.dumps can always serialize it
    # collect the names of the categories already present in the tree
    category_list = []
    for item in d['children']:
        category_list.append(item['name'])
    # if 'category' is NOT in category_list, append it together with its first sub-category
    if category not in category_list:
        d['children'].append({"name": category, "children": [{"name": sub_category, "children": [{"name": sub_category_type, "count": count}]}]})
    # if 'category' IS in category_list, add a new child to it
    else:
        # collect the names of the sub-categories already under this category
        sub_list = []
        for item in d['children'][category_list.index(category)]['children']:
            sub_list.append(item['name'])
        if sub_category not in sub_list:
            d['children'][category_list.index(category)]['children'].append({"name": sub_category, "children": [{"name": sub_category_type, "count": count}]})
        else:
            d['children'][category_list.index(category)]['children'][sub_list.index(sub_category)]['children'].append({"name": sub_category_type, "count": count})
print(json.dumps(d))
When executed, it produces the following JSON (reformatted here for readability):
{
"name": "stock",
"children": [
{"name": "fruits",
"children": [
{"name": "berries",
"children": [
{"count": 20, "name": "blueberry"},
{"count": 70, "name": "cranberry"},
{"count": 96, "name": "raspberry"},
{"count": 140, "name": "strawberry"}]
},
{"name": "citrus",
"children": [
{"count": 20, "name": "grapefruit"},
{"count": 120, "name": "lemon"},
{"count": 50, "name": "orange"}]
},
{"name": "dried fruit",
"children": [
{"count": 25, "name": "dates"},
{"count": 10, "name": "raisins"}]
}]
},
{"name": "vegetables",
"children": [
{"name": "green leaf",
"children": [
{"count": 19, "name": "cress"},
{"count": 18, "name": "spinach"}]
},
{
"name": "legumes",
"children": [
{"count": 27, "name": "beans"},
{"count": 12, "name": "chickpea"}]
}]
}]
}
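Since the question asks about alternatives, here is one possible sketch that builds the same kind of tree recursively, so the nesting depth follows the list of columns instead of being fixed at three levels. The helper name nest and its parameters levels and value_col are hypothetical, chosen for illustration only, and the data is a small inlined subset of the sample rows.

```python
import io
import json

import pandas as pd


def nest(frame, levels, value_col):
    """Hypothetical helper: group `frame` by each column in `levels` in turn,
    summing `value_col` at the innermost level."""
    level, rest = levels[0], levels[1:]
    nodes = []
    # groupby sorts the keys by default, so siblings come out in sorted order
    for name, group in frame.groupby(level):
        if rest:
            nodes.append({"name": name, "children": nest(group, rest, value_col)})
        else:
            # int() converts the NumPy integer to a plain int that json can serialize
            nodes.append({"name": name, value_col: int(group[value_col].sum())})
    return nodes


# A subset of the sample rows, inlined so the snippet runs on its own.
csv_text = """id,category,sub_category,sub_category_type,count
0,fruits,citrus,lemon,30
1,fruits,citrus,lemon,40
4,fruits,citrus,orange,40
6,fruits,berries,blueberry,20
15,vegetables,legumes,beans,12
16,vegetables,legumes,beans,15
"""
df = pd.read_csv(io.StringIO(csv_text))

tree = {
    "name": "stock",
    "children": nest(df, ["category", "sub_category", "sub_category_type"], "count"),
}
print(json.dumps(tree, indent=2))
```

Adding or removing a level then only means changing the list of column names passed to nest; the trade-off is one groupby call per node, which is fine for small data like this but may be slow on large frames.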