How to convert CSV to nested JSON (up to 3 levels) in Python

Posted: 2016-05-26 05:42:48

Tags: python json csv pandas

I have the following data in CSV format:

id,category,sub_category,sub_category_type,count
0,fruits,citrus,lemon,30
1,fruits,citrus,lemon,40
2,fruits,citrus,lemon,50
3,fruits,citrus,grapefruit,20
4,fruits,citrus,orange,40
5,fruits,citrus,orange,10
6,fruits,berries,blueberry,20
7,fruits,berries,strawberry,50
8,fruits,berries,strawberry,90
9,fruits,berries,cranberry,70
10,fruits,berries,raspberry,16
11,fruits,berries,raspberry,80
12,fruits,dried fruit,raisins,10
13,fruits,dried fruit,dates,15
14,fruits,dried fruit,dates,10
15,vegetables,legumes,beans,12
16,vegetables,legumes,beans,15
17,vegetables,legumes,chickpea,12
18,vegetables,green leaf,spinach,18
19,vegetables,green leaf,cress,19
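To experiment without a data.csv file on disk, the sample above can be loaded directly from a string. This is a minimal sketch; the variable name CSV_TEXT is just an illustrative choice, and io.StringIO stands in for the file:

```python
import io

import pandas as pd

# the sample data from the question, inlined so no data.csv file is needed
CSV_TEXT = """id,category,sub_category,sub_category_type,count
0,fruits,citrus,lemon,30
1,fruits,citrus,lemon,40
2,fruits,citrus,lemon,50
3,fruits,citrus,grapefruit,20
4,fruits,citrus,orange,40
5,fruits,citrus,orange,10
6,fruits,berries,blueberry,20
7,fruits,berries,strawberry,50
8,fruits,berries,strawberry,90
9,fruits,berries,cranberry,70
10,fruits,berries,raspberry,16
11,fruits,berries,raspberry,80
12,fruits,dried fruit,raisins,10
13,fruits,dried fruit,dates,15
14,fruits,dried fruit,dates,10
15,vegetables,legumes,beans,12
16,vegetables,legumes,beans,15
17,vegetables,legumes,chickpea,12
18,vegetables,green leaf,spinach,18
19,vegetables,green leaf,cress,19
"""

# read_csv accepts any file-like object, so a StringIO works like a file
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.shape)  # (20, 5)
```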

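I want to convert the CSV above into nested JSON (category > sub_category > sub_category_type), but pandas.DataFrame.to_json() does not help here: its orient options only lay out rows and columns, they do not group rows into a hierarchy.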
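For comparison, here is what to_json() gives with orient="records" on a three-row excerpt of the data: one flat object per row, with no grouping by category or sub_category. A small sketch for illustration only:

```python
import json

import pandas as pd

# three rows taken from the question's data
df = pd.DataFrame({
    "category": ["fruits", "fruits", "vegetables"],
    "sub_category": ["citrus", "citrus", "legumes"],
    "sub_category_type": ["lemon", "orange", "beans"],
    "count": [30, 40, 12],
})

# orient="records" produces a JSON array with one flat object per row
flat = json.loads(df.to_json(orient="records"))
print(flat[0])
# {'category': 'fruits', 'sub_category': 'citrus', 'sub_category_type': 'lemon', 'count': 30}
```

Repeated values such as "fruits" and "citrus" stay duplicated on every row, which is exactly what the nested format is meant to avoid.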

Is there a way to do this?

PS: I'm answering my own question in Q&A style to share the knowledge. I'd be glad to hear of a better solution.

1 answer:

Answer 0 (score: 1)

以下代码的灵感来自this github链接。此代码将帮助我们将CSV转换为3级嵌套JSON

import json

import pandas as pd


df = pd.read_csv('data.csv')

# choose the columns to keep, in the desired nested JSON hierarchical order
df = df[["category", "sub_category", "sub_category_type", "count"]]

# the order of the groupby keys matters: it determines the JSON nesting.
# groupby returns a pandas Series indexed by "category", "sub_category" and
# "sub_category_type", summing the numeric column "count" within each group
df1 = df.groupby(["category", "sub_category", "sub_category_type"])["count"].sum()
df1 = df1.reset_index()

print(df1)  # inspect the aggregated table

d = {"name": "stock", "children": []}

for category, sub_category, sub_category_type, count in df1.values:
    count = int(count)  # make sure json.dumps receives a plain Python int

    # names of the categories added so far
    category_list = [item["name"] for item in d["children"]]

    # if 'category' is NOT in category_list, append it with its first sub_category and leaf
    if category not in category_list:
        d["children"].append({"name": category, "children": [{"name": sub_category, "children": [{"name": sub_category_type, "count": count}]}]})

    # if 'category' IS in category_list, add a new child to it
    else:
        cat_children = d["children"][category_list.index(category)]["children"]
        sub_list = [item["name"] for item in cat_children]

        if sub_category not in sub_list:
            cat_children.append({"name": sub_category, "children": [{"name": sub_category_type, "count": count}]})
        else:
            cat_children[sub_list.index(sub_category)]["children"].append({"name": sub_category_type, "count": count})


print(json.dumps(d))
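The groupby/reset_index step is what merges duplicate rows before the tree is built. A minimal sketch of just that step on the six "citrus" rows of the question's data (an excerpt chosen for illustration):

```python
import pandas as pd

# the six "citrus" rows from the question
df = pd.DataFrame({
    "category": ["fruits"] * 6,
    "sub_category": ["citrus"] * 6,
    "sub_category_type": ["lemon", "lemon", "lemon", "grapefruit", "orange", "orange"],
    "count": [30, 40, 50, 20, 40, 10],
})

# sum "count" per (category, sub_category, sub_category_type);
# groupby sorts the keys, and reset_index turns them back into columns
df1 = df.groupby(["category", "sub_category", "sub_category_type"])["count"].sum()
df1 = df1.reset_index()

# lemon: 30 + 40 + 50 = 120, orange: 40 + 10 = 50
print(dict(zip(df1["sub_category_type"], df1["count"].tolist())))
# {'grapefruit': 20, 'lemon': 120, 'orange': 50}
```

Because groupby sorts its keys by default, the children in the final JSON come out in alphabetical order, which matches the output of the full script.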

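When the full script is run, its final print(json.dumps(d)) produces the following (reformatted here for readability; key order may differ):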

{
"name": "stock", 
"children": [
    {"name": "fruits",
    "children": [
        {"name": "berries", 
        "children": [
            {"count": 20, "name": "blueberry"}, 
            {"count": 70, "name": "cranberry"}, 
            {"count": 96, "name": "raspberry"}, 
            {"count": 140, "name": "strawberry"}]
        },
        {"name": "citrus", 
        "children": [
            {"count": 20, "name": "grapefruit"},
            {"count": 120, "name": "lemon"},
            {"count": 50, "name": "orange"}]
        }, 
        {"name": "dried fruit",
        "children": [
            {"count": 25, "name": "dates"}, 
            {"count": 10, "name": "raisins"}]
        }]
    },
    {"name": "vegetables",
    "children": [
        {"name": "green leaf",
        "children": [
            {"count": 19, "name": "cress"},
            {"count": 18, "name": "spinach"}]
        },
        {
        "name": "legumes",
        "children": [
            {"count": 27, "name": "beans"},
            {"count": 12, "name": "chickpea"}]
        }]
    }]
}
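The loop above is hard-wired to three levels. As a possible alternative (a sketch, not the method from the original answer), the same {"name": ..., "children": [...]} shape can be built recursively for any number of grouping levels. The helper name nest is hypothetical; leaves keep the "count" key as in the output above:

```python
import json

import pandas as pd


def nest(df, levels, value="count"):
    """Recursively turn rows of df into [{"name": ..., "children": [...]}, ...].

    levels lists the grouping columns, outermost first (at least one is
    required); at the last level each group becomes a leaf
    {"name": ..., value: <sum of value>}.
    """
    key, rest = levels[0], levels[1:]
    nodes = []
    # groupby sorts keys by default, giving the same alphabetical order as above
    for name, group in df.groupby(key):
        if rest:
            nodes.append({"name": name, "children": nest(group, rest, value)})
        else:
            nodes.append({"name": name, value: int(group[value].sum())})
    return nodes


# a few rows from the question's data
rows = [
    ("fruits", "citrus", "lemon", 30),
    ("fruits", "citrus", "lemon", 40),
    ("fruits", "berries", "blueberry", 20),
    ("vegetables", "legumes", "beans", 12),
    ("vegetables", "legumes", "beans", 15),
]
df = pd.DataFrame(rows, columns=["category", "sub_category", "sub_category_type", "count"])

tree = {"name": "stock", "children": nest(df, ["category", "sub_category", "sub_category_type"])}
print(json.dumps(tree))
```

Since the leaf sums happen inside the recursion, no separate groupby/reset_index pass is needed, and adding a fourth level is just a matter of adding a column name to the levels list.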