Python: Convert multiple columns of a CSV file to nested JSON

Time: 2018-08-01 19:23:00

Tags: python json csv nested multiple-columns

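This is my input CSV file with multiple columns. I want to convert it into a JSON file that has department, departmentID, and one nested field called customer, with first and last nested inside that field.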

department, departmentID, first, last
fans, 1, Caroline, Smith
fans, 1, Jenny, White
students, 2, Ben, CJ
students, 2, Joan, Carpenter
...

The JSON output I need:

[
  {
    "department": "fans",
    "departmentID": "1",
    "customer": [
      {
        "first": "Caroline",
        "last": "Smith"
      },
      {
        "first": "Jenny",
        "last": "White"
      }
    ]
  },
  {
    "department": "students",
    "departmentID": 2,
    "customer": [
      {
        "first": "Ben",
        "last": "CJ"
      },
      {
        "first": "Joan",
        "last": "Carpenter"
      }
    ]
  }
]

My code:

from csv import DictReader
from itertools import groupby
from pprint import pprint
with open('data.csv') as csvfile:
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]

    groups = []
    uniquekeys = []

    for k, g in groupby(data, lambda r: (r['group'], r['groupID'])):
        groups.append({
            "group": k[0],
            "groupID": k[1],
            "user": [{k:v for k, v in d.items() if k != 'group'} for d in list(g)]
        })
        uniquekeys.append(k)

pprint(groups)

My problem: groupID shows up twice in the data, both inside and outside the nested JSON. What I want is group and groupID as the groupby keys.

1 Answer:

Answer 0: (score: 0)

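The problem is that you mixed up the key names, so this line "user": [{k:v for k, v in d.items() if k != 'group'} for d in list(g)] does not remove them from the dictionary. There is no key called 'group' in the rows, so nothing is removed.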
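The effect can be checked on a single row. This is only a small sketch: the `row` dict below is typed in by hand from the sample CSV in the question rather than read from a file, and the variable names are made up for illustration.

```python
# One row as DictReader would produce it for the sample CSV (values are strings).
row = {"department": "fans", "departmentID": "1", "first": "Caroline", "last": "Smith"}

# Filtering on a key that does not exist leaves the row unchanged.
unchanged = {k: v for k, v in row.items() if k != "group"}
print(unchanged == row)

# Filtering on the real grouping columns keeps only the nested fields.
nested_fields = {k: v for k, v in row.items() if k not in ("department", "departmentID")}
print(nested_fields)
```

Excluding both grouping columns, not just one, is also what keeps the ID from appearing a second time inside the nested list.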

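I don't fully understand which keys you want, so the example below assumes data.csv looks like the one in your question, with department and departmentID, but the script converts them to group and groupID: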

from csv import DictReader
from itertools import groupby
from pprint import pprint

with open('data.csv') as csvfile:
    # skipinitialspace=True drops the blank after each comma in the header and values
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]

    groups = []
    uniquekeys = []

    # groupby only merges consecutive rows, so data.csv must already be sorted by these two columns
    for k, g in groupby(data, lambda r: (r['department'], r['departmentID'])):
        groups.append({
            "group": k[0],
            "groupID": k[1],
            # keep every column except the two grouping keys
            "user": [{k:v for k, v in d.items() if k not in ['department','departmentID']} for d in list(g)]
        })
        uniquekeys.append(k)

pprint(groups)

I used different keys on purpose, so it is clear which line does what, and it is easy to customize for different keys in the input or output.
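As one possible customization, here is a sketch that uses the key names from the desired output in the question (department, departmentID, customer) and serializes the result as JSON instead of pretty-printing it. The function name `nest_rows`, the `GROUP_KEYS` constant and the inline sample string are assumptions made for illustration. The rows are sorted before grouping because `itertools.groupby` only merges consecutive rows with equal keys.

```python
import csv
import io
import json
from itertools import groupby
from operator import itemgetter

# Assumed header names, taken from the sample CSV in the question.
GROUP_KEYS = ("department", "departmentID")


def nest_rows(rows, nested_field="customer"):
    """Group rows by GROUP_KEYS and nest the remaining columns under nested_field."""
    key = itemgetter(*GROUP_KEYS)
    result = []
    # Sort first: groupby only groups adjacent rows that share a key.
    for (department, department_id), members in groupby(sorted(rows, key=key), key=key):
        result.append({
            "department": department,
            "departmentID": department_id,
            nested_field: [
                {name: value for name, value in row.items() if name not in GROUP_KEYS}
                for row in members
            ],
        })
    return result


# Inline copy of the sample data so the sketch runs without data.csv.
sample = """department, departmentID, first, last
fans, 1, Caroline, Smith
fans, 1, Jenny, White
students, 2, Ben, CJ
students, 2, Joan, Carpenter
"""

rows = list(csv.DictReader(io.StringIO(sample), skipinitialspace=True))
nested = nest_rows(rows)
print(json.dumps(nested, indent=2))
```

To write a file rather than print, `json.dump(nested, f, indent=2)` on an open file handle does the same serialization. Note that the csv module reads every value as a string, so departmentID comes out as "1" and "2"; convert it with `int()` if numbers are wanted.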