CSV to nested JSON?

Asked: 2016-03-13 11:23:34

Tags: python json csv

I am trying to convert a flat, structured CSV into a nested JSON structure. The CSV is generated from SQL, which produces multiple rows for each primary ID. The CSV is structured as follows:

PrimaryId,FirstName,LastName,City,CarName,DogName
100,John,Smith,NewYork,Toyota,Spike
100,John,Smith,NewYork,BMW,Spike
100,John,Smith,NewYork,Toyota,Rusty
100,John,Smith,NewYork,BMW,Rusty
101,Ben,Swan,Sydney,Volkswagen,Buddy
101,Ben,Swan,Sydney,Ford,Buddy
101,Ben,Swan,Sydney,Audi,Buddy
101,Ben,Swan,Sydney,Volkswagen,Max
101,Ben,Swan,Sydney,Ford,Max
101,Ben,Swan,Sydney,Audi,Max
102,Julia,Brown,London,Mini,Lucy

The desired JSON output is:

{
    "data": [
        {
            "City": "NewYork", 
            "FirstName": "John", 
            "PrimaryId": 100, 
            "LastName": "Smith", 
            "CarName": [
                "Toyota", 
                "BMW"
            ], 
            "DogName": [
                "Spike", 
                "Rusty"
            ]
        }, 
        {
            "City": "Sydney", 
            "FirstName": "Ben", 
            "PrimaryId": 101, 
            "LastName": "Swan", 
            "CarName": [
                "Volkswagen", 
                "Ford", 
                "Audi"
            ], 
            "DogName": [
                "Buddy", 
                "Max"
            ]
        }, 
        {
            "City": "London", 
            "FirstName": "Julia", 
            "PrimaryId": 102, 
            "LastName": "Brown", 
            "CarName": [
                "Mini"
            ], 
            "DogName": [
                "Lucy"
            ]
        }
    ]
}

This post and this one were both helpful, but I have not yet managed to produce the correct structure.

2 Answers:

Answer 0 (score: 5)

Here is a general approach using csv.DictReader.

First, load the data:

import csv
import itertools

# Python 2: open the file in binary mode for the csv module.
# On Python 3, use open('stuff.csv', newline='') instead.
with open('stuff.csv', 'rb') as csvfile:
    all_ = list(csv.DictReader(csvfile))

Now you can group and process each group with itertools.groupby. For example:

d = []
# groupby only merges adjacent rows, so this relies on the CSV already
# being ordered by PrimaryId (which it is here).
for k, g in itertools.groupby(
        all_,
        key=lambda r: (r['PrimaryId'], r['LastName'])):
    d.append({
        'PrimaryId': k[0],
        'LastName': k[1],
        'CarName': [e['CarName'] for e in g]
        })

This groups by primary ID and last name and builds a list of the cars for each person. (Because each car appears once per dog in the CSV, you will want to de-duplicate that list.)

Once you have the structure you want, you can serialize it with json.dumps().

Answer 1 (score: 1)

With the data converted to a valid CSV and saved as data.csv:

PrimaryId,FirstName,LastName,City,CarName,DogName
100,John,Smith,NewYork,Toyota,Spike
100,John,Smith,NewYork,BMW,Spike
100,John,Smith,NewYork,Toyota,Rusty
100,John,Smith,NewYork,BMW,Rusty
101,Ben,Swan,Sydney,Volkswagen,Buddy
101,Ben,Swan,Sydney,Ford,Buddy
101,Ben,Swan,Sydney,Audi,Buddy
101,Ben,Swan,Sydney,Volkswagen,Max
101,Ben,Swan,Sydney,Ford,Max
101,Ben,Swan,Sydney,Audi,Max
102,Julia,Brown,London,Mini,Lucy

Using pandas to do the heavy lifting, and assuming this valid CSV file, here is one way to do what you want:

import json
import pandas as pd

df = pd.read_csv('data.csv')

def get_nested_rec(key, grp):
    rec = {}
    rec['PrimaryId'] = int(key[0])  # cast from numpy int64 so json.dumps can serialize it
    rec['FirstName'] = key[1]
    rec['LastName'] = key[2]
    rec['City'] = key[3]

    # unique() keeps the order of first appearance, matching the desired output
    for field in ['CarName', 'DogName']:
        rec[field] = list(grp[field].unique())

    return rec

records = []
for key, grp in df.groupby(['PrimaryId','FirstName','LastName','City']):
    rec = get_nested_rec(key, grp)
    records.append(rec)

records = dict(data = records)

print(json.dumps(records, indent=4))

Result:

{
    "data": [
        {
            "City": "NewYork", 
            "FirstName": "John", 
            "PrimaryId": 100, 
            "LastName": "Smith", 
            "CarName": [
                "Toyota", 
                "BMW"
            ], 
            "DogName": [
                "Spike", 
                "Rusty"
            ]
        }, 
        {
            "City": "Sydney", 
            "FirstName": "Ben", 
            "PrimaryId": 101, 
            "LastName": "Swan", 
            "CarName": [
                "Volkswagen", 
                "Ford", 
                "Audi"
            ], 
            "DogName": [
                "Buddy", 
                "Max"
            ]
        }, 
        {
            "City": "London", 
            "FirstName": "Julia", 
            "PrimaryId": 102, 
            "LastName": "Brown", 
            "CarName": [
                "Mini"
            ], 
            "DogName": [
                "Lucy"
            ]
        }
    ]
}