Generating a CSV from nested JSON in Python

Time: 2018-04-21 07:33:22

Tags: python json python-3.x pandas dataframe

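I have the following nested JSON file that I need to convert into a pandas DataFrame. The main problem is that the whole JSON contains only one unique identifier, and it is very deeply nested.

I tried to solve this with the code further below, but it produces repeated output.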

[{
"questions": [{
        "key": "years-age",
        "responseKey": null,
        "responseText": "27",
        "responseKeys": null
    },
    {
        "key": "gender",
        "responseKey": "male",
        "responseText": null,
        "responseKeys": null
    }

],
"transactions": [{
        "accId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
        "tId": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
        "catId": "21001000",
        "tType": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
        "name": "Online Transfer FROM CHECKING 1200454623",
        "category": [
            "Transfer",
            "Acc Transfer"
        ]
    }

],
"institutions": [{
    "InstName": "Citizens company",
    "InstId": "inst_1",
    "accounts": [{
        "pAccId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
        "pAccType": "depo",
        "pAccSubtype": "check",
        "_id": "5ad38837e806efaa90da4849"
    }]

}]
}]

I need to convert it into a pandas DataFrame like this:

 id                        pAccId                                  tId      

 5ad38837e806efaa90da4849  v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ   80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53   

The main problem I am facing is the "id": it is very deeply nested, and it is the only unique key in the JSON.

Here is my code:

import pandas as pd
import json

with open('sub.json') as f:
    data = json.load(f)

csv = ''
for k in data:
    for t in k.get("institutions"):
        csv += k['institutions'][0]['accounts'][0]['_id']
        csv += "\t"
        csv += k['institutions'][0]['accounts'][0]['pAccId']
        csv += "\t"
        csv += k['transactions'][0]['tId']
        csv += "\t"
        csv += "\n"

text_file = open("new_sub.csv", "w")
text_file.write(csv)
text_file.close()

I hope the code above makes sense; I am new to Python.

1 answer:

Answer 0 (score: 1)

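Read the JSON file and build a dictionary that maps each account's pAccId to its account record. Then collect the transactions from every record.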

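import json

# Load the list of records from the JSON file.
with open('sub.json', 'r') as file:
    records = json.load(file)

# Map each account's pAccId to its account record.
accounts = {
    account['pAccId']: account
    for record in records
    for institution in record['institutions']
    for account in institution['accounts']
}

# Lazily yield every transaction from every record (single pass only).
transactions = (
    transaction
    for record in records
    for transaction in record['transactions']
)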
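
To make the lookup concrete, here is a minimal sketch that inlines a trimmed copy of the sample record from the question, so it runs without sub.json. The name `rows` is only illustrative and is not part of the original answer.

```python
# Trimmed copy of the sample record from the question, inlined so the
# sketch runs without reading sub.json.
records = [{
    "transactions": [{
        "accId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
        "tId": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
    }],
    "institutions": [{
        "InstId": "inst_1",
        "accounts": [{
            "pAccId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
            "_id": "5ad38837e806efaa90da4849",
        }],
    }],
}]

# Same dictionary comprehension as in the answer: pAccId -> account record.
accounts = {
    account["pAccId"]: account
    for record in records
    for institution in record["institutions"]
    for account in institution["accounts"]
}

# Each transaction's accId is the join key into the accounts lookup.
# `rows` is an illustrative name: one (id, pAccId, tId) tuple per transaction.
rows = [
    (accounts[t["accId"]]["_id"], t["accId"], t["tId"])
    for record in records
    for t in record["transactions"]
]
print(rows)
```

Because the lookup is keyed by pAccId, each transaction yields exactly one row, which should avoid the repeated output described in the question.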

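Open the CSV file. For each transaction, look up its account in the accounts dictionary, using the transaction's accId, which matches the account's pAccId.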

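with open('new_sub.csv', 'w') as file:
    # Header row; no spaces after the commas, so the column names stay clean.
    file.write('id,pAccId,tId\n')

    for transaction in transactions:
        # A transaction's accId is the pAccId of the account it belongs to.
        pAccId = transaction['accId']
        account = accounts[pAccId]
        _id = account['_id']
        tId = transaction['tId']

        file.write(f"{_id},{pAccId},{tId}\n")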
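
Writing the rows by hand works for this data, but it would break if a value ever contained a comma or a quote. As a hedged alternative, the standard-library csv module handles quoting automatically. The sketch below uses a hypothetical helper, `write_rows`, and made-up row values chosen only to show the quoting; neither comes from the original answer.

```python
import csv
import io


def write_rows(rows, out):
    """Write (id, pAccId, tId) rows to an open text file object as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "pAccId", "tId"])
    writer.writerows(rows)


# Demo with an in-memory buffer and a made-up row whose second field
# contains a comma. For a real file, pass
# open('new_sub.csv', 'w', newline='') instead of the buffer.
buffer = io.StringIO()
write_rows([("a1", "acc,1", "t1")], buffer)
print(buffer.getvalue())
```

The field containing a comma is wrapped in double quotes in the output, so it still reads back as a single column.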

Finally, read the CSV file into a pandas.DataFrame:

import pandas as pd

df = pd.read_csv('new_sub.csv')
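
If the CSV file is not needed for its own sake, the same table can probably be built in memory. The following is a sketch of a different technique from the answer above, assuming pandas 1.0 or later, where `pd.json_normalize` is available at the top level. It flattens the accounts and the transactions separately and then joins them on the account id. The inlined `records` is a trimmed copy of the sample from the question.

```python
import pandas as pd

# Trimmed copy of the sample record from the question.
records = [{
    "transactions": [{
        "accId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
        "tId": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
    }],
    "institutions": [{
        "InstId": "inst_1",
        "accounts": [{
            "pAccId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
            "_id": "5ad38837e806efaa90da4849",
        }],
    }],
}]

# One row per account: walk institutions -> accounts in every record.
accounts_df = pd.json_normalize(records, record_path=["institutions", "accounts"])

# One row per transaction.
transactions_df = pd.json_normalize(records, record_path="transactions")

# Join each transaction to its account (accId matches pAccId), then keep
# and rename the three columns the question asks for.
df = (
    accounts_df
    .merge(transactions_df, left_on="pAccId", right_on="accId")
    [["_id", "pAccId", "tId"]]
    .rename(columns={"_id": "id"})
)
print(df)
```

The merge defaults to an inner join, so transactions whose accId has no matching account are dropped silently, whereas the dictionary lookup in the answer would raise a KeyError instead.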