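I have the following nested JSON file that I need to convert into a pandas DataFrame. The main problem is that there is only one unique item in the whole JSON, and it is very deeply nested.
I tried to solve this with the code below, but it produces repeated output.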
[{
    "questions": [{
            "key": "years-age",
            "responseKey": null,
            "responseText": "27",
            "responseKeys": null
        },
        {
            "key": "gender",
            "responseKey": "male",
            "responseText": null,
            "responseKeys": null
        }
    ],
    "transactions": [{
        "accId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
        "tId": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
        "catId": "21001000",
        "tType": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
        "name": "Online Transfer FROM CHECKING 1200454623",
        "category": [
            "Transfer",
            "Acc Transfer"
        ]
    }],
    "institutions": [{
        "InstName": "Citizens company",
        "InstId": "inst_1",
        "accounts": [{
            "pAccId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
            "pAccType": "depo",
            "pAccSubtype": "check",
            "_id": "5ad38837e806efaa90da4849"
        }]
    }]
}]
I need to convert it into a pandas DataFrame that looks like this:
id pAccId tId
5ad38837e806efaa90da4849 v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ 80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53
The main problem I am facing is the "id", because it is very deeply nested and it is the only unique key in the JSON.
Here is my code:
import pandas as pd
import json

with open('sub.json') as f:
    data = json.load(f)

csv = ''
for k in data:
    for t in k.get("institutions"):
        csv += k['institutions'][0]['accounts'][0]['_id']
        csv += "\t"
        csv += k['institutions'][0]['accounts'][0]['pAccId']
        csv += "\t"
        csv += k['transactions'][0]['tId']  # index was empty in the post; [0] is assumed
        csv += "\t"
        csv += "\n"

text_file = open("new_sub.csv", "w")
text_file.write(csv)
text_file.close()
I hope the code above makes sense; I am new to Python.
Answer 0 (score: 1)
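Read the JSON file and build a dictionary of accounts keyed by pAccId, so that each pAccId maps to its account.
Then build a generator over all transactions.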
import json

with open('sub.json', 'r') as file:
    records = json.load(file)

# Map every account's pAccId to the account dict, across all institutions.
accounts = {
    account['pAccId']: account
    for record in records
    for institution in record['institutions']
    for account in institution['accounts']
}

# Lazily yield every transaction from every record.
transactions = (
    transaction
    for record in records
    for transaction in record['transactions']
)
Open the CSV file. For each transaction, look up its account in the accounts dictionary.
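with open('new_sub.csv', 'w') as file:
    # No spaces after the commas, so the column names stay clean when parsed.
    file.write('id,pAccId,tId\n')
    for transaction in transactions:
        # A transaction's accId matches the pAccId of the account it belongs to.
        pAccId = transaction['accId']
        account = accounts[pAccId]
        _id = account['_id']
        tId = transaction['tId']
        file.write(f"{_id},{pAccId},{tId}\n")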
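As a variation on the step above, the rows could also be written with the standard-library csv module, which takes care of delimiters and quoting. This is only a sketch: the sample accounts and transactions are made-up placeholders, write_rows is a hypothetical helper name, and skipping transactions with no matching account is an assumption rather than something the question asks for.

```python
import csv
import io

# Hypothetical sample data, shaped like the accounts dict and transactions above.
accounts = {"acc_1": {"pAccId": "acc_1", "_id": "id_1"}}
transactions = [
    {"accId": "acc_1", "tId": "t_1"},
    {"accId": "missing", "tId": "t_2"},  # no matching account
]


def write_rows(out, accounts, transactions):
    """Write one id/pAccId/tId row per transaction that has a known account."""
    writer = csv.writer(out)
    writer.writerow(["id", "pAccId", "tId"])
    for transaction in transactions:
        account = accounts.get(transaction["accId"])
        if account is None:
            continue  # assumption: silently skip unmatched transactions
        writer.writerow([account["_id"], transaction["accId"], transaction["tId"]])


# An in-memory buffer stands in for the real file here.
buffer = io.StringIO()
write_rows(buffer, accounts, transactions)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open('new_sub.csv', 'w', newline='') in place of the buffer.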
Finally, read the CSV file into a pandas.DataFrame.
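import pandas as pd

# skipinitialspace=True ignores any blank that follows a comma in the file.
df = pd.read_csv('new_sub.csv', skipinitialspace=True)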
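If the intermediate CSV file is not needed, the same table can probably be built directly in pandas. The sketch below is not part of the original answer: it assumes pandas 1.0 or later (where pd.json_normalize is available) and uses a trimmed copy of the question's record in place of the loaded file.

```python
import pandas as pd

# Trimmed copy of the record from the question; normally this comes from json.load.
records = [{
    "transactions": [{
        "accId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
        "tId": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
    }],
    "institutions": [{
        "InstName": "Citizens company",
        "InstId": "inst_1",
        "accounts": [{
            "pAccId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
            "_id": "5ad38837e806efaa90da4849",
        }],
    }],
}]

# One row per account, flattened out of institutions -> accounts.
accounts_df = pd.json_normalize(records, record_path=["institutions", "accounts"])

# One row per transaction.
transactions_df = pd.json_normalize(records, record_path="transactions")

# Join each transaction to its account (accId == pAccId), then keep the three
# requested columns, renaming _id to id.
df = (
    transactions_df
    .merge(accounts_df, left_on="accId", right_on="pAccId", how="inner")
    .rename(columns={"_id": "id"})
    [["id", "pAccId", "tId"]]
)
print(df)
```

An inner merge drops transactions whose accId has no matching account; how="left" would keep them with missing id values instead.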