JSON response:
{
  "001": {
    "STUDENTTYPE": {
      "TYPE": "Boarder"
    },
    "ACADEMICS": [
      {
        "SCI": 42,
        "MTH": 22
      },
      {
        "SCI": 49,
        "MTH": 36
      },
      {
        "SCI": 42,
        "MTH": 26
      }
    ],
    "ROLL": "001",
    "NAME": "Ben",
    "CLASS": "XI",
    "CLASSTEACHER": "Aka",
    "HOME": "Katrasgarh"
  },
  "002": {
    "STUDENTTYPE": {
      "TYPE": "DayScholar"
    },
    "ACADEMICS": [
      {
        "SCI": 43,
        "MTH": 24
      },
      {
        "SCI": 43,
        "MTH": 36
      },
      {
        "SCI": 47,
        "MTH": 28
      }
    ],
    "ROLL": "002",
    "NAME": "Bee",
    "CLASS": "XI",
    "CLASSTEACHER": "Ama",
    "HOME": "Kats"
  }
  ....
}
I can't get at the inner JSON. This is what I have done so far:
import json
import sys

jsonLocation = sys.argv[1]
with open(jsonLocation, 'rb') as jsonFile:
    jsonData = json.load(jsonFile)

for rollNo in jsonData:
    print(rollNo)
    # jsonData[rollNo] is the per-student dict, so index it by key directly
    studentItems = jsonData[rollNo]
    print(studentItems['ROLL'])
    print(studentItems['NAME'])
    print(studentItems['CLASS'])
    print(studentItems['CLASSTEACHER'])
    print(studentItems['HOME'])
    print(studentItems['STUDENTTYPE']['TYPE'])
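I do get the value of each key in studentItems, but this seems like a clumsy way to do it. I also tried json.dump, but it fails with a "not JSON serializable" error.
Is there a better way to traverse JSON in this format?
This is the sample output I am looking for: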
001:
001
Ben
XI
Aka
Katrasgarh
Boarder
42,22
49,36
42,26
002:
002
Bee
XI
Ama
Kats
..
.
Answer 0 (score: 0)
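It is not clear what you want the output to look like, but I went ahead and flattened the nested JSON and then rebuilt it as a DataFrame. From there you can access the data by slicing or filtering the table, write it to CSV, or do whatever else you need. Basically, each row represents one ROLL with its attributes and the corresponding science and math scores, whose index numbers start at 0. If some students have longer lists under the ACADEMICS key, the rows for students with fewer test scores will have nulls in the extra columns.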
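To make the point about nulls concrete, here is a minimal standard-library sketch. It is not the pandas code from this answer: the helper name `flatten_record` and the two shortened records are made up for illustration, and student "002" deliberately has only one ACADEMICS entry so that its second-exam columns come out as None.

```python
def flatten_record(value, prefix=""):
    """Flatten nested dicts/lists into one dict with '_'-joined keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten_record(child, prefix + key + "_"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten_record(child, prefix + str(index) + "_"))
    else:
        flat[prefix[:-1]] = value  # leaf: drop the trailing '_'
    return flat


# Hypothetical data: "002" has sat fewer exams than "001".
students = {
    "001": {"NAME": "Ben", "ACADEMICS": [{"SCI": 42, "MTH": 22}, {"SCI": 49, "MTH": 36}]},
    "002": {"NAME": "Bee", "ACADEMICS": [{"SCI": 43, "MTH": 24}]},
}

# One flat dict per student, keyed by roll number.
rows = {roll: flatten_record(record) for roll, record in students.items()}

# Union of all column names, in first-seen order.
columns = []
for row in rows.values():
    for name in row:
        if name not in columns:
            columns.append(name)

# Fill the gaps with None so every row has the same columns.
table = {roll: [row.get(name) for name in columns] for roll, row in rows.items()}

print(columns)
for roll, values in table.items():
    print(roll, values)
```

The list index becomes part of the column name (ACADEMICS_0_SCI, ACADEMICS_1_SCI, and so on), which is why a student with fewer entries simply has no value for the higher-numbered columns.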
Given:
jsonData = {
    "001": {
        "STUDENTTYPE": {
            "TYPE": "Boarder"
        },
        "ACADEMICS": [
            {
                "SCI": 42,
                "MTH": 22
            },
            {
                "SCI": 49,
                "MTH": 36
            },
            {
                "SCI": 42,
                "MTH": 26
            }
        ],
        "ROLL": "001",
        "NAME": "Ben",
        "CLASS": "XI",
        "CLASSTEACHER": "Aka",
        "HOME": "Katrasgarh"
    },
    "002": {
        "STUDENTTYPE": {
            "TYPE": "DayScholar"
        },
        "ACADEMICS": [
            {
                "SCI": 43,
                "MTH": 24
            },
            {
                "SCI": 43,
                "MTH": 36
            },
            {
                "SCI": 47,
                "MTH": 28
            }
        ],
        "ROLL": "002",
        "NAME": "Bee",
        "CLASS": "XI",
        "CLASSTEACHER": "Ama",
        "HOME": "Kats"
    }
}
Code:
import re

import pandas as pd


def flatten_json(y):
    """Flatten nested dicts/lists into a single dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated key.
            out[name[:-1]] = x

    flatten(y)
    return out


flat = flatten_json(jsonData)

results = pd.DataFrame()
for item, value in flat.items():
    # Keys look like '001_ACADEMICS_0_SCI'; the leading digits are the roll number.
    row_idx = re.findall(r'(\d+)_', item)[0]
    # Strip only the leading roll-number prefix to get the column name.
    column = item.replace(row_idx + '_', '', 1)
    results.loc[int(row_idx), column] = value
Output:
print(results.to_string())
   STUDENTTYPE_TYPE  ACADEMICS_0_SCI  ACADEMICS_0_MTH  ACADEMICS_1_SCI  ACADEMICS_1_MTH  ACADEMICS_2_SCI  ACADEMICS_2_MTH  ROLL  NAME  CLASS  CLASSTEACHER        HOME
1           Boarder             42.0             22.0             49.0             36.0             42.0             26.0   001   Ben     XI           Aka  Katrasgarh
2        DayScholar             43.0             24.0             43.0             36.0             47.0             28.0   002   Bee     XI           Ama        Kats
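If the goal is the exact line-by-line layout shown in the question rather than a table, a direct traversal without pandas may be enough. The sketch below is only one possible way to do it: the function name `format_student` is hypothetical, and it assumes every student has the same keys as the sample records.

```python
def format_student(roll_no, student):
    """Return the output lines for one student, in the layout from the question."""
    lines = [roll_no + ":"]
    # Top-level scalar fields, in the order used by the sample output.
    for key in ("ROLL", "NAME", "CLASS", "CLASSTEACHER", "HOME"):
        lines.append(str(student[key]))
    # Nested dict: one level down.
    lines.append(student["STUDENTTYPE"]["TYPE"])
    # List of dicts: one "SCI,MTH" line per entry.
    for exam in student["ACADEMICS"]:
        lines.append("{},{}".format(exam["SCI"], exam["MTH"]))
    return lines


# Sample record copied from the question; in practice this would come from json.load().
json_data = {
    "001": {
        "STUDENTTYPE": {"TYPE": "Boarder"},
        "ACADEMICS": [{"SCI": 42, "MTH": 22}, {"SCI": 49, "MTH": 36}, {"SCI": 42, "MTH": 26}],
        "ROLL": "001", "NAME": "Ben", "CLASS": "XI", "CLASSTEACHER": "Aka", "HOME": "Katrasgarh",
    },
}

for roll_no, student in json_data.items():
    print("\n".join(format_student(roll_no, student)))
```

Iterating with `.items()` gives the roll number and the student dict together, so the nested dict and the list can each be handled with ordinary indexing and a plain loop.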