Parsing multi-level JSON in Python

Date: 2019-04-05 12:24:08

Tags: python json python-3.x parsing

JSON response:

{
  "001": {
    "STUDENTTYPE": {
      "TYPE": "Boarder"
    },
    "ACADEMICS": [
      {
        "SCI": 42,
        "MTH": 22
      },
      {
        "SCI": 49,
        "MTH": 36
      },
      {
        "SCI": 42,
        "MTH": 26
      }
    ],
    "ROLL": "001",
    "NAME": "Ben",
    "CLASS": "XI",
    "CLASSTEACHER": "Aka",
    "HOME": "Katrasgarh"
  },
  "002": {
    "STUDENTTYPE": {
      "TYPE": "DayScholar"
    },
    "ACADEMICS": [
      {
        "SCI": 43,
        "MTH": 24
      },
      {
        "SCI": 43,
        "MTH": 36
      },
      {
        "SCI": 47,
        "MTH": 28
      }
    ],
    "ROLL": "002",
    "NAME": "Bee",
    "CLASS": "XI",
    "CLASSTEACHER": "Ama",
    "HOME": "Kats"
  }
  ....
}

I can't get at the inner JSON. This is what I have done so far:

import json
import sys

jsonLocation = sys.argv[1]
with open(jsonLocation, 'rb') as jsonFile:
    jsonData = json.load(jsonFile)

for rollNo in jsonData:
    print(rollNo)
    # Each top-level value is the dict holding one student's details.
    studentItems = jsonData[rollNo]
    print(studentItems['ROLL'])
    print(studentItems['NAME'])
    print(studentItems['CLASS'])
    print(studentItems['CLASSTEACHER'])
    print(studentItems['HOME'])
    print(studentItems['STUDENTTYPE']['TYPE'])

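I do get the value of each key in studentItems, but this feels like a clumsy way to do it. I also tried json.dump, but it fails with an error saying the JSON is not serializable. Is there a better way to traverse this JSON format?

This is the sample output I am looking for: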

001:

001
Ben
XI
Aka
Katrasgarh

Boarder

42,22
49,36
42,26

002:

002
Bee
XI
Ama
Kats
..
.

1 answer:

Answer 0 (score: 0)

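It isn't entirely clear what the output should look like, so I went ahead and flattened the nested JSON and then rebuilt it as a DataFrame. From there you can slice or filter the table, write it to CSV, or do whatever else you need. Each row represents one ROLL with its attributes and the corresponding science and math scores, with the exams indexed from 0. If some students have longer lists under the ACADEMICS key, the rows for students with fewer exam scores will contain nulls in the extra columns.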
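To make the point about nulls concrete, here is a small standalone sketch in plain Python (no pandas). The two toy rows and the helper name `pad_rows` are made up for illustration; they stand in for already-flattened records where one student has fewer exams than the other.

```python
# Hypothetical, already-flattened rows: student "002" has one exam fewer.
rows = {
    "001": {"NAME": "Ben", "ACADEMICS_0_SCI": 42, "ACADEMICS_1_SCI": 49},
    "002": {"NAME": "Bee", "ACADEMICS_0_SCI": 43},
}


def pad_rows(rows):
    """Give every row the same columns, filling the gaps with None."""
    columns = []
    for row in rows.values():
        for key in row:
            if key not in columns:
                columns.append(key)  # keep first-seen column order
    return {
        roll: {col: row.get(col) for col in columns}
        for roll, row in rows.items()
    }


padded = pad_rows(rows)
print(padded["002"])
# {'NAME': 'Bee', 'ACADEMICS_0_SCI': 43, 'ACADEMICS_1_SCI': None}
```

A DataFrame behaves the same way in spirit: cells that were never assigned show up as NaN rather than None.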

Given:

jsonData = {
  "001": {
    "STUDENTTYPE": {
      "TYPE": "Boarder"
    },
    "ACADEMICS": [
      {
        "SCI": 42,
        "MTH": 22
      },
      {
        "SCI": 49,
        "MTH": 36
      },
      {
        "SCI": 42,
        "MTH": 26
      }
    ],
    "ROLL": "001",
    "NAME": "Ben",
    "CLASS": "XI",
    "CLASSTEACHER": "Aka",
    "HOME": "Katrasgarh"
  },
  "002": {
    "STUDENTTYPE": {
      "TYPE": "DayScholar"
    },
    "ACADEMICS": [
      {
        "SCI": 43,
        "MTH": 24
      },
      {
        "SCI": 43,
        "MTH": 36
      },
      {
        "SCI": 47,
        "MTH": 28
      }
    ],
    "ROLL": "002",
    "NAME": "Bee",
    "CLASS": "XI",
    "CLASSTEACHER": "Ama",
    "HOME": "Kats"
  }

}

Code:

import json
import pandas as pd
import re

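def flatten_json(y):
    """Flatten nested dicts/lists into one dict with underscore-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            # Descend into each key, extending the path prefix.
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            # List items are addressed by their position.
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing underscore from the path.
            out[name[:-1]] = x

    flatten(y)
    return out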


flat = flatten_json(jsonData)



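# Rebuild one row per student: the leading roll number of each flattened
# key becomes the row index, and the rest of the key becomes the column.
results = pd.DataFrame()
columns_list = list(flat.keys())
for item in columns_list:
    row_idx = re.findall(r'(\d+)_', item)[0]
    # Strip only the leading roll prefix, so a list index such as "1_"
    # later in the key is left alone.
    column = item.replace(row_idx + '_', '', 1)
    row_idx = int(row_idx)
    value = flat[item]

    results.loc[row_idx, column] = value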

Output:

print (results.to_string())
  STUDENTTYPE_TYPE  ACADEMICS_0_SCI  ACADEMICS_0_MTH  ACADEMICS_1_SCI  ACADEMICS_1_MTH  ACADEMICS_2_SCI  ACADEMICS_2_MTH ROLL NAME CLASS CLASSTEACHER        HOME
1          Boarder             42.0             22.0             49.0             36.0             42.0             26.0  001  Ben    XI          Aka  Katrasgarh
2       DayScholar             43.0             24.0             43.0             36.0             47.0             28.0  002  Bee    XI          Ama        Kats
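
If the table is more than you need and you only want the printed layout from the question, a plain nested loop may be enough. This is a minimal sketch, assuming every student record has the same keys as in the sample. The function name `format_student` is made up here, and only student 001 is included to keep it short.

```python
jsonData = {
    "001": {
        "STUDENTTYPE": {"TYPE": "Boarder"},
        "ACADEMICS": [
            {"SCI": 42, "MTH": 22},
            {"SCI": 49, "MTH": 36},
            {"SCI": 42, "MTH": 26},
        ],
        "ROLL": "001",
        "NAME": "Ben",
        "CLASS": "XI",
        "CLASSTEACHER": "Aka",
        "HOME": "Katrasgarh",
    },
}


def format_student(roll_no, student):
    """Return the output lines for one student in the question's layout."""
    lines = [roll_no + ":", ""]
    # Top-level string fields, in the order the question lists them.
    for key in ("ROLL", "NAME", "CLASS", "CLASSTEACHER", "HOME"):
        lines.append(student[key])
    # Nested dict: one more level of indexing reaches the value.
    lines += ["", student["STUDENTTYPE"]["TYPE"], ""]
    # Nested list of dicts: one "SCI,MTH" line per exam.
    for exam in student["ACADEMICS"]:
        lines.append("{},{}".format(exam["SCI"], exam["MTH"]))
    return lines


for roll_no, student in jsonData.items():
    print("\n".join(format_student(roll_no, student)))
    print()
```

Iterating with `.items()` yields the roll number and the student dict together, which avoids looking up `jsonData[rollNo]` again inside the loop.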