Question

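I'm having trouble getting a JSON string neatly into a pandas DataFrame. When I use json_normalize, the first parent-level "id" comes out as one column, and the rest of the string ends up inside a second column. Each element of that second column is a list containing several nested levels.

I'm not sure how to flatten this string cleanly without writing a solution that loops over each level and binds it to the DataFrame.

Here is the API URL: https://api.collegefootballdata.com/games/players?year=2018&week=1&seasonType=regular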

import requests
import pandas as pd
import json

base = 'https://api.collegefootballdata.com/'
end_point = 'games/players?year='
second_end_point = '&week='
third_end_point = '&seasonType=regular'

# Fetch week 1 of the 2018 regular season as raw text
request = requests.get(base + end_point + str(2018) + second_end_point + str(1) + third_end_point).text
# Parse the JSON text into Python lists and dicts
json_dict = json.loads(request)
# pd.json_normalize is the top-level name since pandas 1.0;
# older versions import json_normalize from pandas.io.json instead
normalize_df = pd.json_normalize(json_dict)
print(normalize_df)
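
The symptom can be reproduced without calling the API. The snippet below is a minimal sketch that uses a hand-written, heavily trimmed stand-in for one game record (the `games_stub` name and its values are illustrative assumptions, not real API output). Without a `record_path`, `json_normalize` only flattens nested dicts, so the `teams` list is left as-is inside a single cell.

```python
import pandas as pd

# Hypothetical, trimmed stand-in for one element of the API response
games_stub = [
    {
        "id": 1,
        "teams": [
            {"school": "Michigan", "categories": []},
        ],
    }
]

# No record_path: lists are not expanded, only nested dicts are flattened
stub_df = pd.json_normalize(games_stub)

print(stub_df.columns.tolist())
print(type(stub_df.loc[0, "teams"]))
```

Each `teams` cell is still a Python list, which is why the nested levels have to be named explicitly to get one row per athlete.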

Answer 1

Use this as a starting point and modify it to suit your needs:

pd.json_normalize(json_dict, ['teams', 'categories', 'types', 'athletes'], meta=[
    ['teams', 'school'],
    ['teams', 'categories', 'name'],
    ['teams', 'categories', 'types', 'name']
])

Result:

        id               name stat teams.school teams.categories.name teams.categories.types.name
0  3115980  Lawrence Marshall    0     Michigan             defensive                          PD
1  4360699         Myles Sims    0     Michigan             defensive                          PD
2  4046537      Josh Metellus    0     Michigan             defensive                          PD
3  4046525     Khaleke Hudson    0     Michigan             defensive                          PD
4  3115968     Brandon Watson    0     Michigan             defensive                          PD
5  4258211     J'Marick Woods    0     Michigan             defensive                          PD
6  4046526          Devin Gil    0     Michigan             defensive                          PD
7  4046536         David Long    0     Michigan             defensive                          PD
8  4046523        Rashan Gary    0     Michigan             defensive                          PD
9  4258198          Josh Ross    0     Michigan             defensive                          PD
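
One way to adapt the call is to also carry the top-level game "id" into each row. The following is a sketch on a small hand-written sample rather than the live API: the `sample` data, the game id `1`, and the "Example State" entries are made up for illustration, and only the two Michigan athletes come from the output above. Because each athlete record has its own `id` key, a `record_prefix` is assumed here so the two `id` columns do not collide.

```python
import pandas as pd

# Hypothetical sample that mirrors the nesting: game -> teams -> categories -> types -> athletes
sample = [
    {
        "id": 1,
        "teams": [
            {
                "school": "Michigan",
                "categories": [
                    {
                        "name": "defensive",
                        "types": [
                            {
                                "name": "PD",
                                "athletes": [
                                    {"id": "3115980", "name": "Lawrence Marshall", "stat": "0"},
                                    {"id": "4360699", "name": "Myles Sims", "stat": "0"},
                                ],
                            }
                        ],
                    }
                ],
            },
            {
                "school": "Example State",
                "categories": [
                    {
                        "name": "rushing",
                        "types": [
                            {
                                "name": "YDS",
                                "athletes": [
                                    {"id": "0", "name": "Example Player", "stat": "10"},
                                ],
                            }
                        ],
                    }
                ],
            },
        ],
    }
]

flat = pd.json_normalize(
    sample,
    # Path down to the innermost list; each athlete becomes one row
    record_path=["teams", "categories", "types", "athletes"],
    # Values to copy down from the enclosing levels, outermost first
    meta=[
        "id",
        ["teams", "school"],
        ["teams", "categories", "name"],
        ["teams", "categories", "types", "name"],
    ],
    # Prefix the athlete fields so the athlete "id" does not clash with the game "id"
    record_prefix="athlete.",
)

print(flat)
```

Without the prefix, adding the game-level "id" to `meta` would raise a `ValueError` about a conflicting metadata name, since the athlete records already supply an `id` column.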
