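I'm having trouble neatly flattening a JSON string into a pandas DataFrame. When I use json_normalize, I get the first parent "id" as one column, while the rest of the string ends up inside a second column. That second column is a list in which each element contains multiple nested levels.
I'm not sure how to flatten this string cleanly without writing a solution that loops through each level and binds it to the DataFrame.
Here is the API URL: https://api.collegefootballdata.com/games/players?year=2018&week=1&seasonType=regular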
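import requests
import pandas as pd
# Build the request URL for week 1 of the 2018 regular season
base = 'https://api.collegefootballdata.com/'
end_point = 'games/players?year='
second_end_point = '&week='
third_end_point = '&seasonType=regular'
url = base + end_point + str(2018) + second_end_point + str(1) + third_end_point
# Fetch the response and parse the JSON body into Python objects
json_dict = requests.get(url).json()
# pd.json_normalize requires pandas >= 1.0; older versions expose it as pandas.io.json.json_normalize
normalize_df = pd.json_normalize(json_dict)
# Only the top level is flattened here; the nested list stays in a single column
print(normalize_df)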
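The symptom can be reproduced offline with a tiny hand-written sample. This is a minimal sketch: the sample's shape and values are assumptions made for illustration, not actual API output.

```python
import pandas as pd

# Hypothetical record: a top-level "id" plus a nested "teams" list
sample = [{"id": 1, "teams": [{"school": "Michigan", "categories": []}]}]

df = pd.json_normalize(sample)

# Without a record path, lists are not expanded: "teams" stays one column of lists
print(df.columns.tolist())
print(type(df.loc[0, "teams"]))
```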
Answer 0 (score: 1)
Use this as a starting point and modify it to fit your needs:
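# The second argument is the record path: the chain of nested lists to expand, one row per athlete.
# meta lists the parent-level fields to copy onto every row.
# On pandas < 1.0, call pd.io.json.json_normalize instead.
pd.json_normalize(json_dict, ['teams', 'categories', 'types', 'athletes'], meta=[
    ['teams', 'school'],
    ['teams', 'categories', 'name'],
    ['teams', 'categories', 'types', 'name']
])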
Result:
        id               name  stat  teams.school  teams.categories.name  teams.categories.types.name
0  3115980  Lawrence Marshall     0      Michigan              defensive                           PD
1  4360699         Myles Sims     0      Michigan              defensive                           PD
2  4046537      Josh Metellus     0      Michigan              defensive                           PD
3  4046525     Khaleke Hudson     0      Michigan              defensive                           PD
4  3115968     Brandon Watson     0      Michigan              defensive                           PD
5  4258211     J'Marick Woods     0      Michigan              defensive                           PD
6  4046526          Devin Gil     0      Michigan              defensive                           PD
7  4046536         David Long     0      Michigan              defensive                           PD
8  4046523        Rashan Gary     0      Michigan              defensive                           PD
9  4258198          Josh Ross     0      Michigan              defensive                           PD
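If the top-level game "id" should be kept as well, adding it to meta directly would collide with the athletes' own "id" column, and pandas raises a ValueError asking for a distinguishing prefix. A minimal sketch of one way around that, using record_prefix, is below. The sample is hand-written to mirror the nesting implied by the record path above; the game id, the value types, and the renamed column names are assumptions for illustration, not actual API output.

```python
import pandas as pd

# Hand-written sample mirroring the nesting teams -> categories -> types -> athletes
sample = [{
    "id": 1,  # hypothetical game id
    "teams": [{
        "school": "Michigan",
        "categories": [{
            "name": "defensive",
            "types": [{
                "name": "PD",
                "athletes": [
                    {"id": "3115980", "name": "Lawrence Marshall", "stat": "0"},
                    {"id": "4360699", "name": "Myles Sims", "stat": "0"},
                ],
            }],
        }],
    }],
}]

flat = pd.json_normalize(
    sample,
    record_path=["teams", "categories", "types", "athletes"],
    meta=[
        "id",  # top-level game id
        ["teams", "school"],
        ["teams", "categories", "name"],
        ["teams", "categories", "types", "name"],
    ],
    record_prefix="athlete.",  # avoids the clash between the athlete "id" and the game "id"
)

# Optional: shorten the dotted meta column names (names chosen here are illustrative)
flat = flat.rename(columns={
    "id": "game_id",
    "teams.school": "school",
    "teams.categories.name": "category",
    "teams.categories.types.name": "stat_type",
})
print(flat)
```

Each athlete becomes one row, and the parent-level values are repeated on every row that belongs to them.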