How can I flatten a JSON into a pd.DataFrame like the following:
class_id | id | schedule_id | schedule_date | lesson_price | status
1        | 3  | 1           | 2017-07-11    | USD 25       | ONGOING
1        | 3  | 2           | 2016-09-24    | USD 15       | OPEN REGISTRATION
1        | 4  | 1           | 2016-12-17    | USD 19       | ONGOING
1        | 4  | 2           | 2015-11-12    | USD 29       | ONGOING
1        | 4  | 3           | 2015-11-10    | USD 14       | ON SCHEDULE
2        | 1  | 1           | 2017-05-21    | USD 50       | CANCELLED
2        | 2  | 1           | 2017-06-04    | USD10        | FINISHED
2        | 2  | 2           | 2018-03-01    | USD12        | CLOSED
from the source JSON.
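I have already tried using this reference, but it gives me only 2 rows, grouped by class_id.
How can I show all the schedule data inside the lesson objects, together with class_id and id, as in the desired DataFrame?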
Answer
The difficulty in your data structure comes from having
{
"lesson3": {
"id": 3,
"schedule": [
{
"schedule_id": "1",
"schedule_date": "2017-07-11",
"lesson_price": "USD 25",
"status": "ONGOING"
},
{
"schedule_id": "2",
"schedule_date": "2016-09-24",
"lesson_price": "USD 15",
"status": "OPEN REGISTRATION"
}
]
}
}
instead of having
{
"name": "lesson3",
"id": 3,
"schedule": [
{
"schedule_id": "1",
"schedule_date": "2017-07-11",
"lesson_price": "USD 25",
"status": "ONGOING"
},
{
"schedule_id": "2",
"schedule_date": "2016-09-24",
"lesson_price": "USD 15",
"status": "OPEN REGISTRATION"
}
]
}
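But most of the time we have no control over the data we receive, so we have to get rid of the lesson1, lesson2, ... keys and move their objects up one level.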
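As a minimal illustration of "moving the objects up", the sketch below turns a {"lessonN": {...}} mapping into a list of lesson objects and keeps each key in a "name" field, which yields the second form shown above. The sample dict copies the lesson3 JSON from the question; the helper name lift_lessons is hypothetical, not part of any library.

```python
# Sample copied from the lesson3 JSON above: one "lessonN" key per lesson
lessons_by_key = {
    "lesson3": {
        "id": 3,
        "schedule": [
            {"schedule_id": "1", "schedule_date": "2017-07-11",
             "lesson_price": "USD 25", "status": "ONGOING"},
            {"schedule_id": "2", "schedule_date": "2016-09-24",
             "lesson_price": "USD 15", "status": "OPEN REGISTRATION"},
        ],
    }
}


def lift_lessons(by_key):
    """Hypothetical helper: turn {"lessonN": {...}} into a list of lesson
    objects, keeping the former key under "name"."""
    return [{"name": key, **lesson} for key, lesson in by_key.items()]


lessons = lift_lessons(lessons_by_key)
print(lessons[0]["name"], lessons[0]["id"], len(lessons[0]["schedule"]))
```

The key is kept as "name" only so no information is lost; if the lesson key is not needed, the list can be built from by_key.values() directly, which is what the extraction step below does.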
import requests
# url: the endpoint that serves the JSON from the question
data = requests.get(url).json()
Extract the individual lessons:
# one entry per lesson; the lessonN keys are dropped and class_id is copied down
data_ = [{'class_id': c['class_id'], 'lessons': v} for c in data['class'] for v in c['data'].values()]
The data now looks like this:
[
{
"class_id": "1",
"lessons": {
"id": 3,
"schedule": [
{
"schedule_id": "1",
"schedule_date": "2017-07-11",
"lesson_price": "USD 25",
"status": "ONGOING"
},
{
"schedule_id": "2",
"schedule_date": "2016-09-24",
"lesson_price": "USD 15",
"status": "OPEN REGISTRATION"
}
]
}
},
...
]
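To try the extraction step without the network call, here is a small sketch on an inline sample. The top-level shape ({"class": [{"class_id": ..., "data": {"lessonN": ...}}]}) is an assumption inferred from the comprehension itself, the values come from the question's table, and the "lesson1" key for class 2 is a hypothetical name.

```python
# Assumed input shape, inferred from the comprehension: a "class" list whose
# items carry a "class_id" and a "data" dict keyed by "lessonN"
data = {
    "class": [
        {"class_id": "1", "data": {
            "lesson3": {"id": 3, "schedule": [
                {"schedule_id": "1", "schedule_date": "2017-07-11",
                 "lesson_price": "USD 25", "status": "ONGOING"},
                {"schedule_id": "2", "schedule_date": "2016-09-24",
                 "lesson_price": "USD 15", "status": "OPEN REGISTRATION"},
            ]},
        }},
        {"class_id": "2", "data": {
            # hypothetical key name; only the lesson id (1) comes from the table
            "lesson1": {"id": 1, "schedule": [
                {"schedule_id": "1", "schedule_date": "2017-05-21",
                 "lesson_price": "USD 50", "status": "CANCELLED"},
            ]},
        }},
    ]
}

# One entry per lesson: the lessonN key is dropped, class_id is copied down
data_ = [{"class_id": c["class_id"], "lessons": v}
         for c in data["class"]
         for v in c["data"].values()]

for item in data_:
    print(item["class_id"], item["lessons"]["id"], len(item["lessons"]["schedule"]))
```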
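Now we can use json_normalize:
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
df = json_normalize(data_, record_path=['lessons', 'schedule'], meta=['class_id', ['lessons', 'id']])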
  schedule_id schedule_date lesson_price            status class_id lessons.id
0           1    2017-07-11       USD 25           ONGOING        1          3
1           2    2016-09-24       USD 15 OPEN REGISTRATION        1          3
2           1    2016-12-17       USD 19           ONGOING        1          4
3           2    2015-11-12       USD 29           ONGOING        1          4
4           3    2015-11-10       USD 14       ON SCHEDULE        1          4
5           1    2017-05-21       USD 50         CANCELLED        2          1
6           1    2017-06-04        USD10          FINISHED        2          2
7           5    2018-03-01        USD12            CLOSED        2          2
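json_normalize puts the metadata columns last and names the lesson id lessons.id, while the question asks for class_id | id | schedule_id | ... . One option (an addition, not part of the original answer) is to rename and reorder afterwards. The sketch below runs end to end on an inline sample instead of requests.get(url); it assumes pandas >= 1.0, where json_normalize is available as pd.json_normalize, and the same assumed input shape as above, with a hypothetical "lesson2" key.

```python
import pandas as pd

# Inline sample in the assumed shape {"class": [{"class_id", "data": {"lessonN": ...}}]}
data = {
    "class": [
        {"class_id": "1", "data": {
            "lesson3": {"id": 3, "schedule": [
                {"schedule_id": "1", "schedule_date": "2017-07-11",
                 "lesson_price": "USD 25", "status": "ONGOING"},
                {"schedule_id": "2", "schedule_date": "2016-09-24",
                 "lesson_price": "USD 15", "status": "OPEN REGISTRATION"},
            ]},
        }},
        {"class_id": "2", "data": {
            "lesson2": {"id": 2, "schedule": [  # hypothetical key name
                {"schedule_id": "1", "schedule_date": "2017-06-04",
                 "lesson_price": "USD10", "status": "FINISHED"},
            ]},
        }},
    ]
}

# Drop the lessonN keys, one entry per lesson
data_ = [{"class_id": c["class_id"], "lessons": v}
         for c in data["class"]
         for v in c["data"].values()]

# One row per schedule entry; class_id and lessons.id are repeated as metadata
df = pd.json_normalize(data_, record_path=["lessons", "schedule"],
                       meta=["class_id", ["lessons", "id"]])

# Match the column names and order of the desired DataFrame in the question
df = df.rename(columns={"lessons.id": "id"})
df = df[["class_id", "id", "schedule_id", "schedule_date", "lesson_price", "status"]]
print(df.to_string(index=False))
```

Renaming after the fact keeps the json_normalize call identical to the one above; passing sep="_" to json_normalize would instead change the generated name to lessons_id, which still needs a rename to become id.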