I have a nested JSON and want to convert it to a pandas DataFrame using the json_normalize function.
JSON
json_input = [{'measurements': [{'value': 111, 'timestamp': 1},
                                {'value': 222, 'timestamp': 2}],
               'sensor': {'name': 'testsensor',
                          'id': 1}},
              {'measurements': [{'value': 333, 'timestamp': 1},
                                {'value': 444, 'timestamp': 2}],
               'sensor': None},
              ]
Normalization
import pandas as pd

df = pd.json_normalize(json_input, record_path=['measurements'],
                       meta=['sensor'])
In the output of the code above, the metadata is not normalized:
| | value | timestamp | sensor |
|---|-------|-----------|---------------------------------|
| 0 | 111 | 1 | {'name': 'testsensor', 'id': 1} |
| 1 | 222 | 2 | {'name': 'testsensor', 'id': 1} |
| 2 | 333 | 1 | None |
| 3 | 444 | 2 | None |
Is it possible to get the desired output instead:
| | value | timestamp | sensor.name | sensor.id |
|---|-------|-----------|--------------|-----------|
| 0 | 111 | 1 | 'testsensor' | 1 |
| 1 | 222 | 2 | 'testsensor' | 1 |
| 2 | 333 | 1 | None | None |
| 3 | 444 | 2 | None | None |
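Answer 0 (score: 1)
Create a DataFrame with the constructor, replacing empty lists (or None, depending on the pandas version) with empty dictionaries, then join it to the original with concat: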
df = pd.json_normalize(json_input, record_path=['measurements'],
                       meta=['sensor'])

# Run only one of the two df1 lines below: df.pop removes the column,
# so a second call would raise a KeyError.
# pandas 1.0.1
# df1 = pd.DataFrame([{} if x == [] else x for x in df.pop('sensor')]).add_prefix("sensor.")
# pandas 1.0.3
df1 = pd.DataFrame([{} if x is None else x for x in df.pop('sensor')]).add_prefix("sensor.")

df = pd.concat([df, df1], axis=1)
print(df)
value timestamp sensor.name sensor.id
0 111 1 testsensor 1.0
1 222 2 testsensor 1.0
2 333 1 NaN NaN
3 444 2 NaN NaN
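If you would rather not depend on whether a missing sensor comes back as [] or None, one possible variant is to treat anything that is not a dict as empty. The sketch below is only an illustration: expand_dict_column is a hypothetical helper name, not part of pandas, and the final cast to the nullable Int64 dtype is an optional extra for keeping sensor.id as an integer instead of 1.0.

```python
import pandas as pd

json_input = [
    {'measurements': [{'value': 111, 'timestamp': 1},
                      {'value': 222, 'timestamp': 2}],
     'sensor': {'name': 'testsensor', 'id': 1}},
    {'measurements': [{'value': 333, 'timestamp': 1},
                      {'value': 444, 'timestamp': 2}],
     'sensor': None},
]


def expand_dict_column(df, column, sep="."):
    """Hypothetical helper: expand a column of dicts into prefixed columns."""
    # Anything that is not a dict (None, [], NaN) is treated as an empty dict.
    records = [x if isinstance(x, dict) else {} for x in df[column]]
    expanded = pd.DataFrame(records, index=df.index).add_prefix(column + sep)
    # Drop the original column and attach the expanded ones side by side.
    return pd.concat([df.drop(columns=column), expanded], axis=1)


df = pd.json_normalize(json_input, record_path=['measurements'], meta=['sensor'])
df = expand_dict_column(df, 'sensor')

# Optional: nullable integer dtype keeps ids as integers alongside missing values.
df['sensor.id'] = df['sensor.id'].astype('Int64')
print(df)
```

The isinstance check is a design choice for robustness across pandas versions; if your column can only ever contain dicts or None, the one-line comprehension above does the same job.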
Answer 1 (score: 0)
This can be done with df['sensor'].apply(pd.Series).add_prefix("sensor."):
df = pd.json_normalize(json_input, record_path=['measurements'],
meta=['sensor'])
df = pd.concat([df, df['sensor'].apply(pd.Series).add_prefix("sensor.")], axis=1)
df.drop('sensor', inplace=True, axis=1)
df
value timestamp sensor.name sensor.id
0 111 1 testsensor 1.0
1 222 2 testsensor 1.0
2 333 1 NaN NaN
3 444 2 NaN NaN
As jezrael mentioned, .apply(pd.Series) is slow, so you can use this instead:
pd.DataFrame([i if i is not None else {} for i in df['sensor'].tolist()])
df = pd.json_normalize(json_input, record_path=['measurements'],
                       meta=['sensor'])
# Build the sensor columns from a list of dicts, using {} where the sensor is missing.
df = pd.concat([df, pd.DataFrame([i if i is not None else {} for i in df['sensor'].tolist()]
                                 ).add_prefix("sensor.")], axis=1)
df.drop('sensor', inplace=True, axis=1)
df
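A different route, not taken from either answer, is to clean the input before normalizing and let json_normalize pull the nested fields itself. This is a minimal sketch under two assumptions: that replacing a missing sensor with an empty dict is acceptable for your data, and that you know the nested keys (name and id here) in advance. The cleaned variable is just an illustrative name.

```python
import pandas as pd

json_input = [
    {'measurements': [{'value': 111, 'timestamp': 1},
                      {'value': 222, 'timestamp': 2}],
     'sensor': {'name': 'testsensor', 'id': 1}},
    {'measurements': [{'value': 333, 'timestamp': 1},
                      {'value': 444, 'timestamp': 2}],
     'sensor': None},
]

# Replace a missing sensor with an empty dict, so that looking up a nested
# key raises KeyError, which errors='ignore' turns into NaN.
cleaned = [{**rec, 'sensor': rec.get('sensor') or {}} for rec in json_input]

df = pd.json_normalize(
    cleaned,
    record_path=['measurements'],
    # Each inner list is a path into the record; columns are joined with '.'.
    meta=[['sensor', 'name'], ['sensor', 'id']],
    errors='ignore',
)
print(df)
```

The trade-off is that the meta paths must be listed explicitly, whereas the DataFrame-constructor approaches above discover the keys from the data.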