I have a list of dicts like this:
[
{
"status": "BV",
"max_total_duration": null,
"min_total_duration": null,
"75th_percentile": 420,
"median": 240.0,
"25th_percentile": 180,
"avg_total_duration": null
},
{
"status": "CORR",
"max_total_duration": null,
"min_total_duration": null,
"75th_percentile": 1380,
"median": 720.0,
"25th_percentile": 420,
"avg_total_duration": null
},
{
"status": "FILL",
"max_total_duration": null,
"min_total_duration": null,
"75th_percentile": 1500,
"median": 840.0,
"25th_percentile": 480,
"avg_total_duration": null
},
{
"status": "INIT",
"max_total_duration": 11280,
"min_total_duration": 120,
"75th_percentile": 720,
"median": 360.0,
"25th_percentile": 180,
"avg_total_duration": 2061
},
]
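Clearly, max_total_duration, min_total_duration and avg_total_duration are null for every status except "INIT". What I want is to remove all the null entries and, for INIT, where max_total_duration, min_total_duration and avg_total_duration hold real values, add those three as a new dict in the list, like this: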
[
{
"status": "BV",
"75th_percentile": 420,
"median": 240.0,
"25th_percentile": 180,
},
{
"status": "CORR",
"75th_percentile": 1380,
"median": 720.0,
"25th_percentile": 420,
},
{
"status": "FILL",
"75th_percentile": 1500,
"median": 840.0,
"25th_percentile": 480,
},
{
"status": "INIT",
"75th_percentile": 720,
"median": 360.0,
"25th_percentile": 180,
},
{
"max_total_duration": 11280,
"min_total_duration": 120,
"avg_total_duration": 2061,
}
]
I have tried doing this by iterating over the list, and it is computationally very expensive. Is there a simpler way using pandas?
Answer 0 (score: 2)
data =[
{
"status": "BV",
"max_total_duration": None,
"min_total_duration": None,
"75th_percentile": 420,
"median": 240.0,
"25th_percentile": 180,
"avg_total_duration": None
},
{
"status": "CORR",
"max_total_duration": None,
"min_total_duration": None,
"75th_percentile": 1380,
"median": 720.0,
"25th_percentile": 420,
"avg_total_duration": None
},
{
"status": "FILL",
"max_total_duration": None,
"min_total_duration": None,
"75th_percentile": 1500,
"median": 840.0,
"25th_percentile": 480,
"avg_total_duration": None
},
{
"status": "INIT",
"max_total_duration": 11280,
"min_total_duration": 120,
"75th_percentile": 720,
"median": 360.0,
"25th_percentile": 180,
"avg_total_duration": 2061
},
]
# Drop keys whose value is None (testing "is not None" keeps a legitimate 0)
data = [{key: val for key, val in d.items() if val is not None} for d in data]
duration_keys = ('max_total_duration', 'min_total_duration', 'avg_total_duration')
final = []
durations = None
for d in data:
    if d.get('status') == 'INIT':
        # Move the duration fields out of the INIT entry into their own dict
        durations = {key: d.pop(key) for key in duration_keys if key in d}
    final.append(d)
# Append the durations dict last so the order matches the desired output
if durations is not None:
    final.append(durations)
print(final)
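The same single pass can also be wrapped in a function that does not hardcode the INIT status. This is only a sketch under assumptions: `split_durations` and `DURATION_KEYS` are names made up for illustration, and it assumes an entry should contribute a separate durations dict whenever at least one of its duration fields is not None. Unlike the code above, it leaves the input dicts unmodified.

```python
DURATION_KEYS = ("max_total_duration", "min_total_duration", "avg_total_duration")


def split_durations(records, duration_keys=DURATION_KEYS):
    """Return the records without duration fields, followed by one dict
    for each record that had at least one non-None duration value."""
    stripped = []
    durations = []
    for record in records:
        # Everything that is not a duration field stays with the record.
        stripped.append({k: v for k, v in record.items() if k not in duration_keys})
        # Collect only the duration fields that actually hold a value.
        found = {k: record[k] for k in duration_keys if record.get(k) is not None}
        if found:
            durations.append(found)
    return stripped + durations


# Shortened sample (two of the four entries from the question).
sample = [
    {"status": "BV", "max_total_duration": None, "min_total_duration": None,
     "75th_percentile": 420, "median": 240.0, "25th_percentile": 180,
     "avg_total_duration": None},
    {"status": "INIT", "max_total_duration": 11280, "min_total_duration": 120,
     "75th_percentile": 720, "median": 360.0, "25th_percentile": 180,
     "avg_total_duration": 2061},
]
result = split_durations(sample)
print(result)
```

Each record is visited once, so the cost grows linearly with the length of the list.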
Answer 1 (score: 1)
import pandas as pd
# Using None in place of your 'null' (data is the list defined in the answer above)
df = pd.DataFrame(data)
>>> df
25th_percentile 75th_percentile avg_total_duration max_total_duration \
0 180 420 NaN NaN
1 420 1380 NaN NaN
2 480 1500 NaN NaN
3 180 720 2061 11280
median min_total_duration status
0 240 NaN BV
1 720 NaN CORR
2 840 NaN FILL
3 360 120 INIT
Grab the percentiles part:
df_percentiles = df[['status','25th_percentile','median','75th_percentile']]
>>> df_percentiles
status 25th_percentile median 75th_percentile
0 BV 180 240 420
1 CORR 420 720 1380
2 FILL 480 840 1500
3 INIT 180 360 720
Grab the durations part:
df_durations = df[df['status'] == 'INIT'][['max_total_duration','min_total_duration','avg_total_duration']]
>>> df_durations
max_total_duration min_total_duration avg_total_duration
3 11280 120 2061
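Convert both parts to dicts and combine them into one list:
# to_dict(orient='records') returns a real list with one dict per row
summary = df_percentiles.to_dict(orient='records')
# extend (not append) so the durations dict is added as an item, not as a nested list
summary.extend(df_durations.to_dict(orient='records'))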
>>> summary
[{'25th_percentile': 180,
'75th_percentile': 420,
'median': 240.0,
'status': 'BV'},
{'25th_percentile': 420,
'75th_percentile': 1380,
'median': 720.0,
'status': 'CORR'},
{'25th_percentile': 480,
'75th_percentile': 1500,
'median': 840.0,
'status': 'FILL'},
{'25th_percentile': 180,
'75th_percentile': 720,
'median': 360.0,
'status': 'INIT'},
{'avg_total_duration': 2061.0,
'max_total_duration': 11280.0,
'min_total_duration': 120.0}]
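Put together, the pandas steps above might look like the following end-to-end sketch. It is a variation, not the answer's exact code: instead of filtering on `status == 'INIT'`, it assumes any row with at least one non-missing duration should contribute a durations dict, and selects those rows with `dropna(how="all")`. The names `percentile_cols` and `duration_cols` are introduced here for illustration, and the data is shortened to two entries.

```python
import pandas as pd

# Shortened sample (two of the four entries from the question).
data = [
    {"status": "BV", "max_total_duration": None, "min_total_duration": None,
     "75th_percentile": 420, "median": 240.0, "25th_percentile": 180,
     "avg_total_duration": None},
    {"status": "INIT", "max_total_duration": 11280, "min_total_duration": 120,
     "75th_percentile": 720, "median": 360.0, "25th_percentile": 180,
     "avg_total_duration": 2061},
]
df = pd.DataFrame(data)

percentile_cols = ["status", "25th_percentile", "median", "75th_percentile"]
duration_cols = ["max_total_duration", "min_total_duration", "avg_total_duration"]

# One dict per row, restricted to the percentile columns.
summary = df[percentile_cols].to_dict(orient="records")

# Keep only rows where at least one duration is present, then add them as dicts.
summary.extend(df[duration_cols].dropna(how="all").to_dict(orient="records"))

print(summary)
```

Because missing values force the duration columns to a float dtype, the duration values come back as floats (as in the output shown above); cast them with `int()` if integers are needed.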