Input data:
results = [
    {
        "timestamp_datetime": "2014-03-31 18:10:00 UTC",
        "job_id": 5,
        "processor_utilization_percentage": 72
    },
    {
        "timestamp_datetime": "2014-03-31 18:20:00 UTC",
        "job_id": 2,
        "processor_utilization_percentage": 60
    },
    {
        "timestamp_datetime": "2014-03-30 18:20:00 UTC",
        "job_id": 2,
        "processor_utilization_percentage": 0
    }
]
The output must be grouped by job_id, with the job ids in ascending order, like this:
newresult = {
    '2': [
        {"timestamp_datetime": "2014-03-31 18:20:00 UTC",
         "processor_utilization_percentage": 60},
        {"timestamp_datetime": "2014-03-30 18:20:00 UTC",
         "processor_utilization_percentage": 0},
    ],
    '5': [
        {"timestamp_datetime": "2014-03-31 18:10:00 UTC",
         "processor_utilization_percentage": 72},
    ],
}
What is the Pythonic way to do this?
Answer 0 (score: 5)
You are grouping; use a collections.defaultdict() object:
from collections import defaultdict

newresult = defaultdict(list)
for entry in results:
    # pop() removes job_id from the entry itself, so the input dicts are modified
    job_id = entry.pop('job_id')
    newresult[job_id].append(entry)
newresult is a dictionary and is not ordered; if you need to access the job ids in ascending order, sort the keys when listing them:
for job_id in sorted(newresult):
    # loops over the job ids in ascending order.
    for job in newresult[job_id]:
        # entries per job id
        print(job_id, job)
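The approach above can be sketched end to end on the question's sample data. This is only an illustration under a few assumptions: `group_by_job_id` is a hypothetical helper name, it copies each entry instead of using `pop()` so the input is left intact, and the final `dict(sorted(...))` relies on Python 3.7+ dictionaries preserving insertion order.

```python
from collections import defaultdict

# Sample data copied from the question.
results = [
    {"timestamp_datetime": "2014-03-31 18:10:00 UTC", "job_id": 5,
     "processor_utilization_percentage": 72},
    {"timestamp_datetime": "2014-03-31 18:20:00 UTC", "job_id": 2,
     "processor_utilization_percentage": 60},
    {"timestamp_datetime": "2014-03-30 18:20:00 UTC", "job_id": 2,
     "processor_utilization_percentage": 0},
]


def group_by_job_id(entries):
    """Group entries by job_id without modifying the input (hypothetical helper)."""
    grouped = defaultdict(list)
    for entry in entries:
        # Copy every field except job_id, leaving the caller's dicts untouched.
        rest = {k: v for k, v in entry.items() if k != "job_id"}
        grouped[entry["job_id"]].append(rest)
    return grouped


grouped = group_by_job_id(results)

# Assumption: Python 3.7+, where a plain dict keeps insertion order,
# so inserting the items in sorted order yields ascending job ids.
ordered = dict(sorted(grouped.items()))

for job_id, jobs in ordered.items():
    print(job_id, jobs)
```

The keys here are the integers 2 and 5; if string keys like those in the question's expected output are wanted, `str(entry["job_id"])` could be used as the key instead.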
Answer 1 (score: 3)
You can use itertools.groupby to group results by job_id:
from itertools import groupby
new_results = {k: list(g) for k, g in groupby(results, key=lambda d: d["job_id"])}
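The result is a dictionary, i.e. it has no particular order. If you want to iterate over the values in ascending key order, you can do the following: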
for key in sorted(new_results):
    entries = new_results[key]
    # do something with entries
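Update: As Martijn pointed out, this requires the results list to be sorted by job_id, or at least to have equal job_ids next to each other (as in your example). Otherwise entries may be lost, because groupby starts a new group every time the key changes and a later group with the same key overwrites the earlier one in the dictionary.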
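To make that caveat concrete, here is a small sketch that sorts before grouping. The sample rows are the question's data deliberately reordered (an assumption made for the demonstration) so that the two job_id 2 entries are no longer adjacent; the variable names are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

# The question's rows, reordered so the job_id 2 entries are not adjacent.
results = [
    {"timestamp_datetime": "2014-03-31 18:20:00 UTC", "job_id": 2,
     "processor_utilization_percentage": 60},
    {"timestamp_datetime": "2014-03-31 18:10:00 UTC", "job_id": 5,
     "processor_utilization_percentage": 72},
    {"timestamp_datetime": "2014-03-30 18:20:00 UTC", "job_id": 2,
     "processor_utilization_percentage": 0},
]

by_job = itemgetter("job_id")

# Without sorting: groupby yields two separate groups for job_id 2,
# and the second one replaces the first in the dict.
unsorted_groups = {k: list(g) for k, g in groupby(results, key=by_job)}

# With sorting: equal job_ids become adjacent, so each key gets one group.
# sorted() is stable, so entries keep their original relative order.
sorted_groups = {
    k: list(g) for k, g in groupby(sorted(results, key=by_job), key=by_job)
}

print(len(unsorted_groups[2]), len(sorted_groups[2]))
```

Sorting costs O(n log n), so for unsorted input the defaultdict approach is usually the simpler choice; groupby fits best when the data already arrives ordered by the key.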
Answer 2 (score: 0)
Assuming you really don't want job_id in newresult:
from collections import defaultdict

newresult = defaultdict(list)
for result in results:
    job_id = result['job_id']
    newresult[job_id].append(
        {'timestamp_datetime': result['timestamp_datetime'],
         'processor_utilization_percentage': result['processor_utilization_percentage']}
    )
# print(newresult)
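I don't really see a way to do this with a dictionary comprehension, but I'm sure someone out there with more experience in that kind of thing could pull it off. Still, this is pretty straightforward.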
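For what it's worth, one possible way to write this as a dictionary comprehension is sketched below. It is not taken from any of the answers, and it rescans the whole list once per distinct job_id, so it is likely slower than the defaultdict loop on large inputs. The ascending key order again assumes Python 3.7+ insertion-ordered dicts.

```python
# Sample data copied from the question.
results = [
    {"timestamp_datetime": "2014-03-31 18:10:00 UTC", "job_id": 5,
     "processor_utilization_percentage": 72},
    {"timestamp_datetime": "2014-03-31 18:20:00 UTC", "job_id": 2,
     "processor_utilization_percentage": 60},
    {"timestamp_datetime": "2014-03-30 18:20:00 UTC", "job_id": 2,
     "processor_utilization_percentage": 0},
]

# Distinct job ids in ascending order.
job_ids = sorted({r["job_id"] for r in results})

newresult = {
    job_id: [
        # Keep every field except job_id.
        {k: v for k, v in r.items() if k != "job_id"}
        for r in results
        if r["job_id"] == job_id
    ]
    for job_id in job_ids
}

print(newresult)
```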