Python list of dictionaries: get the last item of each day

Time: 2016-08-29 14:30:20

Tags: python

I have a list of dictionaries, sorted by the key date:

from datetime import datetime

d = [{'date': datetime.strptime('2016-01-01 07:00', "%Y-%m-%d %H:%M"), 'val': 1},
     {'date': datetime.strptime('2016-01-01 23:00', "%Y-%m-%d %H:%M"), 'val': 3},
     {'date': datetime.strptime('2016-01-02 07:00', "%Y-%m-%d %H:%M"), 'val': 5},
     {'date': datetime.strptime('2016-01-02 22:13', "%Y-%m-%d %H:%M"), 'val': 7},
     {'date': datetime.strptime('2016-01-02 23:00', "%Y-%m-%d %H:%M"), 'val': 9},
     {'date': datetime.strptime('2016-01-03 00:10', "%Y-%m-%d %H:%M"), 'val': 17},
     {'date': datetime.strptime('2016-01-03 09:12', "%Y-%m-%d %H:%M"), 'val': 25},
     {'date': datetime.strptime('2016-01-03 21:52', "%Y-%m-%d %H:%M"), 'val': 37}]

I want to get the last (most recent) item of each day, so in this case it would be:

{'date': datetime.strptime('2016-01-01 23:00', "%Y-%m-%d %H:%M"), 'val': 3},
{'date': datetime.strptime('2016-01-02 23:00', "%Y-%m-%d %H:%M"), 'val': 9},
{'date': datetime.strptime('2016-01-03 21:52', "%Y-%m-%d %H:%M"), 'val': 37},

I have the following code that solves the problem:

previous_item = None
wanted_data = []
for index, entry in enumerate(d):
    if not previous_item:
        previous_item = entry
        continue
    if entry['date'].date() != previous_item['date'].date():
        wanted_data.append(previous_item)
    previous_item = entry

    #Add as well the last item
    if index + 1 == len(d):
        wanted_data.append(entry)

But I'm sure there is a better and faster way to do this... Besides, it's really ugly.

Is there a more Pythonic way to achieve this?

Thanks!

1 Answer:

Answer 0: (score: 3)

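Assuming the data is already sorted by 'date' (which seems to be your case), you can use itertools.groupby to group the entries by date() and then take the last item from each group.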

>>> import itertools
>>> d = sorted(d, key=lambda x: x["date"])  # only if not already sorted
>>> groups = itertools.groupby(d, lambda x: x["date"].date())
>>> wanted_data = [list(grp)[-1] for key, grp in groups]
>>> wanted_data
[{'date': datetime.datetime(2016, 1, 1, 23, 0), 'val': 3},
 {'date': datetime.datetime(2016, 1, 2, 23, 0), 'val': 9},
 {'date': datetime.datetime(2016, 1, 3, 21, 52), 'val': 37}]
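
For reference, the same grouping idea can be written as a small standalone script. This is only a sketch: the function name last_per_day and the shortened sample list are made up for illustration, and the input is assumed to be sorted by 'date' already.

```python
from datetime import datetime
from itertools import groupby

FMT = "%Y-%m-%d %H:%M"


def last_per_day(entries):
    """Return the last entry of each calendar day.

    Assumes `entries` is already sorted by the 'date' key (hypothetical helper).
    """
    # groupby only merges *adjacent* items with equal keys, hence the sort requirement.
    grouped = groupby(entries, key=lambda e: e["date"].date())
    # Each group is materialised as a list so that its last element can be indexed.
    return [list(group)[-1] for _, group in grouped]


# Shortened sample data in the same shape as the question's list.
sample = [
    {"date": datetime.strptime("2016-01-01 07:00", FMT), "val": 1},
    {"date": datetime.strptime("2016-01-01 23:00", FMT), "val": 3},
    {"date": datetime.strptime("2016-01-02 07:00", FMT), "val": 5},
    {"date": datetime.strptime("2016-01-02 23:00", FMT), "val": 9},
    {"date": datetime.strptime("2016-01-03 21:52", FMT), "val": 37},
]

result = last_per_day(sample)
print([entry["val"] for entry in result])
```
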

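Note that this expands each group into a list. If that is too expensive because there are very many entries per date, you can write a function that gets the last entry from an iterator, e.g. using reduce (or functools.reduce in Python 3):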

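>>> last = lambda x: functools.reduce(lambda x, y: y, x)
>>> groups = itertools.groupby(d, lambda x: x["date"].date())  # rebuild: a groupby object can only be iterated once
>>> wanted_data = [last(grp) for key, grp in groups]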
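
A self-contained sketch of this lazy variant might look as follows. The names last and last_per_day_lazy and the shortened sample list are illustrative assumptions; the input is again assumed to be sorted by 'date'. One thing to keep in mind is that a groupby object is an iterator, so it has to be created afresh each time you want to walk over the groups.

```python
from datetime import datetime
from functools import reduce
from itertools import groupby

FMT = "%Y-%m-%d %H:%M"


def last(iterable):
    """Consume a non-empty iterable and return its final element.

    Only the most recent item is kept, so no intermediate list is built.
    """
    return reduce(lambda _previous, item: item, iterable)


def last_per_day_lazy(entries):
    """Last entry per calendar day, assuming `entries` is sorted by 'date'."""
    # A fresh groupby object is created on every call; groups are never empty,
    # so reduce() always receives at least one item.
    grouped = groupby(entries, key=lambda e: e["date"].date())
    return [last(group) for _, group in grouped]


# Shortened sample data in the same shape as the question's list.
sample = [
    {"date": datetime.strptime("2016-01-01 07:00", FMT), "val": 1},
    {"date": datetime.strptime("2016-01-01 23:00", FMT), "val": 3},
    {"date": datetime.strptime("2016-01-02 22:13", FMT), "val": 7},
    {"date": datetime.strptime("2016-01-02 23:00", FMT), "val": 9},
    {"date": datetime.strptime("2016-01-03 21:52", FMT), "val": 37},
]

# A generator works as input too, since nothing needs to be indexed.
lazy_result = last_per_day_lazy(entry for entry in sample)
print([entry["val"] for entry in lazy_result])
```
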