I have a nested list structured like this:
longlist = [["Bob", "2019-12-12 19:03"], ["Joe", "2019-12-12 19:04"],
["Sally", "2019-12-12 21:07"], ["Jane", "2019-12-13 2:02"],
["Jose", "2019-12-13 3:04"], ["Ahmed", "2019-12-13 3:06"],
["Xi", "2019-12-13 7:43"]]
My goal is to keep only the first and last entries of longlist for each date. Using the nested list above, my desired output is:
newlist = [["Bob", "2019-12-12 19:03"], ["Sally", "2019-12-12 21:07"],
["Jane", "2019-12-13 2:02"], ["Xi", "2019-12-13 7:43"]]
I've searched all over SO and the web but couldn't find what I'm looking for. Does anyone know how to do this?
Answer 0 (score: 2)
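You can use itertools.groupby (doc) and group the elements by the date string (the part of each timestamp before the space). Note that groupby only groups consecutive items, so this relies on longlist already being ordered by date.
For example: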
from itertools import groupby
longlist = [["Bob", "2019-12-12 19:03"], ["Joe", "2019-12-12 19:04"],
["Sally", "2019-12-12 21:07"], ["Jane", "2019-12-13 2:02"],
["Jose", "2019-12-13 3:04"], ["Ahmed", "2019-12-13 3:06"],
["Xi", "2019-12-13 7:43"]]
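out = []
# Group consecutive rows by the date part of the timestamp ("2019-12-12 19:03" -> "2019-12-12")
for _, g in groupby(longlist, lambda k: k[1].split()[0]):
    # Star-unpacking keeps the first and last rows of the group (needs at least 2 rows)
    first, *_, last = g
    out.extend([first, last])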
from pprint import pprint
pprint(out)
Prints:
[['Bob', '2019-12-12 19:03'],
['Sally', '2019-12-12 21:07'],
['Jane', '2019-12-13 2:02'],
['Xi', '2019-12-13 7:43']]
Note: Based on your comment, I didn't handle dates that appear only once (as you said, every date has at least 10 items).
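If some dates might have only a single entry, first, *_, last = g raises ValueError for that group. Below is a minimal sketch that tolerates this, still assuming (like the answer above) that rows for the same date are adjacent. The function name first_and_last_per_date and the extra "Ana" row are hypothetical, added only to illustrate the single-entry case.

```python
from itertools import groupby


def first_and_last_per_date(rows):
    """Keep the first and last row of each run of rows sharing a date.

    Assumes rows for the same date are adjacent (e.g. sorted by timestamp).
    A date with a single row contributes that row once.
    """
    out = []
    # Key each row by the date part of its "YYYY-MM-DD H:MM" timestamp
    for _, group in groupby(rows, key=lambda row: row[1].split()[0]):
        items = list(group)
        out.append(items[0])
        if len(items) > 1:
            out.append(items[-1])
    return out


longlist = [["Bob", "2019-12-12 19:03"], ["Joe", "2019-12-12 19:04"],
            ["Sally", "2019-12-12 21:07"], ["Jane", "2019-12-13 2:02"],
            ["Jose", "2019-12-13 3:04"], ["Ahmed", "2019-12-13 3:06"],
            ["Xi", "2019-12-13 7:43"],
            ["Ana", "2019-12-14 9:00"]]  # hypothetical single-entry date

print(first_and_last_per_date(longlist))
```

The 2019-12-14 group contains one row, so it appears once in the result instead of raising an error.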
Answer 1 (score: 0)
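Here is a pure pandas solution (note that it returns all first entries followed by all last entries, rather than interleaving them by date):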
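import pandas as pd
df5 = pd.DataFrame(longlist)
df5['datetime'] = pd.to_datetime(df5[1])
# Group by calendar date (dt.day alone would merge the same day of different months)
g = df5.groupby(df5['datetime'].dt.date)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the firsts, then the lasts
list(map(list, pd.concat([g.agg('first'), g.agg('last')]).drop(columns='datetime').set_index(0).to_dict()[1].items()))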
[['Bob', '2019-12-12 19:03'],
['Jane', '2019-12-13 2:02'],
['Sally', '2019-12-12 21:07'],
['Xi', '2019-12-13 7:43']]
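If you want the first and last entries interleaved per date, without pandas and without requiring rows for the same date to be adjacent (which itertools.groupby does), a plain-dictionary sketch is enough. The function name first_and_last_by_date and the sample rows are hypothetical. The code assumes each timestamp string starts with the date followed by a space, and it relies on dicts preserving insertion order (Python 3.7+).

```python
def first_and_last_by_date(rows):
    """Return the first and last row for each date, ordered by each date's first appearance.

    Rows for the same date need not be adjacent. A date seen only once yields one row.
    """
    firsts = {}  # date -> first row seen for that date
    lasts = {}   # date -> most recent row seen for that date
    for row in rows:
        date = row[1].split()[0]
        firsts.setdefault(date, row)
        lasts[date] = row
    out = []
    # Dicts keep insertion order, so dates come out in first-seen order
    for date, first in firsts.items():
        out.append(first)
        last = lasts[date]
        if last is not first:
            out.append(last)
    return out


# Hypothetical unsorted input: the two 2019-12-12 rows are not adjacent
rows = [["A", "2019-12-12 1:00"], ["B", "2019-12-13 1:00"], ["C", "2019-12-12 2:00"]]
print(first_and_last_by_date(rows))
```

Here "A" and "C" are kept for 2019-12-12 even though "B" sits between them, and "B" appears once because it is the only row for 2019-12-13.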