I have a list of np.datetime64 dates in Python:
['2016-12-01T02:00:00.000000000', '2016-12-01T04:00:00.000000000',
'2016-12-01T06:00:00.000000000', '2016-12-01T08:00:00.000000000',
'2016-12-01T10:00:00.000000000', '2016-12-01T12:00:00.000000000',
'2016-12-01T14:00:00.000000000', '2016-12-01T16:00:00.000000000',
'2016-12-01T18:00:00.000000000', '2016-12-01T20:00:00.000000000',
'2016-12-01T22:00:00.000000000', '2016-12-02T00:00:00.000000000',
'2016-12-02T02:00:00.000000000', '2016-12-02T04:00:00.000000000',
'2016-12-02T06:00:00.000000000', '2016-12-02T08:00:00.000000000',
'2016-12-02T10:00:00.000000000', '2016-12-02T12:00:00.000000000',
'2016-12-02T14:00:00.000000000', '2016-12-02T16:00:00.000000000',
'2016-12-02T18:00:00.000000000', '2016-12-02T20:00:00.000000000',
'2016-12-02T22:00:00.000000000', '2016-12-03T00:00:00.000000000',
'2016-12-03T02:00:00.000000000', '2016-12-03T04:00:00.000000000',
'2016-12-03T06:00:00.000000000', '2016-12-03T08:00:00.000000000',
'2016-12-03T10:00:00.000000000', '2016-12-03T12:00:00.000000000',
'2016-12-03T14:00:00.000000000', '2016-12-03T16:00:00.000000000',
'2016-12-03T18:00:00.000000000', '2016-12-03T20:00:00.000000000',
'2016-12-03T22:00:00.000000000']
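I want to loop over each calendar day in the list. I tried extracting each unique date from the list (i.e. finding the minimum and maximum dates and building a list of the dates between them), but that is not ideal for what I want to do.
The result I am after is code that lets me loop over each date/calendar day in the list and get the datetimes that correspond to that date: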
for each_date in date_list:
    ***get all datetimes corresponding to each_date***
(loop would occur 3 times in this example)
Note:
1) Iterating over every [n:n + 24], or any other solution that assumes each day has the same number of time steps, will not work.
Answer 0 (score: 3)
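If the timestamps are ordered, we can use the itertools.groupby function to group the array elements by their corresponding day.
The day can be obtained with np.datetime64.astype(..., dtype='datetime64[D]'), so we can write it as: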
from numpy import datetime64
from functools import partial
from itertools import groupby

# The key function truncates each timestamp to day resolution
for day, timestamps in groupby(data_array,
                               partial(datetime64.astype, dtype='datetime64[D]')):
    # process day and timestamps
    pass
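Here day is a datetime64[D] numpy object (containing only the day), and timestamps is an iterable of the corresponding timestamps (not a list, but we can convert it to one). data_array is the array that contains the initial data.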
For example:
>>> for day, timestamps in groupby(data_array,
...                                partial(datetime64.astype, dtype='datetime64[D]')):
...     print((day, list(timestamps)))
...
(numpy.datetime64('2016-12-01'), [numpy.datetime64('2016-12-01T02:00:00.000000000'), numpy.datetime64('2016-12-01T04:00:00.000000000'), numpy.datetime64('2016-12-01T06:00:00.000000000'), numpy.datetime64('2016-12-01T08:00:00.000000000'), numpy.datetime64('2016-12-01T10:00:00.000000000'), numpy.datetime64('2016-12-01T12:00:00.000000000'), numpy.datetime64('2016-12-01T14:00:00.000000000'), numpy.datetime64('2016-12-01T16:00:00.000000000'), numpy.datetime64('2016-12-01T18:00:00.000000000'), numpy.datetime64('2016-12-01T20:00:00.000000000'), numpy.datetime64('2016-12-01T22:00:00.000000000')])
(numpy.datetime64('2016-12-02'), [numpy.datetime64('2016-12-02T00:00:00.000000000'), numpy.datetime64('2016-12-02T02:00:00.000000000'), numpy.datetime64('2016-12-02T04:00:00.000000000'), numpy.datetime64('2016-12-02T06:00:00.000000000'), numpy.datetime64('2016-12-02T08:00:00.000000000'), numpy.datetime64('2016-12-02T10:00:00.000000000'), numpy.datetime64('2016-12-02T12:00:00.000000000'), numpy.datetime64('2016-12-02T14:00:00.000000000'), numpy.datetime64('2016-12-02T16:00:00.000000000'), numpy.datetime64('2016-12-02T18:00:00.000000000'), numpy.datetime64('2016-12-02T20:00:00.000000000'), numpy.datetime64('2016-12-02T22:00:00.000000000')])
(numpy.datetime64('2016-12-03'), [numpy.datetime64('2016-12-03T00:00:00.000000000'), numpy.datetime64('2016-12-03T02:00:00.000000000'), numpy.datetime64('2016-12-03T04:00:00.000000000'), numpy.datetime64('2016-12-03T06:00:00.000000000'), numpy.datetime64('2016-12-03T08:00:00.000000000'), numpy.datetime64('2016-12-03T10:00:00.000000000'), numpy.datetime64('2016-12-03T12:00:00.000000000'), numpy.datetime64('2016-12-03T14:00:00.000000000'), numpy.datetime64('2016-12-03T16:00:00.000000000'), numpy.datetime64('2016-12-03T18:00:00.000000000'), numpy.datetime64('2016-12-03T20:00:00.000000000'), numpy.datetime64('2016-12-03T22:00:00.000000000')])
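Here we chose to print, for each day, the list of corresponding timestamps, but that is of course only one option. As the example shows, not all slices have the same length (the last two have one extra element).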
Note that timestamps is an iterator: if you do not convert it to a list (or otherwise store its elements), it is exhausted after a single pass over it.
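The exhaustion can be seen with a minimal sketch. The three-element data_array below is hypothetical sample data, and a lambda stands in for the partial(...) key used above; within one iteration of the loop, the second attempt to read the group finds nothing left.

```python
from itertools import groupby

import numpy as np

# Hypothetical sample data: two timestamps on one day, one on the next.
data_array = np.array(
    ["2016-12-01T02:00", "2016-12-01T04:00", "2016-12-02T00:00"],
    dtype="datetime64[ns]",
)

counts = []
for day, timestamps in groupby(data_array, key=lambda ts: ts.astype("datetime64[D]")):
    first_pass = list(timestamps)   # consumes the group's iterator
    second_pass = list(timestamps)  # the iterator is already exhausted
    counts.append((len(first_pass), len(second_pass)))

print(counts)  # [(2, 0), (1, 0)]
```

If the timestamps of a day are needed more than once, storing list(timestamps) in a variable on the first pass avoids the problem.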
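groupby works in linear time, since for each element it only checks whether the "group key" is the same as that of the previous element. As said before, however, the data must be sorted.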
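When the input is not guaranteed to be ordered, one option is to sort a copy first, at the cost of an O(n log n) step. The following is a sketch under that assumption; the out-of-order data_array and the helper name day_of are made up for illustration.

```python
from itertools import groupby

import numpy as np

# Hypothetical sample data, deliberately out of order.
data_array = np.array(
    ["2016-12-02T00:00", "2016-12-01T22:00", "2016-12-01T02:00", "2016-12-02T04:00"],
    dtype="datetime64[ns]",
)


def day_of(ts):
    """Truncate a datetime64 scalar to its calendar day."""
    return ts.astype("datetime64[D]")


# np.sort returns a sorted copy; groupby only merges adjacent equal keys,
# so without sorting the same day could show up as several groups.
groups = {
    day: list(stamps)  # materialize each group before groupby moves on
    for day, stamps in groupby(np.sort(data_array), key=day_of)
}

for day, stamps in groups.items():
    print(day, len(stamps))
```

Each group is turned into a list inside the comprehension, so the result can be iterated any number of times afterwards.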
Answer 1 (score: 1)
You can use collections.defaultdict for an O(n) solution. Here Pandas is used to normalize the datetime objects, although the same can be done with NumPy alone.
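import pandas as pd
from collections import defaultdict

# L is the list of np.datetime64 values from the question
d = defaultdict(list)
for item in L:
    # normalize() sets the time to midnight, leaving only the calendar day
    day = pd.to_datetime(item).normalize().to_datetime64()
    d[day].append(item)
print(d)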
defaultdict(list,
{numpy.datetime64('2016-12-01T00:00:00.000000000'):
[numpy.datetime64('2016-12-01T02:00:00.000000000'),
...
numpy.datetime64('2016-12-01T22:00:00.000000000')],
numpy.datetime64('2016-12-02T00:00:00.000000000'):
[numpy.datetime64('2016-12-02T00:00:00.000000000'),
...
numpy.datetime64('2016-12-02T22:00:00.000000000')],
numpy.datetime64('2016-12-03T00:00:00.000000000'):
[numpy.datetime64('2016-12-03T00:00:00.000000000'),
...
numpy.datetime64('2016-12-03T22:00:00.000000000')]})
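The NumPy-only route mentioned above could look roughly like this. It is a sketch rather than the answer author's code: the short list L is hypothetical stand-in data, and the keys come out as datetime64[D] values instead of midnight timestamps. Because a dictionary collects items by key, this approach does not depend on the input being sorted.

```python
from collections import defaultdict

import numpy as np

# Hypothetical stand-in for the list L used above (not sorted on purpose).
L = [
    np.datetime64("2016-12-01T22:00:00"),
    np.datetime64("2016-12-02T00:00:00"),
    np.datetime64("2016-12-01T02:00:00"),
]

# Variant 1: same defaultdict pattern, with a NumPy cast instead of Pandas.
d = defaultdict(list)
for item in L:
    # Casting to day resolution drops the time-of-day part.
    d[item.astype("datetime64[D]")].append(item)

# Variant 2: vectorized, one boolean mask per unique day.
arr = np.asarray(L)
days = arr.astype("datetime64[D]")
by_day = {day: arr[days == day] for day in np.unique(days)}

for day, stamps in by_day.items():
    print(day, stamps)
```

The first variant stays O(n). The second does one full comparison per unique day, which is usually fine when the number of distinct days is small.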