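I wrote a generator in Python that yields a new slice of a pandas DataFrame each time it is called. My DataFrame is indexed by Unix timestamps. My first attempt at the code was as follows (df is the DataFrame, tz is a pytz.timezone, in my case Europe/Amsterdam):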
def interval_generator(df, tz):
    today = datetime.datetime.fromtimestamp(df.index.min(), tz)
    last_day = datetime.datetime.fromtimestamp(df.index.max(), tz)
    while today <= last_day:
        tomorrow = today + datetime.timedelta(days=1)
        yield df.loc[tz.localize(today).timestamp():tz.localize(tomorrow).timestamp() - 1]
        today = tomorrow
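However, when running my code I noticed that datetime objects behave strangely with respect to the timezone originally attached to them, specifically when they are incremented across a daylight saving time change. An example of the (in my opinion) weird behavior: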
import datetime
import pytz
tz = pytz.timezone('Europe/Amsterdam')
# This is the day daylight saving time ends in the Netherlands in 2015.
t1 = datetime.datetime(2015, 10, 25, 0, 0)
t2 = t1 + datetime.timedelta(days=1)
t1_localized = tz.localize(t1)
t2_localized = tz.localize(t2)
t2_loc_incremented = t1_localized + datetime.timedelta(days=1)
Printing the last three variables gives:
>>> t1_localized
datetime.datetime(2015, 10, 25, 0, 0, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>)
>>> t2_localized
datetime.datetime(2015, 10, 26, 0, 0, tzinfo=<DstTzInfo 'Europe/Amsterdam' CET+1:00:00 STD>)
>>> t2_loc_incremented
datetime.datetime(2015, 10, 26, 0, 0, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>)
More importantly for my code, the timestamps of the two versions of t2 differ:
>>> t2_localized.timestamp()
1445814000.0
>>> t2_loc_incremented.timestamp()
1445810400.0
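As far as I can tell, adding a timedelta to an aware datetime only moves the wall-clock fields and leaves the tzinfo object untouched, which would explain the one-hour gap. Below is a minimal sketch of that arithmetic using only the standard library. The fixed offsets `CEST` and `CET` are my own stand-ins (assumed to be UTC+2 and UTC+1) and are not real timezone definitions like the pytz one above.

```python
import datetime

# Assumption: fixed offsets standing in for Amsterdam's summer and winter
# offsets. They carry no DST rules, unlike pytz.timezone('Europe/Amsterdam').
CEST = datetime.timezone(datetime.timedelta(hours=2), "CEST")
CET = datetime.timezone(datetime.timedelta(hours=1), "CET")

t1 = datetime.datetime(2015, 10, 25, 0, 0, tzinfo=CEST)

# Adding a timedelta shifts the wall clock by 24 hours and keeps tzinfo as-is,
# so the result still carries the +02:00 offset.
t2_incremented = t1 + datetime.timedelta(days=1)

# Midnight on the 26th with the offset that is actually in force on that day.
t2_expected = datetime.datetime(2015, 10, 26, 0, 0, tzinfo=CET)

# The two values name the same wall-clock time but different instants.
gap_seconds = t2_expected.timestamp() - t2_incremented.timestamp()
print(t2_incremented.utcoffset())
print(gap_seconds)
```

The gap is exactly the difference between the two offsets, which matches the two timestamps shown above.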
I worked around the problem in my generator function as follows:
def interval_generator(df, tz):
    today = datetime.datetime.fromtimestamp(df.index.min(), tz=tz).strftime('%Y-%m-%d')
    today = datetime.datetime.strptime(today, '%Y-%m-%d')
    last_day = datetime.datetime.fromtimestamp(df.index.max(), tz=tz).strftime('%Y-%m-%d')
    last_day = datetime.datetime.strptime(last_day, '%Y-%m-%d')
    while today <= last_day:
        tomorrow = today + datetime.timedelta(days=1)
        yield df.loc[tz.localize(today).timestamp():tz.localize(tomorrow).timestamp() - 1]
        today = tomorrow
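This basically gives me the functionality I need, but I wonder whether there is a better way to handle it. Are there any good alternatives for my problem? Is this considered a bug in the datetime module? (I am using Python 3.4.) I tried searching on Google but could not find anything.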
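One variation I could imagine, shown here only as a rough sketch, is to keep the day stepping on naive values without the strftime/strptime round trip, and to attach the timezone separately for each boundary. The helper name `naive_midnights` and the fixed-offset `tz_stub` used in the demo are hypothetical and mine. In real use the `tz` argument would be the pytz timezone from above.

```python
import datetime

def naive_midnights(first_ts, last_ts, tz):
    """Yield naive local midnights for every day from first_ts to last_ts.

    first_ts and last_ts are Unix timestamps in seconds; tz is any tzinfo.
    """
    # Convert both ends to local calendar dates, dropping time and tzinfo.
    day = datetime.datetime.fromtimestamp(first_ts, tz=tz).date()
    last_day = datetime.datetime.fromtimestamp(last_ts, tz=tz).date()
    while day <= last_day:
        # Naive midnight: the caller decides how to localize it.
        yield datetime.datetime.combine(day, datetime.time.min)
        # Date arithmetic is purely calendar based, so DST cannot interfere.
        day += datetime.timedelta(days=1)

# Demo with a fixed UTC+2 offset as a stand-in for a real timezone.
tz_stub = datetime.timezone(datetime.timedelta(hours=2))
days = list(naive_midnights(1445724000, 1445814000, tz_stub))
print(days)
```

With the pytz `tz` from my code, each slice boundary would then be `tz.localize(midnight).timestamp()`, so every midnight picks up the offset valid on its own day.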