Pandas: Assigning Datetime objects to time intervals

Time: 2015-10-08 11:06:58

Tags: python pandas

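I'm trying to create a new variable in which datetime64[ns] objects are assigned to 5-minute intervals. The new interval variable should span every 5-minute period from 00:00 to 23:55. The criterion for assignment is whether the time of the datetime64[ns] object falls within the corresponding 5-minute interval. My actual data has many different dates in the DateTime variable, but those dates should be ignored: only the time component matters for the assignment.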

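I've mocked this up below. The example concentrates on the period from roughly 23:30 to 23:45, but it should illustrate what I'm trying to achieve for all intervals between 00:00 and 23:55. I've included two arbitrary dates to show that the date should have no effect.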

DateTime
2009-02-18 23:32:29 - would map to interval 23:30
2009-02-18 23:34:41 - would map to interval 23:30
2009-02-18 23:35:40 - would map to interval 23:35
2009-02-18 23:39:29 - would map to interval 23:35
2009-02-18 23:39:37 - would map to interval 23:35
2009-02-18 23:40:14 - would map to interval 23:40
2009-02-18 23:43:23 - would map to interval 23:40
2009-02-18 23:44:17 - would map to interval 23:40
...
2010-03-18 23:31:19 - also maps to interval 23:30 regardless of date
2010-03-18 23:33:31 - also maps to interval 23:30 regardless of date
2010-03-18 23:36:30 - also maps to interval 23:35 regardless of date
2010-03-18 23:38:21 - also maps to interval 23:35 regardless of date
2010-03-18 23:39:07 - also maps to interval 23:35 regardless of date
2010-03-18 23:41:44 - also maps to interval 23:40 regardless of date
2010-03-18 23:42:13 - also maps to interval 23:40 regardless of date
2010-03-18 23:43:37 - also maps to interval 23:40 regardless of date

For clarity, this is the result I'm aiming for:

DateTime             Interval 
2009-02-18 23:32:29  23:30
2009-02-18 23:34:41  23:30
2009-02-18 23:35:40  23:35
2009-02-18 23:39:29  23:35
2009-02-18 23:39:37  23:35
2009-02-18 23:40:14  23:40
2009-02-18 23:43:23  23:40
2009-02-18 23:44:17  23:40
...
2010-03-18 23:31:19  23:30
2010-03-18 23:33:31  23:30
2010-03-18 23:36:30  23:35
2010-03-18 23:38:21  23:35
2010-03-18 23:39:07  23:35
2010-03-18 23:41:44  23:40
2010-03-18 23:42:13  23:40
2010-03-18 23:43:37  23:40
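To pin down the mapping rule itself, here is a minimal plain-Python sketch of it, independent of Pandas. The helper name `interval_label` is made up for illustration; it ignores the date and seconds and rounds the minute down to a multiple of 5.

```python
from datetime import datetime


def interval_label(ts):
    """Return the start of the 5-minute interval containing ts, as 'HH:MM'.

    Hypothetical helper for illustration only: the date and the seconds
    are ignored, and the minute is rounded down to a multiple of 5.
    """
    floored_minute = ts.minute - ts.minute % 5
    return "{:02d}:{:02d}".format(ts.hour, floored_minute)


# Two timestamps on different dates that fall in the same 5-minute interval.
samples = [
    datetime(2009, 2, 18, 23, 39, 37),
    datetime(2010, 3, 18, 23, 36, 30),
]
labels = [interval_label(ts) for ts in samples]
```

Both samples end up with the same label because only the hour and minute enter the calculation.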

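I've gone through the Pandas documentation carefully and looked at a few loosely related questions here, but I can't find anything that gets me the right result.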

Update

These are my library and system versions:

Pandas: 0.16.2
Numpy: 1.9.2
System version: '3.4.3 |Anaconda 2.3.0 (x86_64)| (default, Mar  6 2015, 12:07:41) \n[GCC 4.2.1 (Apple Inc. build 5577)]'

Here is the complete error. As you can see, in my actual data I'm working with a datetime64[ns] series called question_time.

TypeError                                 Traceback (most recent call last)
<ipython-input-416-d5c3256e6b40> in <module>()
----> 1 df_unique['Interval'] = ((df_unique['question_time'] - pd.TimedeltaIndex(df_unique['question_time'].dt.minute % 5, 'm')) - pd.TimedeltaIndex(df_unique['question_time'].dt.second , 's')).dt.time

//anaconda/lib/python3.4/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   2125         else:
   2126             # set column
-> 2127             self._set_item(key, value)
   2128
   2129     def _setitem_slice(self, key, value):

//anaconda/lib/python3.4/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   2209         # value exeption to occur first
   2210         if len(self):
-> 2211             self._check_setitem_copy()
   2212
   2213     def insert(self, loc, column, value, allow_duplicates=False):

//anaconda/lib/python3.4/site-packages/pandas/core/generic.py in _check_setitem_copy(self, stacklevel, t, force)
   1302                 raise SettingWithCopyError(t)
   1303             elif value == 'warn':
-> 1304                 warnings.warn(t, SettingWithCopyWarning, stacklevel=stacklevel)
   1305
   1306     def __delitem__(self, key):

TypeError: issubclass() arg 2 must be a class or tuple of classes

The problem seems to be related to the warning raised at the bottom of this traceback. I tried resetting all my variables, and now I get the same warning for another operation as well.
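The traceback ends in `_check_setitem_copy`, the check Pandas runs when a column is assigned to a frame it considers a copy of another frame. The question does not show how `df_unique` was built, so the following is only a sketch under the assumption that it was derived from a larger frame. The parent frame `df` and the `Minute` column are hypothetical, and this does not explain the `TypeError` itself; it only shows a common way to avoid that check.

```python
import pandas as pd

# Hypothetical parent frame; the question does not show how df_unique was built.
df = pd.DataFrame({
    "question_time": pd.to_datetime([
        "2009-02-18 23:32:29",
        "2009-02-18 23:32:29",
        "2010-03-18 23:41:44",
    ])
})

# An explicit .copy() makes df_unique an independent frame, so adding a
# column to it is not treated as writing to a slice of df.
df_unique = df.drop_duplicates(subset="question_time").copy()
df_unique["Minute"] = df_unique["question_time"].dt.minute
```

The new column lands only on `df_unique`, and `df` is left untouched.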

1 answer:

Answer 0 (score: 1)

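I'm not sure whether there's a better way, but you can construct two TimedeltaIndex objects and subtract them from your values. The first removes the surplus minutes, which I compute with the modulus operator %, and the second removes the seconds: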

In [129]:
df['Interval'] = ((df['DateTime'] - pd.TimedeltaIndex(df['DateTime'].dt.minute % 5, 'm')) - pd.TimedeltaIndex(df['DateTime'].dt.second , 's')).dt.time
df

Out[129]:
              DateTime  Interval
0  2009-02-18 23:32:29  23:30:00
1  2009-02-18 23:34:41  23:30:00
2  2009-02-18 23:35:40  23:35:00
3  2009-02-18 23:39:29  23:35:00
4  2009-02-18 23:39:37  23:35:00
5  2009-02-18 23:40:14  23:40:00
6  2009-02-18 23:43:23  23:40:00
7  2009-02-18 23:44:17  23:40:00
8  2010-03-18 23:31:19  23:30:00
9  2010-03-18 23:33:31  23:30:00
10 2010-03-18 23:36:30  23:35:00
11 2010-03-18 23:38:21  23:35:00
12 2010-03-18 23:39:07  23:35:00
13 2010-03-18 23:41:44  23:40:00
14 2010-03-18 23:42:13  23:40:00
15 2010-03-18 23:43:37  23:40:00
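For readers on a newer Pandas than the 0.16.2 used in the question, the same result can probably be had more directly with `Series.dt.floor`. This is a sketch that assumes `dt.floor` and `dt.strftime` are available (as far as I know, `dt.floor` arrived in 0.18.0, so it would not work on the asker's setup). The `Label` column is an extra added for illustration, to match the `HH:MM` format shown in the question.

```python
import pandas as pd

# A few of the sample timestamps from the question, on two different dates.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2009-02-18 23:32:29",
        "2009-02-18 23:39:37",
        "2009-02-18 23:44:17",
        "2010-03-18 23:36:30",
    ])
})

# Round each timestamp down to the start of its 5-minute bucket.
floored = df["DateTime"].dt.floor("5min")

# Drop the date so only the time of day remains (datetime.time objects).
df["Interval"] = floored.dt.time

# Optional: the same interval as an 'HH:MM' string.
df["Label"] = floored.dt.strftime("%H:%M")
```

Unlike the subtraction approach, flooring also discards any sub-second component, which matters only if the real data carries fractions of a second.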