I have a pandas datetime Series:
>>> desc1.startTime.head()
0 2008-10-18 12:08:49
1 2008-10-18 12:22:52
2 2008-10-18 12:40:26
3 2008-10-18 12:57:52
4 2008-10-18 13:15:17
Name: startTime, dtype: datetime64[ns]
I want something like
t(0) = 0
and t(i) = desc1.startTime(i) - desc1.startTime(0)
Is there a nice way to do this with pandas?
Edit: This is what I tried. It does not work as expected:
>>> desc1.head()
Wafer_Slot Summary_GroupName startTime LotNum time
0 1 1 2008-10-18 12:08:49 Q3968075 00:00:00
1 5 1 2008-10-18 12:22:52 Q3968075 00:14:03
2 10 1 2008-10-18 12:40:26 Q3968075 00:31:37
3 15 1 2008-10-18 12:57:52 Q3968075 00:49:03
4 20 1 2008-10-18 13:15:17 Q3968075 01:06:28
>>> desc1['time'].head()
0 00:00:00
1 00:14:03
2 00:31:37
3 00:49:03
4 01:06:28
Name: time, dtype: timedelta64[ns]
>>> desc1['time'].apply(lambda x:x.seconds)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/donbeo/MyApps/phd_python/lib/python3.4/site-packages/pandas/core/series.py", line 2169, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/src/inference.pyx", line 1059, in pandas.lib.map_infer (pandas/lib.c:62578)
File "<stdin>", line 1, in <lambda>
AttributeError: 'numpy.timedelta64' object has no attribute 'seconds'
>>>
Answer 0 (score: 0)
You can subtract the first value, selected with iloc, and then use dt.seconds and tolist(), following EdChum's comments and the docs:
print(df['startTime'])
#0 2008-10-18 12:08:49
#1 2008-10-18 12:22:52
#2 2008-10-18 12:40:26
#3 2008-10-18 12:57:52
#4 2008-10-18 13:15:17
#Name: startTime, dtype: datetime64[ns]
print((df['startTime'] - df['startTime'].iloc[0]).dt.seconds.tolist())
#[0, 843, 1897, 2943, 3988]
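One caveat worth keeping in mind: dt.seconds returns only the seconds component of each timedelta (0 to 86399) and drops whole days, so for spans longer than a day dt.total_seconds() is usually the safer choice. A minimal sketch, assuming made-up sample data in which the last timestamp (not in the question) falls exactly one day after the first:

```python
import pandas as pd

# Hypothetical sample data; the third timestamp is invented for illustration
# and lies exactly one day after the first one.
start = pd.Series(pd.to_datetime([
    "2008-10-18 12:08:49",
    "2008-10-18 12:22:52",
    "2008-10-19 12:08:49",
]), name="startTime")

elapsed = start - start.iloc[0]

# Seconds component only: whole days are not included.
seconds_component = elapsed.dt.seconds.tolist()

# Full duration expressed in seconds, as floats.
total_seconds = elapsed.dt.total_seconds().tolist()

print(seconds_component)  # [0, 843, 0]
print(total_seconds)      # [0.0, 843.0, 86400.0]
```

For the data in the question, where everything happens within a couple of hours, both accessors give the same numbers.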
Or you can convert the timedelta to seconds, either by dividing by np.timedelta64(1, 's') or with astype:
print(((df['startTime'] - df['startTime'].iloc[0]) / np.timedelta64(1, 's')).astype(int).tolist())
#[0, 843, 1897, 2943, 3988]
print((df['startTime'] - df['startTime'].iloc[0]).astype('timedelta64[s]').astype(int).tolist())
#[0, 843, 1897, 2943, 3988]
If you need floats, you can omit astype(int):
print(((df['startTime'] - df['startTime'].iloc[0]) / np.timedelta64(1, 's')).tolist())
#[0.0, 843.0, 1897.0, 2943.0, 3988.0]
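For anyone who wants to try this without the original data, here is a self-contained sketch that rebuilds the five timestamps from the question and stores the elapsed seconds in a new column. The column name t_seconds is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# The five startTime values shown in the question.
df = pd.DataFrame({"startTime": pd.to_datetime([
    "2008-10-18 12:08:49",
    "2008-10-18 12:22:52",
    "2008-10-18 12:40:26",
    "2008-10-18 12:57:52",
    "2008-10-18 13:15:17",
])})

# t(i) = startTime(i) - startTime(0), as a timedelta Series.
elapsed = df["startTime"] - df["startTime"].iloc[0]

# Dividing by a one-second timedelta yields plain floats (seconds).
# "t_seconds" is a hypothetical column name.
df["t_seconds"] = elapsed / np.timedelta64(1, "s")

print(df["t_seconds"].tolist())  # [0.0, 843.0, 1897.0, 2943.0, 3988.0]
```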
You can also get milliseconds:
print((df.startTime - df.startTime.iloc[0]).astype('timedelta64[ms]').astype(int).tolist())
#[0, 843000, 1897000, 2943000, 3988000]
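As a possible alternative that does not go through astype, floor division by a one-millisecond pd.Timedelta should give integer milliseconds directly. This is a sketch rather than the answer's method; the sample data is rebuilt from the question and the variable names are assumptions.

```python
import pandas as pd

# The five startTime values shown in the question.
start = pd.Series(pd.to_datetime([
    "2008-10-18 12:08:49",
    "2008-10-18 12:22:52",
    "2008-10-18 12:40:26",
    "2008-10-18 12:57:52",
    "2008-10-18 13:15:17",
]), name="startTime")

elapsed = start - start.iloc[0]

# Floor-divide by one millisecond to count whole milliseconds as integers.
elapsed_ms = (elapsed // pd.Timedelta(milliseconds=1)).tolist()

print(elapsed_ms)  # [0, 843000, 1897000, 2943000, 3988000]
```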
Timings:
These timings will depend on the size of the Series and on the values it holds:
In [54]: %timeit (df['startTime'] - df['startTime'].iloc[0]).dt.seconds.tolist()
The slowest run took 4.29 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 1.38 ms per loop
In [55]: %timeit ((df['startTime'] - df['startTime'].iloc[0]) / np.timedelta64(1, 's')).astype(int).tolist()
1000 loops, best of 3: 1.82 ms per loop
In [56]: %timeit (df['startTime'] - df['startTime'].iloc[0]).astype('timedelta64[s]').astype(int).tolist()
The slowest run took 4.31 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 1.01 ms per loop