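I have the DataFrame below, and I want to combine a column that contains dates with another column that contains hours. For a DataFrame of 33672 rows the following code takes about 5 seconds, which is a problem because my real data is 1000 times larger.
Does anyone have a more efficient way to do this?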
>>> tt
DATE level_2 VALUE
SCENARIO
s0000 2014-02-28 0 36.39
s0000 2014-02-28 1 34.17
s0000 2014-02-28 2 32.95
s0000 2014-02-28 3 32.84
s0000 2014-02-28 4 34.36
s0000 2014-02-28 5 36.32
s0000 2014-02-28 6 39.76
s0000 2014-02-28 7 40.66
s0000 2014-02-28 8 46.21
s0000 2014-02-28 9 47.19
s0000 2014-02-28 10 46.48
s0000 2014-02-28 11 46.84
s0000 2014-02-28 12 46.08
... ... ...
[33672 rows x 3 columns]
>>> import time
>>> timet = time.time()
>>> tt['DATES'] = tt.apply(lambda row: row['DATE'].replace(hour=row['level_2']), axis=1)
>>> print(time.time() - timet)
4.76399993896
Answer 0 (score: 3)
apply is only useful when you can't vectorize.
The following works in pandas >= 0.12 (in 0.14 you can use pd.to_timedelta(df['hour'],unit='h') instead of the astype casts):
In [8]: df = DataFrame(dict(date = Timestamp('20140228'), hour = np.random.randint(0,50,size=1000000)))
In [9]: df.shape
Out[9]: (1000000, 2)
In [10]: %timeit df['date'] + df['hour'].astype('timedelta64[h]').astype('timedelta64[ns]')
1 loops, best of 3: 255 ms per loop
In [11]: (df['date'] + df['hour'].astype('timedelta64[h]').astype('timedelta64[ns]')).head()
Out[11]:
0 2014-03-01 03:00:00
1 2014-02-28 23:00:00
2 2014-03-01 06:00:00
3 2014-03-01 06:00:00
4 2014-02-28 15:00:00
dtype: datetime64[ns]
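Applied to the frame in the question, the pd.to_timedelta variant mentioned above might look like the sketch below. The small tt frame here is only a stand-in built from the first rows shown in the question. Note one assumption: adding a timedelta matches replace(hour=...) only when every DATE value is at midnight, which appears to be the case in the sample.

```python
import pandas as pd

# Stand-in for the questioner's frame, using the first four rows shown above.
tt = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2014-02-28"] * 4),
        "level_2": [0, 1, 2, 3],
        "VALUE": [36.39, 34.17, 32.95, 32.84],
    },
    index=pd.Index(["s0000"] * 4, name="SCENARIO"),
)

# Vectorized: turn the hour column into timedeltas and add it to the dates.
# Assumes DATE holds midnight timestamps, so adding hours equals setting the hour.
tt["DATES"] = tt["DATE"] + pd.to_timedelta(tt["level_2"], unit="h")

print(tt)
```

This avoids the row-by-row Python call that apply(..., axis=1) makes, which is where the time goes in the original code.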