Time and memory cost of apply with a lambda

Date: 2014-04-01 11:58:04

Tags: python pandas

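I have the DataFrame below, and I want to combine the column containing dates with the column containing hours. For a DataFrame of 33672 rows, the following code takes about 5 seconds, which is a problem because my real data is 1000 times larger.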

Does anyone know a more efficient way?

>>> tt
               DATE  level_2  VALUE
SCENARIO                           
s0000    2014-02-28        0  36.39
s0000    2014-02-28        1  34.17
s0000    2014-02-28        2  32.95
s0000    2014-02-28        3  32.84
s0000    2014-02-28        4  34.36
s0000    2014-02-28        5  36.32
s0000    2014-02-28        6  39.76
s0000    2014-02-28        7  40.66
s0000    2014-02-28        8  46.21
s0000    2014-02-28        9  47.19
s0000    2014-02-28       10  46.48
s0000    2014-02-28       11  46.84
s0000    2014-02-28       12  46.08
            ...      ...    ...

[33672 rows x 3 columns]

>>> import time
>>> timet = time.time()
>>> tt['DATES'] = tt.apply(lambda row: row['DATE'].replace(hour=row['level_2']), axis=1)
>>> print time.time()-timet
4.76399993896

1 answer:

Answer 0 (score: 3)

apply is only useful when the operation cannot be vectorized.

The following works in pandas >= 0.12 (in 0.14 you can use pd.to_timedelta(df['hour'],unit='h') instead of the astype casts).

In [8]: df = DataFrame(dict(date = Timestamp('20140228'), hour = np.random.randint(0,50,size=1000000)))

In [9]: df.shape
Out[9]: (1000000, 2)

In [10]: %timeit df['date'] + df['hour'].astype('timedelta64[h]').astype('timedelta64[ns]')
1 loops, best of 3: 255 ms per loop

In [11]: (df['date'] + df['hour'].astype('timedelta64[h]').astype('timedelta64[ns]')).head()
Out[11]: 
0   2014-03-01 03:00:00
1   2014-02-28 23:00:00
2   2014-03-01 06:00:00
3   2014-03-01 06:00:00
4   2014-02-28 15:00:00
dtype: datetime64[ns]
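
For reference, here is a minimal sketch of the pd.to_timedelta variant mentioned above, applied to columns named like those in the question. The sample frame is hypothetical (four made-up rows, default index instead of SCENARIO), and it assumes DATE holds midnight timestamps, so adding the hours gives the same result as replace(hour=...). The astype('timedelta64[h]') cast shown earlier may not be accepted by newer pandas releases, so this form is probably the safer one to try first.

```python
import pandas as pd

# Hypothetical sample using the question's column names (DATE, level_2, VALUE).
tt = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2014-02-28"] * 4),
        "level_2": [0, 1, 2, 23],
        "VALUE": [36.39, 34.17, 32.95, 40.66],
    }
)

# Vectorized: turn the integer hours into timedeltas and add them to the dates
# in one column-wise operation, instead of calling a lambda once per row.
tt["DATES"] = tt["DATE"] + pd.to_timedelta(tt["level_2"], unit="h")

print(tt)
```

Note one behavioral difference: an hour value above 23 rolls over into the next day here (as in the answer's output, where the random hours go up to 49), whereas replace(hour=...) would raise an error for such values.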