Python / Pandas: speeding up weekday calculation for datetime data

Date: 2014-06-10 13:42:52

Tags: python datetime pandas date-conversion weekday

I am loading data into pandas, and the date column contains datetime values, for example:

   date    ; .....more stuff ......
2000-01-03 ; 
2000-01-04 ;
2000-01-06 ;
...
2000-01-31 ;
2000-02-01 ;
2000-02-02 ;
2000-02-04 ;

I have a function that adds a column containing the weekday index (0-6):

def genWeekdays(df,src='date',target='weekday'):
    """
    Add a column `target` holding the weekday index (0 = Monday, 6 = Sunday) of column `src`.
    """
    df[target] = df[src].apply(lambda x: x.weekday())
    return df

I call it with:

df = genWeekdays(df)

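df has about one million rows, and the call takes about 1.3 seconds. Is there any way to speed this up? I am a bit surprised by how long it takes on an i7-4770K :( Thanks in advance.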

1 answer:

Answer 0 (score: 4)

In [30]: df = pd.DataFrame(dict(date = pd.date_range('20000101',periods=100000,freq='s'), value = np.random.randn(100000)))

In [31]: df['weekday'] = pd.DatetimeIndex(df['date']).weekday

In [32]: %timeit pd.DatetimeIndex(df['date']).weekday
10 loops, best of 3: 34.9 ms per loop

In [33]: df
Out[33]: 
                     date     value  weekday
0     2000-01-01 00:00:00 -0.046604        5
1     2000-01-01 00:00:01 -1.691611        5
2     2000-01-01 00:00:02  0.416015        5
3     2000-01-01 00:00:03  0.054822        5
4     2000-01-01 00:00:04 -0.661163        5
5     2000-01-01 00:00:05  0.274402        5
6     2000-01-01 00:00:06 -0.426533        5
7     2000-01-01 00:00:07  0.028769        5
8     2000-01-01 00:00:08  0.248581        5
9     2000-01-01 00:00:09  1.302145        5
10    2000-01-01 00:00:10 -1.886830        5
11    2000-01-01 00:00:11  2.276506        5
12    2000-01-01 00:00:12  0.054104        5
13    2000-01-01 00:00:13  0.378990        5
14    2000-01-01 00:00:14  0.868879        5
15    2000-01-01 00:00:15 -0.046810        5
16    2000-01-01 00:00:16 -0.499447        5
17    2000-01-01 00:00:17  1.067412        5
18    2000-01-01 00:00:18 -1.625986        5
19    2000-01-01 00:00:19  0.515884        5
20    2000-01-01 00:00:20 -1.884882        5
21    2000-01-01 00:00:21  0.943775        5
22    2000-01-01 00:00:22  0.034501        5
23    2000-01-01 00:00:23  0.438170        5
24    2000-01-01 00:00:24 -1.211937        5
25    2000-01-01 00:00:25 -0.229930        5
26    2000-01-01 00:00:26  0.938805        5
27    2000-01-01 00:00:27  0.026815        5
28    2000-01-01 00:00:28  2.166740        5
29    2000-01-01 00:00:29 -0.096927        5
...                   ...       ...      ...
99970 2000-01-02 03:46:10 -0.310023        6
99971 2000-01-02 03:46:11  0.561321        6
99972 2000-01-02 03:46:12  2.207426        6
99973 2000-01-02 03:46:13 -0.253933        6
99974 2000-01-02 03:46:14 -0.711145        6
99975 2000-01-02 03:46:15 -0.477377        6
99976 2000-01-02 03:46:16  1.492970        6
99977 2000-01-02 03:46:17  0.308510        6
99978 2000-01-02 03:46:18  0.126579        6
99979 2000-01-02 03:46:19 -1.704093        6
99980 2000-01-02 03:46:20 -0.328285        6
99981 2000-01-02 03:46:21  1.685411        6
99982 2000-01-02 03:46:22 -0.368899        6
99983 2000-01-02 03:46:23  0.915786        6
99984 2000-01-02 03:46:24 -1.694855        6
99985 2000-01-02 03:46:25 -1.488130        6
99986 2000-01-02 03:46:26 -1.274004        6
99987 2000-01-02 03:46:27 -1.508376        6
99988 2000-01-02 03:46:28  0.551695        6
99989 2000-01-02 03:46:29  0.007957        6
99990 2000-01-02 03:46:30 -0.214852        6
99991 2000-01-02 03:46:31 -1.390088        6
99992 2000-01-02 03:46:32 -0.472137        6
99993 2000-01-02 03:46:33 -0.969515        6
99994 2000-01-02 03:46:34  1.129802        6
99995 2000-01-02 03:46:35 -0.291428        6
99996 2000-01-02 03:46:36  0.337134        6
99997 2000-01-02 03:46:37  0.989259        6
99998 2000-01-02 03:46:38  0.705592        6
99999 2000-01-02 03:46:39 -0.311884        6

[100000 rows x 3 columns]
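
The session above can be folded back into the asker's function. What follows is a minimal sketch rather than the answerer's own code: `gen_weekdays` is a hypothetical rename of `genWeekdays`, and the `Series.dt` accessor shown at the end is not mentioned in the answer and assumes pandas 0.15 or later. The cross-check against the original `apply` version is there only to show that the vectorized results agree with it.

```python
import numpy as np
import pandas as pd


def gen_weekdays(df, src="date", target="weekday"):
    """Add a column with the weekday index (0 = Monday, 6 = Sunday)."""
    # Vectorized: the whole column is converted in one call instead of
    # invoking a Python lambda once per row.
    df[target] = pd.DatetimeIndex(df[src]).weekday
    return df


# Same synthetic data as in the answer: one timestamp per second.
df = pd.DataFrame({
    "date": pd.date_range("2000-01-01", periods=100_000, freq="s"),
    "value": np.random.randn(100_000),
})
df = gen_weekdays(df)

# Cross-check against the row-by-row approach from the question.
slow = df["date"].apply(lambda x: x.weekday())
assert (df["weekday"] == slow).all()

# Assumption: pandas >= 0.15, where the .dt accessor exposes the same
# property directly on a datetime64 Series.
assert (df["date"].dt.weekday == df["weekday"]).all()
```

Both vectorized forms avoid the per-row Python call that `apply` makes. How much time that saves on a given machine is best measured with `%timeit`, as in the session above.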