我在pandas中加载数据,而 date 列包含日期时间值,例如:
date ; .....more stuff ......
2000-01-03 ;
2000-01-04 ;
2000-01-06 ;
...
2000-01-31 ;
2000-02-01 ;
2000-02-02 ;
2000-02-04 ;
我有一个函数来添加一个包含工作日索引(0-6)的列:
def genWeekdays(df,src='date',target='weekday'):
"""
bla bla bla
"""
df[target] = df[src].apply(lambda x: x.weekday())
return df
通过
调用它df = genWeekdays(df)
df
有大约一百万行,大约需要1.3秒。
有什么方法可以加快速度吗?我对i7-4770k的持续时间有点惊讶:(
提前致谢
答案 0 :(得分:4)
In [30]: df = DataFrame(dict(date = pd.date_range('20000101',periods=100000,freq='s'), value = np.random.randn(100000)))
In [31]: df['weekday'] = pd.DatetimeIndex(df['date']).weekday
In [32]: %timeit pd.DatetimeIndex(df['date']).weekday
10 loops, best of 3: 34.9 ms per loop
In [33]: df
Out[33]:
date value
In [33]: df
Out[33]:
date value weekday
0 2000-01-01 00:00:00 -0.046604 5
1 2000-01-01 00:00:01 -1.691611 5
2 2000-01-01 00:00:02 0.416015 5
3 2000-01-01 00:00:03 0.054822 5
4 2000-01-01 00:00:04 -0.661163 5
5 2000-01-01 00:00:05 0.274402 5
6 2000-01-01 00:00:06 -0.426533 5
7 2000-01-01 00:00:07 0.028769 5
8 2000-01-01 00:00:08 0.248581 5
9 2000-01-01 00:00:09 1.302145 5
10 2000-01-01 00:00:10 -1.886830 5
11 2000-01-01 00:00:11 2.276506 5
12 2000-01-01 00:00:12 0.054104 5
13 2000-01-01 00:00:13 0.378990 5
14 2000-01-01 00:00:14 0.868879 5
15 2000-01-01 00:00:15 -0.046810 5
16 2000-01-01 00:00:16 -0.499447 5
17 2000-01-01 00:00:17 1.067412 5
18 2000-01-01 00:00:18 -1.625986 5
19 2000-01-01 00:00:19 0.515884 5
20 2000-01-01 00:00:20 -1.884882 5
21 2000-01-01 00:00:21 0.943775 5
22 2000-01-01 00:00:22 0.034501 5
23 2000-01-01 00:00:23 0.438170 5
24 2000-01-01 00:00:24 -1.211937 5
25 2000-01-01 00:00:25 -0.229930 5
26 2000-01-01 00:00:26 0.938805 5
27 2000-01-01 00:00:27 0.026815 5
28 2000-01-01 00:00:28 2.166740 5
29 2000-01-01 00:00:29 -0.096927 5
... ... ... ...
99970 2000-01-02 03:46:10 -0.310023 6
99971 2000-01-02 03:46:11 0.561321 6
99972 2000-01-02 03:46:12 2.207426 6
99973 2000-01-02 03:46:13 -0.253933 6
99974 2000-01-02 03:46:14 -0.711145 6
99975 2000-01-02 03:46:15 -0.477377 6
99976 2000-01-02 03:46:16 1.492970 6
99977 2000-01-02 03:46:17 0.308510 6
99978 2000-01-02 03:46:18 0.126579 6
99979 2000-01-02 03:46:19 -1.704093 6
99980 2000-01-02 03:46:20 -0.328285 6
99981 2000-01-02 03:46:21 1.685411 6
99982 2000-01-02 03:46:22 -0.368899 6
99983 2000-01-02 03:46:23 0.915786 6
99984 2000-01-02 03:46:24 -1.694855 6
99985 2000-01-02 03:46:25 -1.488130 6
99986 2000-01-02 03:46:26 -1.274004 6
99987 2000-01-02 03:46:27 -1.508376 6
99988 2000-01-02 03:46:28 0.551695 6
99989 2000-01-02 03:46:29 0.007957 6
99990 2000-01-02 03:46:30 -0.214852 6
99991 2000-01-02 03:46:31 -1.390088 6
99992 2000-01-02 03:46:32 -0.472137 6
99993 2000-01-02 03:46:33 -0.969515 6
99994 2000-01-02 03:46:34 1.129802 6
99995 2000-01-02 03:46:35 -0.291428 6
99996 2000-01-02 03:46:36 0.337134 6
99997 2000-01-02 03:46:37 0.989259 6
99998 2000-01-02 03:46:38 0.705592 6
99999 2000-01-02 03:46:39 -0.311884 6
[100000 rows x 3 columns]