Python datetime: extracting hour and minute quickly

Date: 2016-04-21 13:33:56

Tags: python pandas

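I have an 878000 × 1 DataFrame whose single column holds dates spanning several years. I use the following code to create new columns that store the year, month, day, hour, minute, and week separately: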

for i in train.index:
    train['Year'][i] = train.Dates[i].year
    train['Month'][i] = train.Dates[i].month
    train['Day'][i] = train.Dates[i].day
    train['Hour'][i] = train.Dates[i].hour
    train['Min'][i] = train.Dates[i].minute
    train['Week'][i] = train.Dates[i].isocalendar()[1]

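However, this is really slow: my computer ran this simple loop overnight and it still has not finished. Is there a faster way to extract this information without waiting so long?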

2 answers:

Answer 0 (score: 2)

Setup

In [15]: train = pd.DataFrame(pd.date_range('2015-12-31', '2016-12-31'), columns=['Dates'])

In [16]: train.head()
Out[16]:
       Dates
0 2015-12-31
1 2016-01-01
2 2016-01-02
3 2016-01-03
4 2016-01-04

Solution

In [17]: fields = ['Year', 'Month', 'Day', 'Hour', 'Min', 'Week']

In [18]: f = lambda x: pd.Series([x['Dates'].year, x['Dates'].month,
                                  x['Dates'].day, x['Dates'].hour,
                                  x['Dates'].minute, x['Dates'].isocalendar()[1]],
                                 index=fields)


In [19]: train.apply(f, axis=1)

The result looks like this:

Out[19]:
     Year  Month  Day  Hour  Min  Week
0    2015     12   31     0    0    53
1    2016      1    1     0    0    53
2    2016      1    2     0    0    53
3    2016      1    3     0    0    53
4    2016      1    4     0    0     1
5    2016      1    5     0    0     1
6    2016      1    6     0    0     1
7    2016      1    7     0    0     1
8    2016      1    8     0    0     1
9    2016      1    9     0    0     1
10   2016      1   10     0    0     1
11   2016      1   11     0    0     2
12   2016      1   12     0    0     2
13   2016      1   13     0    0     2
14   2016      1   14     0    0     2
15   2016      1   15     0    0     2
16   2016      1   16     0    0     2
17   2016      1   17     0    0     2
18   2016      1   18     0    0     3
19   2016      1   19     0    0     3
20   2016      1   20     0    0     3
21   2016      1   21     0    0     3
22   2016      1   22     0    0     3
23   2016      1   23     0    0     3
24   2016      1   24     0    0     3
25   2016      1   25     0    0     4
26   2016      1   26     0    0     4
27   2016      1   27     0    0     4
28   2016      1   28     0    0     4
29   2016      1   29     0    0     4
..    ...    ...  ...   ...  ...   ...
337  2016     12    2     0    0    48
338  2016     12    3     0    0    48
339  2016     12    4     0    0    48
340  2016     12    5     0    0    49
341  2016     12    6     0    0    49
342  2016     12    7     0    0    49
343  2016     12    8     0    0    49
344  2016     12    9     0    0    49
345  2016     12   10     0    0    49
346  2016     12   11     0    0    49
347  2016     12   12     0    0    50
348  2016     12   13     0    0    50
349  2016     12   14     0    0    50
350  2016     12   15     0    0    50
351  2016     12   16     0    0    50
352  2016     12   17     0    0    50
353  2016     12   18     0    0    50
354  2016     12   19     0    0    51
355  2016     12   20     0    0    51
356  2016     12   21     0    0    51
357  2016     12   22     0    0    51
358  2016     12   23     0    0    51
359  2016     12   24     0    0    51
360  2016     12   25     0    0    51
361  2016     12   26     0    0    52
362  2016     12   27     0    0    52
363  2016     12   28     0    0    52
364  2016     12   29     0    0    52
365  2016     12   30     0    0    52
366  2016     12   31     0    0    52
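
The output above is a separate DataFrame, whereas the question asks for the fields as new columns of `train`. One way to get there is to join the result back on the index. This is only a minimal sketch, assuming the same `train` frame as in the setup; the names `split_date` and `parts` are illustrative and not part of the original answer:

```python
import pandas as pd

# Same sample frame as in the setup above.
train = pd.DataFrame(pd.date_range('2015-12-31', '2016-12-31'), columns=['Dates'])
fields = ['Year', 'Month', 'Day', 'Hour', 'Min', 'Week']

def split_date(row):
    """Return the date parts of one row as a Series labelled by `fields`."""
    d = row['Dates']  # a pandas Timestamp
    return pd.Series([d.year, d.month, d.day, d.hour, d.minute,
                      d.isocalendar()[1]],  # element 1 is the ISO week number
                     index=fields)

# apply(axis=1) calls split_date once per row and builds a DataFrame.
parts = train.apply(split_date, axis=1)

# Attach the new columns to the original frame (aligned on the index).
train = train.join(parts)
```

Note that `apply(..., axis=1)` still calls a Python function once per row, so on a frame with hundreds of thousands of rows it is likely to remain noticeably slower than a vectorized approach.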

Answer 1 (score: 1)

First of all, you don't want to loop over the rows; you should use vectorized operations on the column instead:

train['Year'] = train.Dates.dt.year
train['Month'] = train.Dates.dt.month

...
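
For illustration, the same vectorized pattern can cover all six columns from the question. This is a sketch rather than the answerer's exact code: it assumes `train.Dates` already has a datetime64 dtype and that pandas 1.1 or later is installed, where `Series.dt.isocalendar()` is available for the ISO week number. The sample frame is the one from the setup in the first answer.

```python
import pandas as pd

# Sample frame; in the question this would be the 878000-row DataFrame.
train = pd.DataFrame(pd.date_range('2015-12-31', '2016-12-31'), columns=['Dates'])

dates = train['Dates']

# Each .dt attribute is computed for the whole column at once,
# with no Python-level loop over the rows.
train['Year'] = dates.dt.year
train['Month'] = dates.dt.month
train['Day'] = dates.dt.day
train['Hour'] = dates.dt.hour
train['Min'] = dates.dt.minute

# isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns;
# the week column is cast to a plain integer dtype here (assumes no missing dates).
train['Week'] = dates.dt.isocalendar().week.astype(int)
```

If the column holds strings rather than datetimes, it would first need to be converted, for example with `pd.to_datetime(train['Dates'])`.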