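I have an 878,000 × 1 DataFrame whose single column holds dates spanning several years. I use the following code to create new columns that store the year, month, day, hour, minute, and week: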
for i in train.index:
    train['Year'][i] = train.Dates[i].year
    train['Month'][i] = train.Dates[i].month
    train['Day'][i] = train.Dates[i].day
    train['Hour'][i] = train.Dates[i].hour
    train['Min'][i] = train.Dates[i].minute
    train['Week'][i] = train.Dates[i].isocalendar()[1]
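However, this is really slow: my computer ran this simple loop overnight and it still hasn't finished. Is there a faster way to extract this information without waiting so long?
Answer 0 (score: 2)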
In [15]: train = pd.DataFrame(pd.date_range('2015-12-31', '2016-12-31'), columns=['Dates'])
In [16]: train.head()
Out[16]:
Dates
0 2015-12-31
1 2016-01-01
2 2016-01-02
3 2016-01-03
4 2016-01-04
In [17]: fields = ['Year', 'Month', 'Day', 'Hour', 'Min', 'Week']
In [18]: f = lambda x: pd.Series([x['Dates'].year, x['Dates'].month,
                                  x['Dates'].day, x['Dates'].hour,
                                  x['Dates'].minute, x['Dates'].isocalendar()[1]],
                                 index=fields)
In [19]: train.apply(f, axis=1)
Out[19]:
Year Month Day Hour Min Week
0 2015 12 31 0 0 53
1 2016 1 1 0 0 53
2 2016 1 2 0 0 53
3 2016 1 3 0 0 53
4 2016 1 4 0 0 1
5 2016 1 5 0 0 1
6 2016 1 6 0 0 1
7 2016 1 7 0 0 1
8 2016 1 8 0 0 1
9 2016 1 9 0 0 1
10 2016 1 10 0 0 1
11 2016 1 11 0 0 2
12 2016 1 12 0 0 2
13 2016 1 13 0 0 2
14 2016 1 14 0 0 2
15 2016 1 15 0 0 2
16 2016 1 16 0 0 2
17 2016 1 17 0 0 2
18 2016 1 18 0 0 3
19 2016 1 19 0 0 3
20 2016 1 20 0 0 3
21 2016 1 21 0 0 3
22 2016 1 22 0 0 3
23 2016 1 23 0 0 3
24 2016 1 24 0 0 3
25 2016 1 25 0 0 4
26 2016 1 26 0 0 4
27 2016 1 27 0 0 4
28 2016 1 28 0 0 4
29 2016 1 29 0 0 4
.. ... ... ... ... ... ...
337 2016 12 2 0 0 48
338 2016 12 3 0 0 48
339 2016 12 4 0 0 48
340 2016 12 5 0 0 49
341 2016 12 6 0 0 49
342 2016 12 7 0 0 49
343 2016 12 8 0 0 49
344 2016 12 9 0 0 49
345 2016 12 10 0 0 49
346 2016 12 11 0 0 49
347 2016 12 12 0 0 50
348 2016 12 13 0 0 50
349 2016 12 14 0 0 50
350 2016 12 15 0 0 50
351 2016 12 16 0 0 50
352 2016 12 17 0 0 50
353 2016 12 18 0 0 50
354 2016 12 19 0 0 51
355 2016 12 20 0 0 51
356 2016 12 21 0 0 51
357 2016 12 22 0 0 51
358 2016 12 23 0 0 51
359 2016 12 24 0 0 51
360 2016 12 25 0 0 51
361 2016 12 26 0 0 52
362 2016 12 27 0 0 52
363 2016 12 28 0 0 52
364 2016 12 29 0 0 52
365 2016 12 30 0 0 52
366 2016 12 31 0 0 52
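Note that train.apply(f, axis=1) returns a new DataFrame and leaves train unchanged. A minimal sketch of attaching the result as new columns, assuming the same toy frame and fields list as in the session above (the helper name extract is just for illustration). A row-wise apply still calls a Python function once per row, so on 878,000 rows it will be far slower than the vectorized .dt approach:

```python
import pandas as pd

# Same toy frame as in the session above: 2015-12-31 plus every day of 2016
train = pd.DataFrame(pd.date_range('2015-12-31', '2016-12-31'), columns=['Dates'])
fields = ['Year', 'Month', 'Day', 'Hour', 'Min', 'Week']

def extract(row):
    # Pull each component out of the row's Timestamp, looked up by column label
    d = row['Dates']
    return pd.Series([d.year, d.month, d.day, d.hour, d.minute,
                      d.isocalendar()[1]], index=fields)

# apply returns a DataFrame aligned on train's index; join appends its columns
train = train.join(train.apply(extract, axis=1))
print(train.head())
```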
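Answer 1 (score: 1)
First of all, you don't want a loop here. Use the vectorized .dt accessor, which computes each field for the whole column at once: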
train['Year'] = train.Dates.dt.year
train['Month'] = train.Dates.dt.month
...
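Filling in the remaining columns the same way, a minimal sketch might look like this. It assumes pandas 1.1 or later, where Series.dt.isocalendar() returns a DataFrame with year, week and day columns (the older .dt.week attribute is deprecated), and it builds a hypothetical stand-in frame of 878,000 timestamps in place of the real data:

```python
import pandas as pd

# Hypothetical stand-in for the real data: 878,000 timestamps, 7 minutes apart,
# spanning roughly 11.7 years
train = pd.DataFrame({'Dates': pd.date_range('2013-01-01', periods=878000, freq='7min')})

# Each .dt attribute is computed for the whole column in one vectorized call
train['Year'] = train.Dates.dt.year
train['Month'] = train.Dates.dt.month
train['Day'] = train.Dates.dt.day
train['Hour'] = train.Dates.dt.hour
train['Min'] = train.Dates.dt.minute
# ISO week number, the same value as isocalendar()[1] in the question's loop
train['Week'] = train.Dates.dt.isocalendar().week

print(train.head())
```

Because every column is filled in a single pass over the data, there is no per-cell indexing, which is what made the original loop take hours.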