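My data frame has a column of dates (StartTime) in the following format: 28-7-2015 0:09:00. The same data frame also has a column holding a number of seconds (SetupDuration1).
I want to create a new column that subtracts those seconds from the date field: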
dftask['Start'] = dftask['StartTime'] - dftask['SetupDuration1']
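SetupDuration1 is a numeric column and must stay numeric, because I perform other operations on it, such as taking the absolute value.
So how should I subtract the seconds correctly?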
Answer 0 (score: 1)
Use apply with a lambda to convert each value to a Timedelta, then subtract:
In [88]:
import datetime as dt, numpy as np, pandas as pd  # imports the snippet relies on
df = pd.DataFrame({'StartTime':pd.date_range(start=dt.datetime(2015,1,1), end = dt.datetime(2015,2,1)), 'SetupDuration1':np.random.randint(0, 59, size=32)})
df
Out[88]:
SetupDuration1 StartTime
0 14 2015-01-01
1 55 2015-01-02
2 21 2015-01-03
3 50 2015-01-04
4 21 2015-01-05
5 6 2015-01-06
6 6 2015-01-07
7 2 2015-01-08
8 10 2015-01-09
9 3 2015-01-10
10 11 2015-01-11
11 32 2015-01-12
12 53 2015-01-13
13 45 2015-01-14
14 48 2015-01-15
15 23 2015-01-16
16 7 2015-01-17
17 5 2015-01-18
18 18 2015-01-19
19 26 2015-01-20
20 48 2015-01-21
21 8 2015-01-22
22 58 2015-01-23
23 24 2015-01-24
24 47 2015-01-25
25 10 2015-01-26
26 32 2015-01-27
27 26 2015-01-28
28 36 2015-01-29
29 36 2015-01-30
30 40 2015-01-31
31 18 2015-02-01
In [94]:
df['Start'] = df['StartTime'] - df['SetupDuration1'].apply(lambda x: pd.Timedelta(x, 's'))
df
Out[94]:
SetupDuration1 StartTime Start
0 14 2015-01-01 2014-12-31 23:59:46
1 55 2015-01-02 2015-01-01 23:59:05
2 21 2015-01-03 2015-01-02 23:59:39
3 50 2015-01-04 2015-01-03 23:59:10
4 21 2015-01-05 2015-01-04 23:59:39
5 6 2015-01-06 2015-01-05 23:59:54
6 6 2015-01-07 2015-01-06 23:59:54
7 2 2015-01-08 2015-01-07 23:59:58
8 10 2015-01-09 2015-01-08 23:59:50
9 3 2015-01-10 2015-01-09 23:59:57
10 11 2015-01-11 2015-01-10 23:59:49
11 32 2015-01-12 2015-01-11 23:59:28
12 53 2015-01-13 2015-01-12 23:59:07
13 45 2015-01-14 2015-01-13 23:59:15
14 48 2015-01-15 2015-01-14 23:59:12
15 23 2015-01-16 2015-01-15 23:59:37
16 7 2015-01-17 2015-01-16 23:59:53
17 5 2015-01-18 2015-01-17 23:59:55
18 18 2015-01-19 2015-01-18 23:59:42
19 26 2015-01-20 2015-01-19 23:59:34
20 48 2015-01-21 2015-01-20 23:59:12
21 8 2015-01-22 2015-01-21 23:59:52
22 58 2015-01-23 2015-01-22 23:59:02
23 24 2015-01-24 2015-01-23 23:59:36
24 47 2015-01-25 2015-01-24 23:59:13
25 10 2015-01-26 2015-01-25 23:59:50
26 32 2015-01-27 2015-01-26 23:59:28
27 26 2015-01-28 2015-01-27 23:59:34
28 36 2015-01-29 2015-01-28 23:59:24
29 36 2015-01-30 2015-01-29 23:59:24
30 40 2015-01-31 2015-01-30 23:59:20
31 18 2015-02-01 2015-01-31 23:59:42
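The subtraction above only works once StartTime is a real datetime column. If the column is still stored as text in the day-month-year format quoted in the question, it has to be parsed first. Here is a minimal sketch of that step; the two sample rows are made up for illustration, and it assumes every value follows the `28-7-2015 0:09:00` pattern:

```python
import pandas as pd

# Hypothetical sample rows in the day-month-year format from the question.
dftask = pd.DataFrame({
    "StartTime": ["28-7-2015 0:09:00", "1-8-2015 12:30:15"],
    "SetupDuration1": [30, 90],
})

# Parse explicitly as day-month-year so that 1-8-2015 is read as 1 August.
dftask["StartTime"] = pd.to_datetime(dftask["StartTime"], format="%d-%m-%Y %H:%M:%S")

# Same approach as above: convert each number of seconds to a Timedelta, then subtract.
dftask["Start"] = dftask["StartTime"] - dftask["SetupDuration1"].apply(
    lambda x: pd.Timedelta(x, unit="s")
)
print(dftask)
```

Passing an explicit format avoids any ambiguity between day-first and month-first dates.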
Timings
Actually, constructing a TimedeltaIndex on the fly seems to be faster:
In [99]:
%timeit df['Start'] = df['StartTime'] - pd.TimedeltaIndex(df['SetupDuration1'], unit='s')
1000 loops, best of 3: 837 µs per loop
In [100]:
%timeit df['Start'] = df['StartTime'] - df['SetupDuration1'].apply(lambda x: pd.Timedelta(x, 's'))
100 loops, best of 3: 1.97 ms per loop
So I would just do this:
In [101]:
df['Start'] = df['StartTime'] - pd.TimedeltaIndex(df['SetupDuration1'], unit='s')
df
Out[101]:
SetupDuration1 StartTime Start
0 14 2015-01-01 2014-12-31 23:59:46
1 55 2015-01-02 2015-01-01 23:59:05
2 21 2015-01-03 2015-01-02 23:59:39
3 50 2015-01-04 2015-01-03 23:59:10
4 21 2015-01-05 2015-01-04 23:59:39
5 6 2015-01-06 2015-01-05 23:59:54
6 6 2015-01-07 2015-01-06 23:59:54
7 2 2015-01-08 2015-01-07 23:59:58
8 10 2015-01-09 2015-01-08 23:59:50
9 3 2015-01-10 2015-01-09 23:59:57
10 11 2015-01-11 2015-01-10 23:59:49
11 32 2015-01-12 2015-01-11 23:59:28
12 53 2015-01-13 2015-01-12 23:59:07
13 45 2015-01-14 2015-01-13 23:59:15
14 48 2015-01-15 2015-01-14 23:59:12
15 23 2015-01-16 2015-01-15 23:59:37
16 7 2015-01-17 2015-01-16 23:59:53
17 5 2015-01-18 2015-01-17 23:59:55
18 18 2015-01-19 2015-01-18 23:59:42
19 26 2015-01-20 2015-01-19 23:59:34
20 48 2015-01-21 2015-01-20 23:59:12
21 8 2015-01-22 2015-01-21 23:59:52
22 58 2015-01-23 2015-01-22 23:59:02
23 24 2015-01-24 2015-01-23 23:59:36
24 47 2015-01-25 2015-01-24 23:59:13
25 10 2015-01-26 2015-01-25 23:59:50
26 32 2015-01-27 2015-01-26 23:59:28
27 26 2015-01-28 2015-01-27 23:59:34
28 36 2015-01-29 2015-01-28 23:59:24
29 36 2015-01-30 2015-01-29 23:59:24
30 40 2015-01-31 2015-01-30 23:59:20
31 18 2015-02-01 2015-01-31 23:59:42
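Another vectorised option, offered as a sketch and not as part of the timings above, is `pd.to_timedelta`. As far as I am aware, recent pandas releases deprecate the `unit` keyword of `TimedeltaIndex`, so this may be the safer spelling on newer versions. The conversion returns a new object and leaves SetupDuration1 numeric, so operations such as the absolute value still work on it. The three rows and the `AbsDuration` column are made up for illustration:

```python
import pandas as pd

# Small illustrative frame; the values mirror the first rows of the example above.
df = pd.DataFrame({
    "StartTime": pd.date_range("2015-01-01", periods=3, freq="D"),
    "SetupDuration1": [14, 55, 21],
})

# Convert the seconds to timedeltas in one vectorised call and subtract.
df["Start"] = df["StartTime"] - pd.to_timedelta(df["SetupDuration1"], unit="s")

# SetupDuration1 is untouched and still numeric (AbsDuration is a hypothetical column).
df["AbsDuration"] = df["SetupDuration1"].abs()
print(df)
```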