在Pandas中以特定格式添加新的timedelta值列

时间:2017-09-22 12:05:30

标签: python pandas datetime dataframe date-formatting

我有一个数据框,我想在另外两列之间添加一个包含时差的列:

 df[Diff] = df['End Time'] - df['Open Time']
 df[Diff]
 0     0 days 01:25:40
 1     0 days 00:41:57
 2     0 days 00:21:47
 3     0 days 16:41:57
 4     0 days 04:32:00
 5     0 days 03:01:57
 6     0 days 01:37:56
 7     0 days 01:13:57
 8     0 days 01:07:56
 9     0 days 02:33:59
 10   29 days 18:33:53
 11    0 days 03:50:56
 12    0 days 01:57:56

我希望这个专栏的格式为'1h 25m',所以我试图计算几小时的天数:

diff = df['End Time'] - df['Open Time']
hours = diff.dt.days * 24 + diff.dt.components.hours
minutes = diff.dt.components.minutes

并收到了这些结果:

0       1
1       0
2       0
3      16
4       4
5       3
6       1
7       1
8       1
9       2
10    714
11      3
12      1
dtype: int64h 0     25
1     41
2     21
3     41
4     32
5      1
6     37
7     13
8      7
9     33
10    33
11    50
12    57
Name: minutes, dtype: int64m

但我无法在新专栏中以这种格式表达这些结果:

 '{}h {}m'.format(hours,minutes)) 

2 个答案:

答案 0 :(得分:2)

您可以提取相关列,使用str转换为astype,然后根据需要将cols连接起来。

c = (df['End Time'] - df['Open Time'])\
              .dt.components[['days', 'hours', 'minutes']]
df['diff'] = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
df['diff']
0       1h 25m
1       0h 41m
2       0h 21m
3      16h 41m
4       4h 32m
5        3h 1m
6       1h 37m
7       1h 13m
8        1h 7m
9       2h 33m
10    714h 33m
11      3h 50m
12      1h 57m
Name: diff, dtype: object

答案 1 :(得分:1)

您可以使用total_secondstimedelta转换为秒,然后计算hoursminutes以及秒数,这比使用dt.components快10倍:

s = diff.dt.total_seconds().astype(int)

hours = s // 3600 
# remaining seconds
s = s - (hours * 3600)
# minutes
minutes = s // 60
# remaining seconds
seconds = s - (minutes * 60)

a = hours.astype(str) + 'h ' + minutes.astype(str)
print (a)
0       1h 25
1       0h 41
2       0h 21
3      16h 41
4       4h 32
5        3h 1
6       1h 37
7       1h 13
8        1h 7
9       2h 33
10    714h 33
11      3h 50
12      1h 57
Name: Diff, dtype: object

Zero comment解决方案:

hours = diff.dt.days * 24 + diff.dt.components.hours
minutes = diff.dt.components.minutes

a = hours.astype(str) + 'h ' + minutes.astype(str)
print (a)
0      1h 25m
1      0h 41m
2      0h 21m
3     16h 41m
4      4h 32m
5       3h 1m
6      1h 37m
7      1h 13m
8       1h 7m
9      2h 33m
10    18h 33m
11     3h 50m
12     1h 57m
dtype: object

另:

a = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
print (a)
0       1h 25m
1       0h 41m
2       0h 21m
3      16h 41m
4       4h 32m
5        3h 1m
6       1h 37m
7       1h 13m
8        1h 7m
9       2h 33m
10    714h 33m
11      3h 50m
12      1h 57m
dtype: object

<强>计时

#13000 rows
df = pd.concat([df]*1000).reset_index(drop=True)

In [191]: %%timeit
     ...: hours = diff.dt.days * 24 + diff.dt.components.hours
     ...: minutes = diff.dt.components.minutes
     ...: 
     ...: a = hours.astype(str) + 'h ' + minutes.astype(str)
     ...: 
1 loop, best of 3: 483 ms per loop

In [192]: %%timeit
     ...: s = diff.dt.total_seconds().astype(int)
     ...: 
     ...: hours = s // 3600 
     ...: # remaining seconds
     ...: s = s - (hours * 3600)
     ...: # minutes
     ...: minutes = s // 60
     ...: # remaining seconds
     ...: seconds = s - (minutes * 60)
     ...: 
     ...: a = hours.astype(str) + 'h ' + minutes.astype(str)
     ...: 
10 loops, best of 3: 43.9 ms per loop

In [193]: %%timeit
     ...: hours = diff.dt.days * 24 + diff.dt.components.hours
     ...: minutes = diff.dt.components.minutes
     ...: s = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
     ...: 
1 loop, best of 3: 465 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ solution
In [194]: %%timeit
     ...: c = diff.dt.components[['days', 'hours', 'minutes']]
     ...: a = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
     ...: 
1 loop, best of 3: 208 ms per loop