I have a DataFrame and want to add a column containing the time difference between two other columns:
df['Diff'] = df['End Time'] - df['Open Time']
df['Diff']
0 0 days 01:25:40
1 0 days 00:41:57
2 0 days 00:21:47
3 0 days 16:41:57
4 0 days 04:32:00
5 0 days 03:01:57
6 0 days 01:37:56
7 0 days 01:13:57
8 0 days 01:07:56
9 0 days 02:33:59
10 29 days 18:33:53
11 0 days 03:50:56
12 0 days 01:57:56
I want this column formatted like '1h 25m', so I tried converting the days to hours:
diff = df['End Time'] - df['Open Time']
hours = diff.dt.days * 24 + diff.dt.components.hours
minutes = diff.dt.components.minutes
and got these results:
0 1
1 0
2 0
3 16
4 4
5 3
6 1
7 1
8 1
9 2
10 714
11 3
12 1
dtype: int64h 0 25
1 41
2 21
3 41
4 32
5 1
6 37
7 13
8 7
9 33
10 33
11 50
12 57
Name: minutes, dtype: int64m
But I can't get these results into a new column in that format:
'{}h {}m'.format(hours, minutes)
Answer 0 (score: 2)
You can extract the relevant components, convert them to str using astype, and then concatenate the columns as needed.
c = (df['End Time'] - df['Open Time'])\
.dt.components[['days', 'hours', 'minutes']]
df['diff'] = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
df['diff']
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 714h 33m
11 3h 50m
12 1h 57m
Name: diff, dtype: object
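For readers who want to run this end to end, here is a minimal self-contained sketch of the same dt.components approach. The two sample rows are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    'Open Time': pd.to_datetime(['2017-01-01 08:00:00', '2017-01-01 09:00:00']),
    'End Time': pd.to_datetime(['2017-01-01 09:25:40', '2017-01-31 03:33:53']),
})

# dt.components returns a DataFrame with one column per unit
# (days, hours, minutes, seconds, ...), one row per timedelta.
c = (df['End Time'] - df['Open Time']).dt.components

# Fold whole days into hours, then build the label column-wise.
df['diff'] = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
print(df['diff'].tolist())
```

The second row spans 29 days 18:33:53, so the days have to be folded into the hour count; components.hours alone would only report the 18.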
Answer 1 (score: 1)
You can use total_seconds to convert the timedelta to seconds, then compute the hours, minutes and seconds from that, which is about 10 times faster than using dt.components:
s = diff.dt.total_seconds().astype(int)
hours = s // 3600
# remaining seconds
s = s - (hours * 3600)
# minutes
minutes = s // 60
# remaining seconds
seconds = s - (minutes * 60)
a = hours.astype(str) + 'h ' + minutes.astype(str) + 'm'
print (a)
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 714h 33m
11 3h 50m
12 1h 57m
Name: Diff, dtype: object
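The same arithmetic can be written more compactly with floor division and modulo. This is only a sketch: the three sample timedeltas are invented, and it assumes the Series has no missing values (NaT) and no negative durations.

```python
import pandas as pd

# Hypothetical sample timedeltas standing in for df['End Time'] - df['Open Time'].
diff = pd.Series(pd.to_timedelta(['0 days 01:25:40',
                                  '29 days 18:33:53',
                                  '0 days 03:01:57']))

# Whole seconds per row; assumes no NaT, since NaN cannot be cast to int.
total = diff.dt.total_seconds().astype(int)

hours = total // 3600          # whole hours, with days already folded in
minutes = total % 3600 // 60   # whole minutes left after removing the hours

labels = hours.astype(str) + 'h ' + minutes.astype(str) + 'm'
print(labels.tolist())
```

Working from a single integer column avoids building the intermediate components DataFrame, which is one plausible reason this variant is quicker in the timings.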
Solution from Zero's comment:
hours = diff.dt.days * 24 + diff.dt.components.hours
minutes = diff.dt.components.minutes
a = hours.astype(str) + 'h ' + minutes.astype(str) + 'm'
print (a)
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 714h 33m
11 3h 50m
12 1h 57m
dtype: object
Alternatively:
a = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
print (a)
0 1h 25m
1 0h 41m
2 0h 21m
3 16h 41m
4 4h 32m
5 3h 1m
6 1h 37m
7 1h 13m
8 1h 7m
9 2h 33m
10 714h 33m
11 3h 50m
12 1h 57m
dtype: object
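The comprehension works because str.format is applied to one pair of scalars at a time. Calling '{}h {}m'.format(hours, minutes) on the whole Series instead embeds the printed representation of each Series in a single string, which is the output shown in the question. A minimal sketch of the row-wise version using an f-string, with made-up values for hours and minutes:

```python
import pandas as pd

# Hypothetical values standing in for the hours/minutes Series computed above.
hours = pd.Series([1, 714, 3])
minutes = pd.Series([25, 33, 1])

# Format one (hour, minute) pair at a time; reuse the original index so the
# result lines up if it is assigned back to the DataFrame.
a = pd.Series([f'{h}h {m}m' for h, m in zip(hours, minutes)], index=hours.index)
print(a.tolist())
```

Passing index=hours.index is an addition here; it matters only when the source frame does not have a default 0..n-1 index.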
Timings:
#13000 rows
df = pd.concat([df]*1000).reset_index(drop=True)
In [191]: %%timeit
...: hours = diff.dt.days * 24 + diff.dt.components.hours
...: minutes = diff.dt.components.minutes
...:
...: a = hours.astype(str) + 'h ' + minutes.astype(str)
...:
1 loop, best of 3: 483 ms per loop
In [192]: %%timeit
...: s = diff.dt.total_seconds().astype(int)
...:
...: hours = s // 3600
...: # remaining seconds
...: s = s - (hours * 3600)
...: # minutes
...: minutes = s // 60
...: # remaining seconds
...: seconds = s - (minutes * 60)
...:
...: a = hours.astype(str) + 'h ' + minutes.astype(str)
...:
10 loops, best of 3: 43.9 ms per loop
In [193]: %%timeit
...: hours = diff.dt.days * 24 + diff.dt.components.hours
...: minutes = diff.dt.components.minutes
...: s = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
...:
1 loop, best of 3: 465 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ solution
In [194]: %%timeit
...: c = diff.dt.components[['days', 'hours', 'minutes']]
...: a = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
...:
1 loop, best of 3: 208 ms per loop
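Outside IPython, a comparable measurement can be sketched with the standard timeit module. The helper names via_components and via_total_seconds and the sample data are hypothetical, and absolute numbers will depend on the machine and pandas version, so none are quoted here.

```python
import timeit
import pandas as pd

# Hypothetical data: two sample timedeltas repeated to 1000 rows.
diff = pd.Series(pd.to_timedelta(['0 days 01:25:40', '29 days 18:33:53'] * 500))

def via_components():
    # Builds the components DataFrame once, then formats column-wise.
    c = diff.dt.components
    return (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'

def via_total_seconds():
    # Works from a single integer Series of whole seconds.
    s = diff.dt.total_seconds().astype(int)
    return (s // 3600).astype(str) + 'h ' + (s % 3600 // 60).astype(str) + 'm'

# Check that both approaches agree before comparing their speed.
assert via_components().tolist() == via_total_seconds().tolist()

t_components = timeit.timeit(via_components, number=5)
t_seconds = timeit.timeit(via_total_seconds, number=5)
print(f'components:    {t_components:.4f} s for 5 runs')
print(f'total_seconds: {t_seconds:.4f} s for 5 runs')
```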