I have a dataframe with a column, df['Time'], that contains times starting at 0 and running up to 20 minutes, like this:
1:10,10
1:16,32
3:03,04
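The first field is minutes, the second is seconds, and the third is milliseconds (only two digits, so effectively hundredths of a second).
Is there a way to use Pandas to automatically convert that column to seconds, without making the column the time index of the series?
I have tried the following, but it does not work: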
pd.to_datetime(df['Time']).convert('s') # AttributeError: 'Series' object has no attribute 'convert'
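If the only way is to parse the times by hand, just point that out and I will write up a proper, detailed answer to this question myself, so don't waste your time =) Thanks!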
Answer 0 (score: 4)
Code:
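import datetime
import numpy as np
import pandas as pd

df = pd.DataFrame({'Time': ['1:10,10', '1:16,32', '3:03,04']})
# Parse minutes:seconds,fraction; %f right-pads the fraction, so '10' means 0.10 s
df['time'] = df['Time'].apply(lambda x: datetime.datetime.strptime(x, '%M:%S,%f'))
# Subtract the parsed zero time to turn the datetimes into timedeltas
df['timedelta'] = df['time'] - datetime.datetime.strptime('00:00,0', '%M:%S,%f')
# Divide by one second to get the duration as float seconds
df['secs'] = df['timedelta'].apply(lambda x: x / np.timedelta64(1, 's'))
print(df)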
Output:
Time time timedelta secs
0 1:10,10 1900-01-01 00:01:10.100000 00:01:10.100000 70.10
1 1:16,32 1900-01-01 00:01:16.320000 00:01:16.320000 76.32
2 3:03,04 1900-01-01 00:03:03.040000 00:03:03.040000 183.04
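For comparison, here is a minimal sketch that avoids datetime parsing altogether. It assumes every value has exactly the shape minutes:seconds,hundredths shown in the question; the group names (minutes, seconds, centis) and the variable parts are only illustrative. It splits the string with Series.str.extract and does the arithmetic on whole columns:

```python
import pandas as pd

df = pd.DataFrame({'Time': ['1:10,10', '1:16,32', '3:03,04']})

# Split each string into its three numeric fields (named groups become columns).
# Assumption: every row matches; a non-matching row would yield NaN and make
# astype(int) raise.
parts = df['Time'].str.extract(
    r'^(?P<minutes>\d+):(?P<seconds>\d{2}),(?P<centis>\d{2})$'
).astype(int)

# Combine the fields into float seconds; the two-digit field is hundredths.
df['secs'] = parts['minutes'] * 60 + parts['seconds'] + parts['centis'] / 100
print(df)
```

The result is an ordinary float column, so nothing becomes a time index.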
If you also have negative time deltas:
import datetime
import re
import numpy as np
import pandas as pd

# Optional minus sign, optional minutes, seconds, optional two-digit hundredths
regex = re.compile(r"(?P<minus>-)?((?P<minutes>\d+):)?(?P<seconds>\d+)(,(?P<centiseconds>\d{2}))?")

def parse_time(time_str):
    parts = regex.match(time_str)
    if not parts:
        return None
    parts = parts.groupdict()
    time_params = {}
    for name, param in parts.items():
        if param and name != 'minus':
            time_params[name] = int(param)
    # timedelta has no centiseconds argument, so convert to milliseconds
    time_params['milliseconds'] = time_params.pop('centiseconds', 0) * 10
    return (-1 if parts['minus'] else 1) * datetime.timedelta(**time_params)

df = pd.DataFrame({'Time': ['-1:10,10', '1:16,32', '3:03,04']})
df['timedelta'] = df['Time'].apply(parse_time)
df['secs'] = df['timedelta'].apply(lambda x: x / np.timedelta64(1, 's'))
print(df)
Output:
Time timedelta secs
0 -1:10,10 -00:01:10.100000 -70.10
1 1:16,32 00:01:16.320000 76.32
2 3:03,04 00:03:03.040000 183.04
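The signed case can also be handled on whole columns rather than row by row. The following is only a sketch under the same assumptions as the regex above (optional minus, optional minutes, optional two-digit hundredths); the names pattern, magnitude, sign and centis are hypothetical, and it assumes the seconds field is present in every row:

```python
import pandas as pd

df = pd.DataFrame({'Time': ['-1:10,10', '1:16,32', '3:03,04']})

pattern = (
    r'^(?P<sign>-)?(?:(?P<minutes>\d+):)?'
    r'(?P<seconds>\d+)(?:,(?P<centis>\d{2}))?$'
)
parts = df['Time'].str.extract(pattern)

# Optional groups that did not match come back as missing; treat them as zero.
magnitude = (
    parts['minutes'].fillna('0').astype(int) * 60
    + parts['seconds'].astype(int)
    + parts['centis'].fillna('0').astype(int) / 100
)

# -1 where a leading minus was captured, +1 otherwise.
sign = parts['sign'].fillna('').eq('-').map({True: -1, False: 1})

df['secs'] = sign * magnitude
print(df)
```

This yields seconds directly as floats; if a timedelta column is also needed, it could be derived from the seconds afterwards.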