Convert a string to a numeric time value

Time: 2019-09-30 15:38:15

Tags: python string pandas

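I have a data frame like this. I need to convert the "Used time" column from strings to numeric time values so that I can aggregate (sum) this column.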

Date        User                Used time       Used

31-07-19    Member (GTV1)       1Hour 1Minute   5083
31-07-19    Member (NHOKON)     4Minute         333
31-07-19    Member (GA)         42Minute        3500
31-07-19    Member (VIET125)    2Hour 32Minute  12667
31-07-19    Member (LUAN123)    40Minute        3333
31-07-19    Member (MMINH)      2Hour 47Minute  13917
31-07-19    Member (MMINH)      0Minute         0
31-07-19    Member (MMINH)      2Hour 11Minute  10917
31-07-19    Member (MMINH)      53Minute        4417
30-07-19    Member (KHANH00)    41Minute        3417
30-07-19    Member (CHIEN2)     1Hour 40Minute  8333
30-07-19    Member (CHIEN2)     9Minute         750
30-07-19    Guest               0Minute         2000
30-07-19    Member (DUYTQ111)   1Hour 3Minute   5250
29-07-19    Member (TRUNGTQ2)   42Minute        3500
29-07-19    Guest               1Hour 9Minute   7000
29-07-19    Member (MMINH)      2Hour 15Minute  11250

My code

T = df['Used time'].str.split(expand=True)
H = pd.to_numeric(T[0].str.replace('Hour',''))
M = round(pd.to_numeric(T[1].str.replace('Minute',''))/60, 1)
df['Used time'] = H + M

But it does not work for rows that contain only minutes. Please help.
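A short sketch that reproduces the failure on two values taken from the question. For a minute-only value, split(expand=True) puts the minutes into column 0 and leaves column 1 missing, so stripping "Hour" from column 0 leaves "4Minute", which to_numeric cannot parse. The variable names are illustrative only.

```python
import pandas as pd

# Two values copied from the question's "Used time" column
s = pd.Series(['1Hour 1Minute', '4Minute'])
T = s.str.split(expand=True)

# For the minute-only row the minutes land in column 0 and column 1 is missing
first = T.loc[1, 0]    # '4Minute'
second = T.loc[1, 1]   # missing value (None/NaN)

try:
    pd.to_numeric(T[0].str.replace('Hour', ''))
    failed = False
except ValueError:
    failed = True      # '4Minute' is not a number
```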

4 answers:

Answer 0 (score: 0)

If the hour part is missing, you can prepend "0Hour " to the string so that every value has two tokens:

df['Used time'] = df['Used time'].map(lambda x : '0Hour ' + x if len(x.split()) == 1 else x)
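A minimal sketch combining this padding step with the question's original conversion, using four sample values from the question. After padding, every row splits into an hour token and a minute token, so the original code runs unchanged (here with the Series .round(1) method instead of the built-in round).

```python
import pandas as pd

df = pd.DataFrame({'Used time': ['1Hour 1Minute', '4Minute', '2Hour 32Minute', '0Minute']})

# Pad minute-only values with "0Hour " so every row has exactly two tokens
df['Used time'] = df['Used time'].map(lambda x: '0Hour ' + x if len(x.split()) == 1 else x)

# The question's conversion: hours plus minutes as a fraction of an hour
T = df['Used time'].str.split(expand=True)
H = pd.to_numeric(T[0].str.replace('Hour', ''))
M = (pd.to_numeric(T[1].str.replace('Minute', '')) / 60).round(1)
df['Used time'] = H + M
```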

Answer 1 (score: 0)

How about using regular expressions to find the number before Hour and the number before Minute, converting everything to minutes and summing?

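import re

# re.findall works on a single string, not on a whole Series, so apply it per value
def to_minutes(s):
    h = re.findall(r"(\d+)Hour", s)
    m = re.findall(r"(\d+)Minute", s)
    return (int(h[0]) * 60 if h else 0) + (int(m[0]) if m else 0)

df['Used time'] = df['Used time'].apply(to_minutes)  # in minutes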
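A self-contained sketch of the per-value regex approach, run on four values from the question and then summed as the question asks. It uses re.search instead of re.findall because each unit appears at most once; the function name to_minutes and the column name "Used minutes" are hypothetical.

```python
import re
import pandas as pd

def to_minutes(text):
    """Convert strings such as '2Hour 32Minute' or '4Minute' to total minutes."""
    h = re.search(r'(\d+)Hour', text)
    m = re.search(r'(\d+)Minute', text)
    return (int(h.group(1)) * 60 if h else 0) + (int(m.group(1)) if m else 0)

df = pd.DataFrame({'Used time': ['1Hour 1Minute', '4Minute', '2Hour 32Minute', '0Minute']})
df['Used minutes'] = df['Used time'].apply(to_minutes)

# Aggregate the converted column
total = df['Used minutes'].sum()
```

Keeping the result in whole minutes avoids the rounding that a fractional-hours column introduces before summing.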

Answer 2 (score: 0)

Try using rpartition() instead of split():

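import pandas as pd

T = df['Used time'].str.rpartition('Hour')

# Minute-only rows have an empty hour part; fillna(0) treats them as 0 hours
H = pd.to_numeric(T[0]).fillna(0)

# Remove the literal word "Minute" (str.strip('Minute') would strip a character set)
M = (pd.to_numeric(T[2].str.replace('Minute', '').str.strip()) / 60).round(1)

df['Used time'] = H + M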

print(df)

        Date                   User  Used time   Used
0   31-07-19          Member (GTV1)        1.0   5083
1   31-07-19        Member (NHOKON)        0.1    333
2   31-07-19            Member (GA)        0.7   3500
3   31-07-19       Member (VIET125)        2.5  12667
4   31-07-19       Member (LUAN123)        0.7   3333
5   31-07-19         Member (MMINH)        2.8  13917
6   31-07-19         Member (MMINH)        0.0      0
7   31-07-19         Member (MMINH)        2.2  10917
8   31-07-19         Member (MMINH)        0.9   4417
9   30-07-19       Member (KHANH00)        0.7   3417
10  30-07-19        Member (CHIEN2)        1.7   8333
11  30-07-19        Member (CHIEN2)        0.2    750
12  30-07-19                  Guest        0.0   2000
13  30-07-19      Member (DUYTQ111)        1.0   5250
14  29-07-19      Member (TRUNGTQ2)        0.7   3500
15  29-07-19                  Guest        1.2   7000
16  29-07-19         Member (MMINH)        2.2  11250
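A small sketch of why rpartition() copes with minute-only rows: when "Hour" is absent, the whole string lands in the last column and the first two columns are empty, so the minutes are always found in column 2. It also shows that replacing the literal word "Minute" and then stripping whitespace yields the bare number. The sample values come from the question.

```python
import pandas as pd

s = pd.Series(['1Hour 1Minute', '4Minute'])
parts = s.str.rpartition('Hour')   # three columns: before, separator, after

row0 = tuple(parts.loc[0])   # split around "Hour"
row1 = tuple(parts.loc[1])   # no "Hour": everything goes to the last column

# Remove the literal word "Minute", then surrounding whitespace
minutes = parts[2].str.replace('Minute', '').str.strip()
```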

Answer 3 (score: 0)

You can extract Hour and Minute with a single regex pattern:

pattern = r'(?:(?P<Hour>\d+)Hour\s*)?(?P<Minute>\d+)Minute'
# extract() returns strings (NaN where the hour part is missing), so cast to int
new_df = df['Used time'].str.extract(pattern).fillna('0').astype(int)

Output:

   Hour Minute
0     1      1
1     0      4
2     0     42
3     2     32
4     0     40
5     2     47
6     0      0
7     2     11
8     0     53
9     0     41
10    1     40
11    0      9
12    0      0
13    1      3
14    0     42
15    1      9
16    2     15
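A sketch of the same pattern written in verbose mode so each piece can be commented. Passing flags=re.VERBOSE to str.extract is an assumption of this sketch rather than part of the answer; the behavior matches the compact pattern.

```python
import re
import pandas as pd

pattern = r'''
    (?:                 # optional, non-capturing hour part
        (?P<Hour>\d+)   # digits before "Hour" -> column "Hour"
        Hour\s*         # the literal word plus any following whitespace
    )?                  # the hour part may be absent
    (?P<Minute>\d+)     # digits before "Minute" -> column "Minute"
    Minute
'''

s = pd.Series(['2Hour 32Minute', '4Minute'])
parts = s.str.extract(pattern, flags=re.VERBOSE)
```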

For an explanation of the pattern, see this link. The new used time (in minutes) can then be computed as:

df['Used time'] = new_df['Hour']*60 + new_df['Minute']
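An end-to-end sketch of this answer that also performs the aggregation the question asks for, on four rows copied from the question. The cast to int matters: extract() returns strings, and multiplying the string '1' by 60 would repeat it rather than compute 60. The column name "Used minutes" and the per-date grouping are illustrative choices, not part of the answer.

```python
import pandas as pd

# A few rows copied from the question's data frame
df = pd.DataFrame({
    'Date': ['31-07-19', '31-07-19', '30-07-19', '30-07-19'],
    'Used time': ['1Hour 1Minute', '4Minute', '1Hour 40Minute', '9Minute'],
})

pattern = r'(?:(?P<Hour>\d+)Hour\s*)?(?P<Minute>\d+)Minute'
# Fill the missing hour part with '0', then convert both columns to integers
parts = df['Used time'].str.extract(pattern).fillna('0').astype(int)
df['Used minutes'] = parts['Hour'] * 60 + parts['Minute']

# Aggregate: overall total and total per date
total_minutes = df['Used minutes'].sum()
per_date = df.groupby('Date')['Used minutes'].sum()
```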