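I have a DataFrame like the one below. I need to convert the "Used time" column from a string to a time or a number so that I can sum the column.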
Date User Used time Used
31-07-19 Member (GTV1) 1Hour 1Minute 5083
31-07-19 Member (NHOKON) 4Minute 333
31-07-19 Member (GA) 42Minute 3500
31-07-19 Member (VIET125) 2Hour 32Minute 12667
31-07-19 Member (LUAN123) 40Minute 3333
31-07-19 Member (MMINH) 2Hour 47Minute 13917
31-07-19 Member (MMINH) 0Minute 0
31-07-19 Member (MMINH) 2Hour 11Minute 10917
31-07-19 Member (MMINH) 53Minute 4417
30-07-19 Member (KHANH00) 41Minute 3417
30-07-19 Member (CHIEN2) 1Hour 40Minute 8333
30-07-19 Member (CHIEN2) 9Minute 750
30-07-19 Guest 0Minute 2000
30-07-19 Member (DUYTQ111) 1Hour 3Minute 5250
29-07-19 Member (TRUNGTQ2) 42Minute 3500
29-07-19 Guest 1Hour 9Minute 7000
29-07-19 Member (MMINH) 2Hour 15Minute 11250
My code:
T = df['Used time'].str.split(expand=True)
H = pd.to_numeric(T[0].str.replace('Hour',''))
M = round(pd.to_numeric(T[1].str.replace('Minute',''))/60, 1)
df['Used time'] = H + M
But this does not work for rows that contain only minutes. Please help.
Answer 0 (score: 0)
If the hour part is missing, you can prepend 0Hour to the string.
df['Used time'] = df['Used time'].map(lambda x : '0Hour ' + x if len(x.split()) == 1 else x)
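Once every row has been padded this way, the split from the question should produce two columns for every row. Here is a minimal sketch of the full round trip, assuming a small sample frame built from values in the question; the names `padded`, `parts` and `Used hours` are only illustrative.

```python
import pandas as pd

# Hypothetical sample taken from the "Used time" column in the question.
df = pd.DataFrame({"Used time": ["1Hour 1Minute", "4Minute", "2Hour 32Minute", "0Minute"]})

# Prepend "0Hour " to rows that contain only a minutes part.
padded = df["Used time"].map(lambda x: "0Hour " + x if len(x.split()) == 1 else x)

# Every row now has two whitespace-separated parts, so the split yields two columns.
parts = padded.str.split(expand=True)
hours = pd.to_numeric(parts[0].str.replace("Hour", "", regex=False))
minutes = pd.to_numeric(parts[1].str.replace("Minute", "", regex=False))

# Same rounding as in the question: minutes become tenths of an hour.
df["Used hours"] = hours + (minutes / 60).round(1)
print(df)
```

Writing the result to a new column keeps the original strings available for checking.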
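Answer 1 (score: 0)
How about using regular expressions to find the number before Hour and the number before Minute, converting everything to minutes, and then summing?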
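import re

def to_minutes(s):
    # re.findall works on a single string, not on a Series, so apply it row by row
    h = re.findall(r"(\d+)Hour", s)
    m = re.findall(r"(\d+)Minute", s)
    return (int(h[0])*60 if h else 0) + (int(m[0]) if m else 0)  # in minutes

df['Used time'] = df['Used time'].apply(to_minutes)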
Answer 2 (score: 0)
Try using rpartition() instead of split.
T = df['Used time'].str.rpartition('Hour')
H = pd.to_numeric(T[0]).fillna(0)
M = (pd.to_numeric(T[2].str.strip('Minute'))/60).round(1)
df['Used time'] = H + M
print(df)
Date User Used time Used
0 31-07-19 Member (GTV1) 1.0 5083
1 31-07-19 Member (NHOKON) 0.1 333
2 31-07-19 Member (GA) 0.7 3500
3 31-07-19 Member (VIET125) 2.5 12667
4 31-07-19 Member (LUAN123) 0.7 3333
5 31-07-19 Member (MMINH) 2.8 13917
6 31-07-19 Member (MMINH) 0.0 0
7 31-07-19 Member (MMINH) 2.2 10917
8 31-07-19 Member (MMINH) 0.9 4417
9 30-07-19 Member (KHANH00) 0.7 3417
10 30-07-19 Member (CHIEN2) 1.7 8333
11 30-07-19 Member (CHIEN2) 0.2 750
12 30-07-19 Guest 0.0 2000
13 30-07-19 Member (DUYTQ111) 1.0 5250
14 29-07-19 Member (TRUNGTQ2) 0.7 3500
15 29-07-19 Guest 1.2 7000
16 29-07-19 Member (MMINH) 2.2 11250
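Why this handles the minutes-only rows: when the separator is not found, rpartition puts the whole string in the last column and leaves the first two empty, so the hour column becomes missing and can be filled with 0. A small sketch on two hypothetical sample values; `errors="coerce"` is added here as an assumption to make the empty-string-to-NaN step explicit.

```python
import pandas as pd

# Hypothetical sample: one row with hours, one with minutes only.
s = pd.Series(["2Hour 32Minute", "4Minute"])

# Three columns: text before the separator, the separator, text after it.
parts = s.str.rpartition("Hour")
print(parts)

# The empty string in the minutes-only row is coerced to NaN, then filled with 0.
hours = pd.to_numeric(parts[0], errors="coerce").fillna(0)
print(hours)
```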
Answer 3 (score: 0)
You can extract Hour and Minute with a single regular expression pattern:
pattern = r'(?:(?P<Hour>\d+)Hour\s*)?(?P<Minute>\d+)Minute'
new_df = df['Used time'].str.extract(pattern).fillna(0)
Output:
Hour Minute
0 1 1
1 0 4
2 0 42
3 2 32
4 0 40
5 2 47
6 0 0
7 2 11
8 0 53
9 0 41
10 1 40
11 0 9
12 0 0
13 1 3
14 0 42
15 1 9
16 2 15
For an explanation of the pattern, see this link. The new used time can now be computed as:
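# str.extract returns strings, so cast to int before doing arithmetic; the result is in minutes
df['Used time'] = new_df['Hour'].astype(int)*60 + new_df['Minute'].astype(int)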
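Since the goal in the question is to sum the column, here is a minimal end-to-end sketch of the extract approach followed by the sum. It assumes a small sample frame built from rows in the question; the `Used minutes` column and the per-user grouping are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample rows taken from the question.
df = pd.DataFrame({
    "User": ["Member (MMINH)", "Member (MMINH)", "Guest", "Member (GA)"],
    "Used time": ["2Hour 47Minute", "53Minute", "1Hour 9Minute", "42Minute"],
})

pattern = r"(?:(?P<Hour>\d+)Hour\s*)?(?P<Minute>\d+)Minute"
parts = df["Used time"].str.extract(pattern)

# str.extract returns strings (NaN where the optional Hour group did not match),
# so convert to numbers before doing arithmetic.
hours = pd.to_numeric(parts["Hour"], errors="coerce").fillna(0).astype(int)
minutes = pd.to_numeric(parts["Minute"], errors="coerce").fillna(0).astype(int)

df["Used minutes"] = hours * 60 + minutes

# The column is numeric now, so it can be summed overall or per user.
total = df["Used minutes"].sum()
per_user = df.groupby("User")["Used minutes"].sum()
print(total)
print(per_user)
```

Keeping the value as whole minutes avoids the rounding that the hours-with-one-decimal representation introduces.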