I have a DataFrame in which one column looks like this:
df = index dosage_duration
0 5 years20mg 1X D
1 2 days10mg 1X D
2 2 days10mg 1X D
3 7 weeks
4 2 months
5 3 days
6 1 years5 MG
7 2 years
What I want to do is extract the leading time part and convert it to days. The result would look like this:
df = index dosage_duration new_dosage
0 5 years20mg 1X D 5*365
1 2 days10mg 1X D 2
2 2 days10mg 1X D 2
3 7 weeks 7*7
4 2 months 2*30
5 3 days 3
6 1 years5 MG 1*365
7 2 years 2*365
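As you can see, 5 years is converted to 5 * 365 days.
I am able to get the leading number (5 in the first row, 2 in the second row, and so on), but I am not sure how to get the unit (years, days, ...) so that I can convert every value to days.
Apparently I need to find the first digit that follows the unit word (for example, after months), but I do not know how to do that.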
Answer 0 (score: 2)
Let's try this:
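import pandas as pd

df = pd.DataFrame({'dosage_duration':['5 years20mg 1x D'
,'2 days10mg 1x D'
,'4 months20mg 1x D'
,'7 weeks'
,'2 months'
,'3 days'
,'1 days'
,'1 years5 MG'
,'2 years'
,'6 months'
,'1 years10 1x D'
,'10 months15']})
# Days per unit (approximate: 365-day year, 30-day month)
nmap={'years':365, 'months':30, 'weeks':7, 'days': 1}
strnmap = '|'.join(nmap.keys())
# Leading number, optional whitespace, then one of the unit words.
# The unit group is an alternation; a [...] character class would match any mix of those letters.
df_m = df.dosage_duration.str.extract(rf'(?P<count>\d+)\s?(?P<unit>{strnmap})')
df['new_duration']= df_m['count'].astype(int).mul(df_m['unit'].map(nmap))
print(df)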
Output:
dosage_duration new_duration
0 5 years20mg 1x D 1825
1 2 days10mg 1x D 2
2 4 months20mg 1x D 120
3 7 weeks 49
4 2 months 60
5 3 days 3
6 1 days 1
7 1 years5 MG 365
8 2 years 730
9 6 months 180
10 1 years10 1x D 365
11 10 months15 300
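One caveat: if a row contains no recognizable duration (for example an empty string, or a unit that is not in nmap), str.extract returns NaN for that row and astype(int) raises. Below is a minimal sketch of one way to tolerate such rows, assuming you would rather get a missing value than an error. The helper name to_days and the sample rows are made up for illustration.

```python
import pandas as pd

# Days per unit, same approximations as above (365-day year, 30-day month).
nmap = {'years': 365, 'months': 30, 'weeks': 7, 'days': 1}

def to_days(durations: pd.Series) -> pd.Series:
    """Convert strings like '5 years20mg' to days; unmatched rows become <NA>."""
    pattern = rf"(?P<count>\d+)\s?(?P<unit>{'|'.join(nmap)})"
    parts = durations.str.extract(pattern)
    # errors='coerce' turns anything non-numeric (including NaN) into NaN instead of raising.
    count = pd.to_numeric(parts['count'], errors='coerce')
    days = count * parts['unit'].map(nmap)
    # Nullable integer dtype keeps whole numbers while still allowing missing values.
    return days.astype('Int64')

sample = pd.Series(['5 years20mg 1x D', '7 weeks', 'unknown', '2 months'])
result = to_days(sample)
print(result)
```

The nullable Int64 dtype is one choice among several; leaving the column as float with NaN would work as well.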
Answer 1 (score: 1)
The only units are day, week, month and year, so the first letter alone is enough to determine what to multiply by.
import pandas as pd
df = pd.DataFrame({'dosage_duration':['5 years27abc','10 days92pqr', '5.5 weeks782364hgsdf', '3 months21647hadjh']})
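# Days per unit, keyed by the first letter of the unit
mul = {
'd':1,
'w':7,
'm':30,
'y':365
}
# Split on whitespace: x[0] is the number, x[1][0] is the first letter of the unit
df['new_dosage'] = df['dosage_duration'].apply(lambda x:x.split()).apply(lambda x:float(x[0])*mul[x[1][0]])
df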
Output:
dosage_duration new_dosage
0 5 years27abc 1825.0
1 10 days92pqr 10.0
2 5.5 weeks782364hgsdf 38.5
3 3 months21647hadjh 90.0
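Note that splitting on whitespace assumes the number and the unit are separated by a space. Here is a sketch of a variant that uses a regular expression instead, so that a missing space ('5.5weeks') or a capitalized unit does not break it. The helper name parse_days and the sample strings are hypothetical, not part of the question's data.

```python
import re

# Multiplier keyed by the first letter of the unit (approximate: 30-day month, 365-day year).
mul = {'d': 1, 'w': 7, 'm': 30, 'y': 365}

# Leading number (integer or decimal), optional whitespace, then the first letter of the unit.
pattern = re.compile(r'\s*(\d+(?:\.\d+)?)\s*([dwmy])', re.IGNORECASE)

def parse_days(text):
    """Return the duration in days, or None if the string does not start with '<number><unit>'."""
    match = pattern.match(text)
    if match is None:
        return None
    number, letter = match.groups()
    return float(number) * mul[letter.lower()]

examples = ['5 years27abc', '5.5weeks782364hgsdf', '3 Months', 'n/a']
days = [parse_days(s) for s in examples]
print(days)
```

It can be applied to the column with df['dosage_duration'].apply(parse_days).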
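Update: if you want the expression itself as a string (as in the question's expected output), keep the multipliers as strings and concatenate: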
import pandas as pd
df = pd.DataFrame({'t':['5 years27abc','10 days92pqr', '5 weeks782364hgsdf', '3 months21647hadjh']})
mul = {
'd':'1',
'w':'7',
'm':'30',
'y':'365'
}
df['total_time'] = df['t'].apply(lambda x:x.split()).apply(lambda x:x[0] + '*' + mul[x[1][0]])
df
Output:
t total_time
0 5 years27abc 5*365
1 10 days92pqr 10*1
2 5 weeks782364hgsdf 5*7
3 3 months21647hadjh 3*30
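If the numeric value is needed later, these expression strings can be turned into numbers without eval by splitting on '*'. This is a minimal sketch, assuming every entry has the form 'a*b' with integer factors as in the output above. The column name total_days is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'total_time': ['5*365', '10*1', '5*7', '3*30']})

# Split each 'a*b' string into two columns, convert both to integers,
# then multiply them row by row.
factors = df['total_time'].str.split('*', expand=True).astype(int)
df['total_days'] = factors[0] * factors[1]
print(df)
```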