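I have a DataFrame with a column of strings. I want to extract the numbers from these strings. However, some values are in metres and some are in kilometres. How can I detect whether an "m" or "km" follows the number, standardise the unit, and then extract the number into a new column?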
details numbers
Distance 350m
Longest straight 860m
Top speed 305km
Full throttle 61 per cent
Expected output:
details numbers
Distance 350
Longest straight 860
Top speed 305000
Full throttle 61
Answer 0 (score: 1)
Use:
m = df['numbers'].str.contains(r'\d+km')
df['numbers'] = df['numbers'].str.extract(r'(\d+)', expand=False).astype(int)
df.loc[m, 'numbers'] *= 1000
print (df)
details numbers
0 Distance 350
1 Longest straight 860
2 Top speed 305000
3 Full throttle 61
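The snippet above assumes df already exists. A minimal self-contained sketch of the same three steps follows; the DataFrame is rebuilt from the sample in the question, and the variable name is_km is only an illustrative choice:

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "details": ["Distance", "Longest straight", "Top speed", "Full throttle"],
    "numbers": ["350m", "860m", "305km", "61 per cent"],
})

# 1. Boolean mask: True where digits are immediately followed by "km"
is_km = df["numbers"].str.contains(r"\d+km")

# 2. Extract the first run of digits from each string and cast to int
df["numbers"] = df["numbers"].str.extract(r"(\d+)", expand=False).astype(int)

# 3. Convert the kilometre rows to metres
df.loc[is_km, "numbers"] *= 1000

print(df)
```

The mask has to be computed before the extract step, because the unit is lost once the column holds only numbers.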
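Explanation:
1. Build a boolean mask of the km values with contains
2. Extract the integer values with extract and cast them to int
3. Multiply the values selected by the km mask by 1000
EDIT: To extract float values, change the regex in extract to the one from this solution, and cast to float at the end: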
print (df)
details numbers
0 Distance 1.7km
1 Longest straight 860.8m
2 Top speed 305km
3 Full throttle 61 per cent
m = df['numbers'].str.contains(r'\d+km')
df['numbers'] = df['numbers'].str.extract(r'(\d*\.\d+|\d+)', expand=False).astype(float)
df.loc[m, 'numbers'] *= 1000
print (df)
details numbers
0 Distance 1700.0
1 Longest straight 860.8
2 Top speed 305000.0
3 Full throttle 61.0
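If more units might turn up later, one possible variation (not part of the answer above) is to capture the number and the unit in a single extract call and look the conversion factor up in a dict. This is only a sketch: the TO_METRES table and the column names value and unit are hypothetical, and it assumes the unit directly follows the number with no space.

```python
import pandas as pd

# Sample data taken from the float example above
df = pd.DataFrame({
    "details": ["Distance", "Longest straight", "Top speed", "Full throttle"],
    "numbers": ["1.7km", "860.8m", "305km", "61 per cent"],
})

# Hypothetical lookup table: factor that converts each unit to metres
TO_METRES = {"km": 1000.0, "m": 1.0}

# Named groups become the columns "value" and "unit";
# "unit" is NaN when no m/km follows the number
parts = df["numbers"].str.extract(r"(?P<value>\d*\.\d+|\d+)(?P<unit>km|m)?\b")

# Rows without a recognised unit keep their value unchanged (factor 1.0)
factor = parts["unit"].map(TO_METRES).fillna(1.0)
df["numbers"] = parts["value"].astype(float) * factor

print(df)
```

Adding another unit then only means extending the dict and the alternation in the regex, with longer unit names listed first so that km is tried before m.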