Pandas DataFrame: extract numeric values (including decimals) from strings

Posted: 2018-04-07 05:04:31

Tags: python pandas

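I have a DataFrame with a column of strings, and I want to extract the numbers from these strings. However, some values are in meters and some are in kilometers. How can I detect whether an "m" or a "km" follows the number, normalize the units, and then extract the numbers into a new column?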

details                 numbers
Distance                350m
Longest straight        860m
Top speed               305km
Full throttle           61 per cent

Desired output:

details                 numbers
Distance                350
Longest straight        860
Top speed               305000
Full throttle           61
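The answer below assumes a DataFrame named df already exists. A minimal way to rebuild the sample input, assuming the values are stored as plain strings in columns named details and numbers as in the table above:

```python
import pandas as pd

# Sample data from the question (assumed: plain strings, column names from the table)
df = pd.DataFrame({
    'details': ['Distance', 'Longest straight', 'Top speed', 'Full throttle'],
    'numbers': ['350m', '860m', '305km', '61 per cent'],
})
print(df)
```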

1 answer:

Answer 0 (score: 1)

Use:

m = df['numbers'].str.contains(r'\d+km')  # True where the value is in km
df['numbers'] = df['numbers'].str.extract(r'(\d+)', expand=False).astype(int)  # first run of digits as int
df.loc[m, 'numbers'] *= 1000  # convert km rows to meters

print (df)
            details  numbers
0          Distance      350
1  Longest straight      860
2         Top speed   305000
3     Full throttle       61
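A minimal sketch, assuming the sample frame from the question, that prints the intermediate result of each step of the integer approach so the mask and the extracted strings are visible before the conversion:

```python
import pandas as pd

df = pd.DataFrame({
    'details': ['Distance', 'Longest straight', 'Top speed', 'Full throttle'],
    'numbers': ['350m', '860m', '305km', '61 per cent'],
})

# Step 1: boolean mask, True where digits are directly followed by "km"
m = df['numbers'].str.contains(r'\d+km')
print(m.tolist())  # [False, False, True, False]

# Step 2: first run of digits in each string (still strings at this point)
digits = df['numbers'].str.extract(r'(\d+)', expand=False)
print(digits.tolist())  # ['350', '860', '305', '61']

# Step 3: convert the extracted strings to integers
values = digits.astype(int)

# Step 4: scale only the km rows to meters
values.loc[m] = values.loc[m] * 1000
print(values.tolist())  # [350, 860, 305000, 61]
```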

Explanation:

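  1. Create a boolean mask of the km values with contains.
  2. Extract the integer values with extract.
  3. Convert them to int.
  4. Multiply the km values by 1000 to convert them to meters.

Edit: to extract float values, change the regular expression in extract to one that also matches decimals, and convert to float at the end: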

    print (df)
                details      numbers
    0          Distance        1.7km
    1  Longest straight       860.8m
    2         Top speed        305km
    3     Full throttle  61 per cent
    
    m = df['numbers'].str.contains(r'\d+km')  # True where the value is in km
    df['numbers'] = df['numbers'].str.extract(r'(\d*\.\d+|\d+)', expand=False).astype(float)  # decimal or integer
    df.loc[m, 'numbers'] *= 1000  # convert km rows to meters
    print (df)
                details   numbers
    0          Distance    1700.0
    1  Longest straight     860.8
    2         Top speed  305000.0
    3     Full throttle      61.0
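The question also asks for the result in a new column. One alternative, sketched here and not the answer's own method, is to capture the number and the unit in a single regex with named groups and look up a conversion factor per unit. The UNIT_FACTORS dict and the meters column name are hypothetical choices for illustration, and the sketch assumes the unit is written directly after the number:

```python
import pandas as pd

df = pd.DataFrame({
    'details': ['Distance', 'Longest straight', 'Top speed', 'Full throttle'],
    'numbers': ['1.7km', '860.8m', '305km', '61 per cent'],
})

# Hypothetical unit table: factor that converts each unit to meters
UNIT_FACTORS = {'km': 1000.0, 'm': 1.0}

# Two named groups: the (possibly decimal) number and an optional unit
# written directly after it; rows without a unit get NaN in "unit"
parts = df['numbers'].str.extract(r'(?P<value>\d*\.\d+|\d+)(?P<unit>km|m)?')

# Rows with no recognised unit (e.g. "61 per cent") keep factor 1.0
factor = parts['unit'].map(UNIT_FACTORS).fillna(1.0)

# Store the normalized value in a new column, keeping the original strings
df['meters'] = parts['value'].astype(float) * factor
print(df)
```

Supporting another unit means adding it both to UNIT_FACTORS and to the unit alternation in the regex. If a unit that is a prefix of another one is added (for example m and mm), list the longer one first in the alternation so it is tried before the shorter one.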