Question

I am trying to extract float values from the strings in a particular column.

Original data:

DATE        strCondition
4/3/2018    2.9
4/3/2018    3.1, text
4/3/2018    2.6 text
4/3/2018    text, 2.7

plus other variations. I also tried a regular expression, but my regex knowledge is limited, and this is what I came up with:

clean = df['strCondition'].str.contains('\d+km')
df['strCondition'] = df['strCondition'].str.extract('(\d+)', expand = False).astype(float)

The output ends up looking like this, with only the leading integer part of each value kept:

DATE        strCondition
4/3/2018    2.0
4/3/2018    3.0
4/3/2018    2.0
4/3/2018    2.0

The output I want is:

DATE        strCondition
4/3/2018    2.9
4/3/2018    3.1
4/3/2018    2.6
4/3/2018    2.7

Thanks for your time and input!

Edit: I forgot to mention that my original dataframe also has strCondition entries like these:

2.9(1.0) #where I would like both numbers to get returned
11/11/2018 #where this date as a string object can be discarded

Sorry for the inconvenience!

Answer 1

Try:

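# \. matches a literal decimal point; an unescaped . would match any character (e.g. the / in a date)
df['float'] = df['strCondition'].str.extract(r'(\d+\.\d+)', expand=False).astype('float')

Output: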

       DATE strCondition  float
0  4/3/2018          2.9    2.9
1  4/3/2018    3.1, text    3.1
2  4/3/2018     2.6 text    2.6
3  4/3/2018    text, 2.7    2.7
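
For the cases from the edit (2.9(1.0) should return both numbers, 11/11/2018 should be discarded), here is a minimal sketch using str.findall instead of str.extract. The sample rows and the column name floats are my own assumptions for illustration; the pattern only matches digits, a literal dot, then digits, so a date string without a decimal point produces an empty list.

```python
import pandas as pd

# Sample data assumed from the question, including the two cases from the edit
df = pd.DataFrame({
    "DATE": ["4/3/2018"] * 6,
    "strCondition": [
        "2.9", "3.1, text", "2.6 text", "text, 2.7", "2.9(1.0)", "11/11/2018",
    ],
})

# Digits, a literal decimal point, digits
pattern = r"\d+\.\d+"

# findall returns a list of every match per row; convert each match to float
df["floats"] = (
    df["strCondition"]
    .str.findall(pattern)
    .apply(lambda matches: [float(m) for m in matches])
)

# If only the first float per row is needed, extract gives NaN where nothing matches
first = df["strCondition"].str.extract(r"(\d+\.\d+)", expand=False).astype(float)

print(df)
```

If one row per number is preferable to a list per row, str.extractall with the same pattern wrapped in a capture group might be a better fit.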

Answer 2

A simple find-and-replace would be:

Find: (?m)^([\d/]+[ \t]+).*?(\d+\.\d+).*

Replace: \1\2

https://regex101.com/r/pVC4jc/1
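
The same find-and-replace can be applied in Python with re.sub. This is only a sketch that assumes the data is available as raw text with one record per line; the variable names raw and cleaned are made up for the example.

```python
import re

# Raw text assumed from the question: a date, whitespace, then the messy value
raw = (
    "4/3/2018    2.9\n"
    "4/3/2018    3.1, text\n"
    "4/3/2018    2.6 text\n"
    "4/3/2018    text, 2.7\n"
)

# Group 1: the date plus the whitespace after it
# Group 2: the first float on the line; everything else on the line is dropped
pattern = re.compile(r"(?m)^([\d/]+[ \t]+).*?(\d+\.\d+).*")

cleaned = pattern.sub(r"\1\2", raw)
print(cleaned)
```

Lines that contain no float (a header row, for instance) do not match the pattern and are left unchanged.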

Pandas dataframe: extract float values from strings in a column
