Question

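Python beginner here. I'm struggling to use regular expressions with pandas. I have rows like the one below that need to be split into columns containing only the numbers.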

rando45m text78 here 123  $    1   0% text here  5 . 6&

I need it to come out as

     0    1    2   3 
0   123   1    0   5

I have tried the following two approaches

df2 = df.Keep.str.extractall('(\d+)((\s+)|(\%))')

df3 = df.Keep.str.extractall(r'(?<=\s)(\d+)(?=\s+|\%)')

df2 contains whitespace in its cells. df3 fails with an AssertionError. Is there a way to capture only one group (group 1) for my DataFrame?

Thanks

Answer 1

Try this:

In [39]: df
Out[39]:
                                                      Keep
0  rando45m text78 here 123  $    1   0% text here  5 . 6&
1         aaa 101.5% here 123  $    1   0% text here  55 .

In [40]: df.Keep.str.extractall(r'\b(\d+(?:\.\d+)?)(?:\s|%|$)').unstack()
Out[40]:
           0
match      0    1  2  3     4
0        123    1  0  5  None
1      101.5  123  1  0    55
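As for why df2 picked up whitespace: extractall returns one column per capturing group, and the first pattern in the question has four of them. Wrapping the separator in a non-capturing group `(?:...)` leaves only the number group. Here is a minimal sketch of that difference using just the standard library `re` module, whose `findall` likewise returns one tuple element per capturing group. The variable names are only for illustration, and the sample string is the one from the question.

```python
import re

# Sample row copied from the question.
line = "rando45m text78 here 123  $    1   0% text here  5 . 6&"

# First pattern from the question: four capturing groups, so every match
# carries the separator (whitespace or %) along with the number.
multi_group = re.compile(r"(\d+)((\s+)|(\%))")

# Pattern from this answer: one capturing group for the number; the
# separator sits in a non-capturing group (?:...), and \b keeps digits
# glued to letters (such as "text78") from matching.
single_group = re.compile(r"\b(\d+(?:\.\d+)?)(?:\s|%|$)")

multi_matches = multi_group.findall(line)
single_matches = single_group.findall(line)

print(multi_matches[0])   # a 4-tuple: the number plus the separator groups
print(single_matches)     # ['123', '1', '0', '5']
```

Note that without the `\b` the first pattern also matches the 78 in "text78". Passing the single-group pattern to `str.extractall` gives a single column `0`, which is what the `unstack()` call above spreads across the `match` level.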

How do I extract a single capture group into a pandas DataFrame using a regular expression?
