Question

I have a column named 'value' in a pandas DataFrame df that contains a mix of numbers and words. It looks like this:

   VALUE
0   done
1   Yes
2   3.45
3   2bc

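I want to split the column into two columns, where the left column contains only letters and the right column only numbers. Ideally, the result would be: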

     0    1
0   done NaN
1   Yes  NaN
2   NaN  3.45
3   bc   2

I tried using the pandas .str.extract function, like this:

df['value'].str.extract('([A-Za-z]+)?([0-9]*[.]?[0-9]+)')

The result I get looks something like this:

    0    1
0   NaN NaN
1   NaN NaN
2   NaN 3.45
3   NaN NaN

The words do not show up in column 0.

Does anyone know why this happens, or a better way to do this kind of operation in pandas / Python?

Answer 1

Fix your pattern, and use str.extractall:

(df.VALUE.str.extractall(r'(\d+(?:\.\d+)?)|([^\d.]+)')
   .unstack()
   .groupby(level=0, axis=1)
   .first())

      0     1
0   NaN  done
1   NaN   Yes
2  3.45   NaN
3     2    bc
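
As for the "why": in the original pattern the letter group is optional but the number group is mandatory, so a string with no digits in it (such as "done" or "Yes") cannot match at all and every group comes back as NaN. The following is a minimal, self-contained sketch of the same extractall idea, not necessarily the only way to do it. The group names `number` and `text` are illustrative additions that are not in the original answer, and the row-wise `groupby(level=0)` is used here in place of `unstack()` plus `groupby(..., axis=1)`, since newer pandas releases warn about `axis=1` in groupby.

```python
import re

import pandas as pd

df = pd.DataFrame({"VALUE": ["done", "Yes", "3.45", "2bc"]})

# The question's pattern: the number group has no "?" after it, so it is
# required. A value made only of letters therefore produces no match.
original = r"([A-Za-z]+)?([0-9]*[.]?[0-9]+)"
no_match = re.search(original, "done")  # None

# Alternation instead: each match is EITHER a number OR a run of characters
# that are neither digits nor dots. Group names are hypothetical labels.
pattern = r"(?P<number>\d+(?:\.\d+)?)|(?P<text>[^\d.]+)"

# extractall returns one row per match, indexed by (original row, match number).
matches = df["VALUE"].str.extractall(pattern)

# Collapse back to one row per original row, taking the first non-null
# value in each column; reindex keeps rows that produced no match at all.
result = matches.groupby(level=0).first().reindex(df.index)

print(result)
```

With named groups the resulting columns are labelled `number` and `text` rather than 0 and 1, which makes it harder to mix them up. Both columns still hold strings, so something like `pd.to_numeric(result["number"])` would be needed if numeric values are wanted.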
