Question

I am trying to do a string replacement in a pandas DataFrame. I need to loop over the columns, so it is essentially a replacement within a Series:

In [105]: df = pd.DataFrame([['0 - abc', 1, 5], ['0 - abc - xyz', 2, 3]], columns=['col1','col2','col3'])

In [106]: df
Out[106]:
            col1  col2  col3
0        0 - abc     1     5
1  0 - abc - xyz     2     3

In [107]: for col in df.columns:
     ...:     df[col] = df[col].replace(to_replace='".*"|^0', value=df['col3'], inplace=False, regex=True)
     ...:

In [108]: df
Out[108]:
   col1  col2  col3
0     5     1     5
1     3     2     3

Instead of the df above, I expected the result to be:

In [110]: df_result
Out[110]:
            col1  col2  col3
0        5 - abc     1     5
1  3 - abc - xyz     2     3

That is, in '0 - abc', only the leading '0' should be replaced with '5', not the whole string.

What is missing from my regex? Is there another way to do this kind of string replacement in pandas? Thanks.

Answer 1

Converting df['col3'] to str with .astype fixes your problem:

In [836]: df.iloc[:, 0].replace('^0', df['col3'].astype(str), regex=True)
Out[836]: 
0          5 - abc
1    3 - abc - xyz
Name: col1, dtype: object
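
Why the string conversion matters can be illustrated with a scalar replacement value. The sketch below assumes an object-dtype Series; the names `s`, `whole`, and `partial` are made up for illustration. With `regex=True`, a non-string value appears to replace the entire matching cell, while a string value is substituted only for the matched part.

```python
import pandas as pd

# Hypothetical sample data; dtype=object keeps the cells as plain Python strings.
s = pd.Series(['0 - abc', '1 - xyz'], dtype=object)

# Integer replacement value: the whole matching cell is swapped for the integer.
whole = s.replace('^0', 5, regex=True)

# String replacement value: only the matched leading "0" is substituted.
partial = s.replace('^0', '5', regex=True)

print(whole.tolist())
print(partial.tolist())
```

This matches the symptom in the question, where `col3` holds integers and each cell in `col1` was overwritten completely.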

I also simplified your regex, although I'm not 100% sure it fits all of your use cases:

^0

This matches only a leading zero and replaces it. You can incorporate it into your code as needed.
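
Since passing a Series as `value` may not behave the same way in every pandas version, here is a minimal sketch of the column loop that does the row-wise substitution with `re.sub` instead. The helper `replace_leading_zero` is a hypothetical name, not part of pandas, and the sketch assumes that non-string cells should be left unchanged.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [['0 - abc', 1, 5], ['0 - abc - xyz', 2, 3]],
    columns=['col1', 'col2', 'col3'],
)


def replace_leading_zero(text, repl):
    """Replace a leading "0" in text with str(repl); hypothetical helper."""
    # Leave non-string cells (such as the integer columns) untouched.
    if not isinstance(text, str):
        return text
    # A function replacement avoids any escape handling in the replacement text.
    return re.sub(r'^0', lambda match: str(repl), text)


# Pair every cell with the col3 value from the same row.
for col in df.columns:
    df[col] = [
        replace_leading_zero(text, repl)
        for text, repl in zip(df[col], df['col3'])
    ]

print(df)
```

A plain list comprehension is used here because each row needs a different replacement value, which a single regex replacement string cannot express.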

Partial string replacement in a pandas Series