Question

I am trying to remove the string rs. from the values in a string column.

df['Purpose'] = df['Purpose'].str.replace('rs.','')

+-------+----------+--------+
| Input | Expected | Output |
+-------+----------+--------+
| rs.22 | 22       | 22     |
+-------+----------+--------+
| rs32  | rs32     | 2      |
+-------+----------+--------+

Test code:

import pandas as pd

x = pd.DataFrame(['rs.22', 'rs32'], columns=['Purpose'])
x['Purpose'] = x['Purpose'].str.replace('rs.','')
print('x mod', x)

This produces the following output:

x mod   Purpose
   0      22
   1       2

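PS: The only other approach I have is extracting the numbers with the regex [-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?, but it cannot distinguish rs.3.5 from 3.5: for rs.3.5 the output is .3.5.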
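One possible way around the PS problem, sketched under the assumption that the unwanted text only ever appears as a literal rs. prefix at the start of a value: anchor the pattern with ^ and escape the dot, so only a leading rs. is removed and the number's own decimal point is left alone. The column name Purpose comes from the question; the extra sample values are illustrative.

```python
import pandas as pd

x = pd.DataFrame(['rs.22', 'rs32', 'rs.3.5', '3.5'], columns=['Purpose'])

# '^' anchors the match at the start of the string and '\.' is a literal
# period, so only a leading "rs." is stripped; "rs32" is left unchanged
# and "rs.3.5" keeps its decimal point.
x['Purpose'] = x['Purpose'].str.replace(r'^rs\.', '', regex=True)
print(x['Purpose'].tolist())  # ['22', 'rs32', '3.5', '3.5']
```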

Answer 1

In a regular expression, the period . matches almost any character. To match a literal period, precede it with a backslash:

x['Purpose'] = x['Purpose'].str.replace(r'rs\.', '', regex=True)
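The same behavior can be reproduced with Python's built-in re module, without pandas. This minimal illustration shows why rs. eats the 3 in rs32, and why the escaped pattern is written as a raw string (so the backslash reaches the regex engine unchanged).

```python
import re

# '.' is a metacharacter: in "rs32" it matches the '3', so "rs3" is removed.
unescaped = re.sub('rs.', '', 'rs32')
# '\.' (in a raw string) matches only a literal period.
escaped_no_dot = re.sub(r'rs\.', '', 'rs32')
escaped_dot = re.sub(r'rs\.', '', 'rs.22')

print(unescaped)       # 2
print(escaped_no_dot)  # rs32
print(escaped_dot)     # 22
```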

For a regex how-to, see: https://docs.python.org/3/howto/regex.html

Answer 2

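By default (in pandas versions before 2.0), str.replace runs in regex mode. You have two simple options to fix this. The preferred one, suggested by @101, is to turn regex matching off: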

df['Purpose'] = df['Purpose'].str.replace('rs.', '', regex=False)

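The other option is to escape the dot so that it matches an actual period rather than any character. This option also works with pandas versions before 0.23.0, when the regex parameter was introduced: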

# On pandas 2.0+, where regex defaults to False, also pass regex=True
df['Purpose'] = df['Purpose'].str.replace(r'rs\.', '')
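A quick check that both options give the same result on the question's test frame. This sketch passes regex explicitly in both calls, which assumes pandas 0.23.0 or later and makes the behavior independent of the version's default.

```python
import pandas as pd

x = pd.DataFrame(['rs.22', 'rs32'], columns=['Purpose'])

# Option 1: treat the pattern as a plain substring.
literal = x['Purpose'].str.replace('rs.', '', regex=False)
# Option 2: stay in regex mode but escape the dot.
escaped = x['Purpose'].str.replace(r'rs\.', '', regex=True)

print(literal.tolist())         # ['22', 'rs32']
print(literal.equals(escaped))  # True
```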

Regex matching is generally slower than plain string comparison, so the first option may be preferable for performance.
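To check the performance claim on your own data, a rough micro-benchmark with timeit can compare Python's built-in str.replace against a precompiled re.sub on the same values. This is only a sketch: it times plain Python string operations as a stand-in for the per-element work pandas does, and the numbers depend on your machine and data, so no particular result is implied.

```python
import re
import timeit

values = ['rs.22', 'rs32'] * 10_000
pattern = re.compile(r'rs\.')

def plain():
    # Literal substring replacement, no regex engine involved.
    return [v.replace('rs.', '') for v in values]

def regex():
    # Same replacement through the (precompiled) regex engine.
    return [pattern.sub('', v) for v in values]

# Both approaches must produce identical results before timing them.
assert plain() == regex()
print('str.replace:', timeit.timeit(plain, number=20))
print('re.sub:     ', timeit.timeit(regex, number=20))
```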

Answer 3

That's correct. Besides str.replace, pandas also has its own replace method (Series.replace), which accepts regex=True:

    >>> df
       Input
    0  rs.22
    1  rs321
    >>> df['Input'].replace(r"rs\.", "", regex=True)
    0       22
    1    rs321
    Name: Input, dtype: object
    >>>
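Note that regex=True matters more for Series.replace than it might seem. A minimal sketch: without it, Series.replace only replaces cells whose entire value equals the pattern, so a substring like rs. inside rs.22 is left untouched.

```python
import pandas as pd

df = pd.DataFrame({'Input': ['rs.22', 'rs321']})

# regex=False (the default): only whole-cell matches of 'rs.' are replaced,
# and no cell is exactly 'rs.', so nothing changes.
whole_value = df['Input'].replace('rs.', '').tolist()
# regex=True: the pattern is searched for inside each string.
substring = df['Input'].replace(r'rs\.', '', regex=True).tolist()

print(whole_value)  # ['rs.22', 'rs321']
print(substring)    # ['22', 'rs321']
```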

Answer 4

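Basically, the problem is that pandas.Series.str.replace() defaults to regex=True (in pandas versions before 2.0), so it treats the pattern you pass in as a regular expression, in which . matches any character.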

You can use:

x['Purpose'] = x['Purpose'].str.replace('rs.', '', regex=False)
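If you need to stay in regex mode (for example, to combine the literal text with other regex syntax), one option is to let re.escape build the escaped pattern instead of escaping by hand. A minimal sketch; prefix is a hypothetical variable name used only for illustration.

```python
import re
import pandas as pd

prefix = 'rs.'                 # hypothetical: the literal text to remove
pattern = re.escape(prefix)    # escapes the dot, giving the pattern rs\.

x = pd.DataFrame(['rs.22', 'rs32'], columns=['Purpose'])
x['Purpose'] = x['Purpose'].str.replace(pattern, '', regex=True)

print(pattern)                 # rs\.
print(x['Purpose'].tolist())   # ['22', 'rs32']
```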

How to replace a string ending in . (period)?