Question

I have a DataFrame column that contains many strings. I need to remove all non-alphanumeric characters from that column, i.e.:

df['strings'] = ["a#bc1!","a(b$c"]

Running the code should give:

Print(df['strings']): ['abc','abc']

I've tried:

df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")

But this doesn't work, and I feel there should be a more efficient way to do this using regex. Any help would be much appreciated.

Answer 1

Use str.replace.

df
  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace('[^a-zA-Z]', '', regex=True)
0    abc
1    abc
Name: strings, dtype: object

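To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need the following. Note that \W also leaves underscores in place: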

df.strings.str.replace(r'\W', '', regex=True)
0    abc1
1     abc
Name: strings, dtype: object
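
For reference, here is a minimal self-contained sketch of the patterns above. It assumes a recent pandas, where str.replace treats the pattern as a literal string unless regex=True is passed. It also illustrates why the Series.replace attempt in the question leaves the data untouched: without regex=True, Series.replace only matches whole cell values, not substrings. The third sample string and the variable names are made up for illustration.

```python
import pandas as pd

# Sample data from the question, plus one made-up row containing an underscore
df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c', 'x_y-2']})

# Series.replace without regex=True only swaps cells whose WHOLE value matches,
# so none of these strings change
unchanged = df['strings'].replace(['#', '!', '(', '$'], '')

# Keep letters only
letters_only = df['strings'].str.replace(r'[^a-zA-Z]', '', regex=True)

# Keep letters and digits
alnum_only = df['strings'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)

# \W removes non-word characters, so underscores survive
word_chars = df['strings'].str.replace(r'\W', '', regex=True)

print(letters_only.tolist())  # ['abc', 'abc', 'xy']
print(alnum_only.tolist())    # ['abc1', 'abc', 'xy2']
print(word_chars.tolist())    # ['abc1', 'abc', 'x_y2']
```

To store the result, assign it back to the column, e.g. df['strings'] = alnum_only, since str.replace returns a new Series rather than modifying the column in place.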

Answer 2

Since you wrote alphanumeric, you need to add 0-9 to the regex. But maybe you only want letters...

import pandas as pd

ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

ded.strings.str.replace('[^a-zA-Z0-9]', '', regex=True)

But it's basically what COLDSPEED wrote.

Answer 3

You can also use a regex directly with the re module:

import re

regex = re.compile('[^a-zA-Z]')

l = ["a#bc1!","a(b$c"]

print([regex.sub('', i) for i in l])

['abc', 'abc']
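
If digits should be kept as well, the same idea can be sketched with a compiled pattern that includes 0-9. This is only an illustration in Python 3; the names NON_ALNUM and strip_non_alnum are hypothetical and not part of any library.

```python
import re

# Hypothetical names for illustration; the pattern is compiled once and reused
NON_ALNUM = re.compile(r'[^a-zA-Z0-9]')

def strip_non_alnum(values):
    """Return a new list with every character other than ASCII letters and digits removed."""
    return [NON_ALNUM.sub('', v) for v in values]

cleaned = strip_non_alnum(["a#bc1!", "a(b$c"])
print(cleaned)  # ['abc1', 'abc']
```

The same compiled pattern could probably be applied to a DataFrame column with something like df['strings'].map(lambda s: NON_ALNUM.sub('', s)), though the vectorised str.replace from the other answers is usually the more idiomatic choice in pandas.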