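I'm working with very dirty data that contains several kinds of characters I need to remove. The rows below are just a snapshot. I only want to strip these characters from the start of each string, but my code removes every occurrence in col1. The data is in a DataFrame: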
COL1:
, Matt R, Carl A
_ Hello, World_
). My Name is ). 'Amy'
. My name is 'Matt'
., My name is 'Clark'
My name is 'Amy' #clean row
Code:
articles[col1].str.replace(",","")
articles[col1].str.replace("_","")
articles[col1].str.replace(").","")
articles[col1].str.replace(".","")
articles[col1].str.replace(".,","")
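Answer 0 (score: 2)
If you only want to remove the unwanted characters from the start of each string, you can use pandas.Series.str.replace with a regular expression anchored at the start of the string (^):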
In [26]: df
Out[26]:
col1
0 , Matt R, Carl A
1 _ Hello, World_
2 ). My Name is ). 'Amy'
3 . My name is 'Matt'
4 ., My name is 'Clark'
In [27]: df['col1'] = df['col1'].str.replace(r'^[^a-zA-Z]+', '', regex=True)  # regex=True is required in pandas 2.0+, where the default is a literal match
In [28]: df
Out[28]:
col1
0 Matt R, Carl A
1 Hello, World_
2 My Name is ). 'Amy'
3 My name is 'Matt'
4 My name is 'Clark'
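For anyone who wants to reproduce this outside an IPython session, here is a minimal self-contained sketch. It rebuilds the sample rows from the question (the variable names `everywhere` and `cleaned` are mine, not the asker's) and contrasts the literal replace from the question with the anchored regex. It assumes a pandas version where `str.replace` accepts the `regex` keyword.

```python
import pandas as pd

# Sample rows copied from the question, including the already-clean row.
df = pd.DataFrame({"col1": [
    ", Matt R, Carl A",
    "_ Hello, World_",
    "). My Name is ). 'Amy'",
    ". My name is 'Matt'",
    "., My name is 'Clark'",
    "My name is 'Amy'",
]})

# Literal replace, as in the question: removes every comma in the cell,
# not just the leading one.
everywhere = df["col1"].str.replace(",", "", regex=False)

# Anchored regex: '^' ties the match to the start of the string, so only the
# leading run of non-letter characters is removed.
cleaned = df["col1"].str.replace(r"^[^a-zA-Z]+", "", regex=True)

print(everywhere.tolist()[0])
print(cleaned.tolist())
```

Note that `[^a-zA-Z]` treats digits and non-ASCII letters as junk too, so a row that legitimately starts with a number or an accented letter would lose it. If that matters for your data, widen the character class accordingly.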
Answer 1 (score: 0)
Assuming the string is in a variable named a:
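import re
# '^' anchors the match to the start of the string, so characters later in the
# string are left alone; '\s*' also drops the whitespace that follows the prefix.
re.sub(r'^(\.,|_|\.|\)\.|,)\s*(.*)', r'\2', a)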
Applied to each row in turn, this returns:
Matt R, Carl A
Hello, World_
My Name is ). 'Amy'
My name is 'Matt'
My name is 'Clark'
My name is 'Amy' #clean row
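As a rough sketch of the same idea with only the standard library, the prefixes can be put in a precompiled, anchored pattern and applied row by row. The helper name `strip_leading_junk` and the extra "Hello, World" row are my own additions for illustration; the pattern removes a single known prefix per string, so a row starting with two prefixes in a row would keep the second one.

```python
import re

# Anchored pattern: one of the known junk prefixes at the very start of the
# string, plus any whitespace that follows it. Two-character prefixes are
# listed first so "., " is not treated as just ".".
LEADING_JUNK = re.compile(r"^(?:\.,|\)\.|[_.,])\s*")


def strip_leading_junk(text: str) -> str:
    """Remove one leading junk prefix; leave the rest of the string untouched."""
    return LEADING_JUNK.sub("", text)


rows = [
    ", Matt R, Carl A",
    "_ Hello, World_",
    "). My Name is ). 'Amy'",
    ". My name is 'Matt'",
    "., My name is 'Clark'",
    "My name is 'Amy'",
    "Hello, World",  # clean row with an interior comma, which must survive
]

result = [strip_leading_junk(row) for row in rows]
for line in result:
    print(line)
```

With a DataFrame, the same helper could be applied through `df['col1'].map(strip_leading_junk)`.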