I'm trying to use a regular expression to replace part of the text in a DataFrame column.
My dataframe:
A        B          C
French   house      Phone. <phone_numbers>
English  house      email - <adresse_mail>
French   apartment  code : bla!123
French   house      Hello George!
English  apartment  Ethan, my phone is <phone_numbers>
Desired output:
A        B          C
French   house      Phone. <phone_numbers>
English  house      email - <adresse_mail>
French   apartment  code : <code>
French   house      Hello George!
English  apartment  Ethan, my phone is <phone_numbers>
First, I tried:
df['C'] = df['C'].str.replace(r'((ask code)|(code))\s?:?\s?\w+', '<code>', regex=True)
It works, but only partially. For this value:
code : bla!123
Output:
<code>!123
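The partial result comes from `\w`, which only matches letters, digits and the underscore, so `\w+` stops right before the `!`. A minimal sketch with Python's standard `re` module that reproduces this on the single value; `\S+` (any run of non-whitespace) is swapped in only for comparison and is not from the question:

```python
import re

text = "code : bla!123"

# \w only matches letters, digits and "_", so \w+ stops right before "!"
partial = re.sub(r"((ask code)|(code))\s?:?\s?\w+", "<code>", text)
print(partial)  # <code>!123

# \S matches any non-whitespace character, so \S+ also consumes "!123"
whole = re.sub(r"((ask code)|(code))\s?:?\s?\S+", "<code>", text)
print(whole)  # <code>
```

Both versions also replace the keyword itself, which is why `code :` disappears from the output instead of being kept as in the desired result.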
So I tried this:
df['C'] = df['C'].str.replace(r'(ask code)|(code)\s?:?\s?), (\s?\w+)', r'\2,<code>')
But nothing happened...
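The second pattern contains one more `)` than `(` (the `)` right after the second `\s?`), so it is not a valid regular expression. A small check, assuming only the standard `re` module, that tries to compile the pattern exactly as written:

```python
import re

# The second pattern from the question, unchanged
bad_pattern = r'(ask code)|(code)\s?:?\s?), (\s?\w+)'

try:
    re.compile(bad_pattern)
    error_message = None
except re.error as exc:
    # Python reports the stray ")" as an unbalanced parenthesis
    error_message = str(exc)

print(error_message)
```

With `regex=True`, pandas hands the pattern to `re`, so one would expect this error rather than silent success. In pandas 2.0 and later, `Series.str.replace` defaults to `regex=False`, in which case the pattern is searched for as literal text; no cell contains that text, so nothing changes. That is one plausible reason nothing happened.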
Answer 0 (score: 3)
I would do:
df['C'] = df['C'].str.replace(r'(ask code|code)(\s?:?\s?).+', r'\1\2<code>', regex=True)
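A sketch applying the same pattern and replacement to the values of column C with the standard `re` module, so it runs without pandas; on the DataFrame, `Series.str.replace(..., regex=True)` performs this substitution element by element. Group 1 keeps the keyword, group 2 keeps the separator, and `.+` consumes the rest of the value:

```python
import re

# Values of column C from the question's DataFrame
column_c = [
    "Phone. <phone_numbers>",
    "email - <adresse_mail>",
    "code : bla!123",
    "Hello George!",
    "Ethan, my phone is <phone_numbers>",
]

# Same pattern and replacement as the answer
pattern = re.compile(r"(ask code|code)(\s?:?\s?).+")
result = [pattern.sub(r"\1\2<code>", value) for value in column_c]

for value in result:
    print(value)
```

Only `code : bla!123` changes (to `code : <code>`), which matches the desired output. Note that the pattern would also match `code` inside longer words such as `barcode`; putting `\b` before the keyword group is one way to avoid that.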
Answer 1 (score: 2)
Input:
import re
string = 'code : bla!123'
# Group 1 captures everything after "code :"; that text is then replaced with '<code>'
string.replace(re.match(r'code\s?:?\s?(.*)', string)[1], '<code>')
Output:
'code : <code>'
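This works for the sample string, but two things need guarding when it is applied to every row: `re.match` returns `None` for values that do not start with `code` (so `[1]` raises `TypeError`), and `str.replace` rewrites every occurrence of the captured text, so a value such as `code : e` would become `cod<code> : <code>`. A minimal sketch with a hypothetical helper `replace_code_value` that slices at the match position instead:

```python
import re

# Pattern in the same shape as the answer's, with the keyword matched literally
CODE_PATTERN = re.compile(r"code\s?:?\s?(.*)")


def replace_code_value(text):
    """Hypothetical helper: replace whatever follows a leading 'code :' with '<code>'."""
    match = CODE_PATTERN.match(text)
    if match is None:
        # re.match returns None when the value does not start with "code"
        return text
    # Keep everything before group 1 and append the placeholder, instead of
    # str.replace, which would rewrite every occurrence of the captured text
    return text[: match.start(1)] + "<code>"


print(replace_code_value("code : bla!123"))  # code : <code>
print(replace_code_value("code : e"))        # code : <code>
print(replace_code_value("Hello George!"))   # Hello George!
```

On a DataFrame, the helper could be applied with `df['C'].map(replace_code_value)`.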