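Although I found some references on StackOverflow, I have not been able to write a correct regular expression for what I want. I want to remove the whitespace before and after specific punctuation characters in a string in Python.
My function is below.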
def modify_answers(answers):
    hyp = []
    for ans in answers:
        # remove whitespace before - / ? . ! ;
        newhyp = re.sub(r'\s([-/?.!,;](?:\s|$))', r'\1', ans)
        # remove whitespace after - / $ _
        newhyp = re.sub(r'', r'\1', newhyp)
        hyp.append(newhyp)
    return hyp
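Some examples of what I want to achieve:
"Tax pin number is 1 - 866 - 704 - 7388 ." ---> "Tax pin number is 1-866-704-7388."
"No , emu is not protected in Victoria ." ---> "No, emu is not protected in Victoria."
"Find is to lose as construct is to _ _ _ _ _ _ ." ---> "Find is to lose as construct is to ______."
"$ 1,0 is equal to $ 1,0 ." ---> "$1,0 is equal to $1,0."
Any help would be appreciated.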
Answer 0 (score: 4)
First, define a function that performs the replacement:
import re
def replace(x):
    y, z = x.groups()
    if z in '-/?.!,;':
        y = y.lstrip()
    if z in '-/$_':
        y = y.rstrip()
    return y
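This function takes the match object and performs the replacement accordingly: group 1 is the symbol together with its surrounding whitespace, and group 2 is the symbol itself.
Now, define your pattern. You can precompile it for efficiency.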
p = re.compile(r'(\s*([-/?.,!;$_])\s*)')
Call sub on the compiled regex for each string, passing the callback defined above:
cases = [
    "Tax pin number is 1 - 866 - 704 - 7388 .",
    "No , emu is not protected in Victoria .",
    "Find is to lose as construct is to _ _ _ _ _ _ .",
    "$ 1,0 is equal to $ 1,0 ."]
repl = [p.sub(replace, c) for c in cases]
print(repl)
['Tax pin number is 1-866-704-7388.', 'No, emu is not protected in Victoria.',
'Find is to lose as construct is to ______.', '$1,0 is equal to $1,0.']
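To make the callback's behaviour easier to follow, here is a minimal sketch that records what each match hands to the callback and what the callback returns. The sample sentence and the names padded, symbol and trace are illustrative and not part of the original answer; the character class is assumed to include ; so that the pattern covers every symbol the callback checks for.

```python
import re

# Group 1: the symbol plus any surrounding whitespace. Group 2: the symbol alone.
# Assumption: ';' is included in the class to mirror the callback's first check.
pattern = re.compile(r'(\s*([-/?.,!;$_])\s*)')

def replace(match):
    padded, symbol = match.groups()
    if symbol in '-/?.!,;':   # symbols that lose the whitespace before them
        padded = padded.lstrip()
    if symbol in '-/$_':      # symbols that lose the whitespace after them
        padded = padded.rstrip()
    return padded

# Hypothetical sample sentence, chosen to hit four different symbols.
sample = "No , emu costs $ 5 - 7 ."

# (matched text, symbol, replacement) for every match, in order.
trace = [(m.group(1), m.group(2), replace(m)) for m in pattern.finditer(sample)]
result = pattern.sub(replace, sample)

print(trace)   # [(' , ', ',', ', '), (' $ ', '$', ' $'), (' - ', '-', '-'), (' .', '.', '.')]
print(result)  # No, emu costs $5-7.
```

Symbols that appear in both checks, such as -, are stripped on both sides, which is why the phone number collapses to 1-866-704-7388.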
Answer 1 (score: 3)
You can do it like this:
import re
sentences = ["Tax pin number is 1 - 866 - 704 - 7388 .",
             "No , emu is not protected in Victoria .",
             "Find is to lose as construct is to _ _ _ _ _ _ .",
             "$ 1,0 is equal to $ 1,0 ."]
def modify_answers(answers):
    hyp = []
    for ans in answers:
        # remove whitespace before - / ? . ! ;
        new_hyp = re.sub(r'\s([/?.!;_-])(\s|$)', r'\1', ans)
        new_hyp = re.sub(r'\s(,)(\s|$)', r'\1 ', new_hyp)
        new_hyp = re.sub(r'(^|\s)(\$)(\s|$)', r' \2', new_hyp)
        hyp.append(new_hyp.strip())
    return hyp
for sentence in modify_answers(sentences):
    print(sentence)
Output
Tax pin number is 1-866-704-7388.
No, emu is not protected in Victoria.
Find is to lose as construct is to______.
$1,0 is equal to $1,0.
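Note that the third line of this output has lost the space before the underscores (to______ rather than to ______), because _ sits in the class whose preceding whitespace is removed. Below is a minimal sketch of one possible adjustment, assuming an underscore should only lose the whitespace after it. The function name modify_answer and the separate _\s substitution are illustrative and not part of the original answer.

```python
import re

def modify_answer(ans):
    # Drop whitespace around / ? . ! ; and - ('_' is deliberately left out here).
    out = re.sub(r'\s([/?.!;-])(\s|$)', r'\1', ans)
    # Underscore: drop only the whitespace that follows it.
    out = re.sub(r'_\s', '_', out)
    # Comma: drop the whitespace before it, keep one space after it.
    out = re.sub(r'\s(,)(\s|$)', r'\1 ', out)
    # Dollar sign: keep one space before it, drop the whitespace after it.
    out = re.sub(r'(^|\s)(\$)(\s|$)', r' \2', out)
    return out.strip()

print(modify_answer("Find is to lose as construct is to _ _ _ _ _ _ ."))
# Find is to lose as construct is to ______.
```

The other three sample sentences come out the same as with the original function.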
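Notes
The pattern \s([/?.!;_-])(\s|$) matches a whitespace character, followed by any one of /?.!;_- and then by whitespace or the end of the string, and replaces the match with the symbol alone. The - symbol denotes a range inside [], so it is placed at the end of the class to be taken literally (placing it first or escaping it also works).
The pattern \s(,)(\s|$) matches a comma surrounded by whitespace and replaces it with a comma followed by a single space.
The pattern (^|\s)(\$)(\s|$) matches a $ surrounded by whitespace (or at the start or end of the string) and replaces it with a space followed by the dollar sign. In this regex you must refer to the second group.
Answer 2 (score: 3)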
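Use re.sub to replace the pattern r' (?=[-/?.!])|(?<=[-/$_]) ' with an empty string. The first alternative matches a space that is followed by one of - / ? . ! and the second matches a space that is preceded by one of - / $ _; the lookarounds only inspect the neighbouring symbol, so only the space is removed.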
>>> import re
>>> from pprint import pprint
>>>
>>> lst = ["Tax pin number is 1 - 866 - 704 - 7388 .",
...        "No , emu is not protected in Victoria .",
...        "Find is to lose as construct is to _ _ _ _ _ _ .",
...        "$ 1,0 is equal to $ 1,0 ."]
>>>
>>> def modify_answers(answers):
...     ptrn = re.compile(r' (?=[-/?.!])|(?<=[-/$_]) ')
...     return [ptrn.sub('', answer) for answer in answers]
...
>>>
>>> pprint(modify_answers(lst))
['Tax pin number is 1-866-704-7388.',
'No , emu is not protected in Victoria.',
'Find is to lose as construct is to ______.',
'$1,0 is equal to $1,0.']
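The second line of this output still has a space before the comma ('No , emu ...'), because , is not in the lookahead class. Below is a minimal sketch of a variant, assuming the comma and the semicolon mentioned in the question should also lose the space before them. The name PATTERN and the extended class are illustrative and not part of the original answer.

```python
import re

# A space followed by one of - / ? . ! , ;  OR  a space preceded by one of - / $ _
# Assumption: ',' and ';' are added to the lookahead class.
PATTERN = re.compile(r' (?=[-/?.!,;])|(?<=[-/$_]) ')

def modify_answers(answers):
    # The lookarounds consume nothing, so only the space itself is deleted.
    return [PATTERN.sub('', answer) for answer in answers]

print(modify_answers(["No , emu is not protected in Victoria ."]))
# ['No, emu is not protected in Victoria.']
```

Like the original, this pattern matches a literal space only; tabs and other whitespace are left untouched.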