Example string:
s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
I need to convert it to:
s = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"
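This needs to handle both opening and closing tags, and all delimiters such as ".", ",", " - ", "(", ")", and so on.
I could search and replace ")" and the like, but obviously I'd prefer something more elegant.
So basically: move all delimiters outside the tags.
Thanks!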
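Answer 0 (score: 4)
The following regex moves a delimiter that sits inside the tag pair, directly before the closing tag, to just after the closing tag.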
(<sec>)([^.,()-]*)([.,()-])(<\/sec>)
Replacement string:
\1\2\4\3
>>> import re
>>> s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'(<sec>)([^.,()-]*)([.,()-])(<\/sec>)', r'\1\2\4\3', s)
'<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'
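The pattern above moves only one delimiter, and only when it sits directly before `</sec>`. Since the question also mentions the opening tag, here is a minimal sketch that handles runs of delimiters on both sides. The helper name `move_delims_outside` and the `DELIMS` constant are made up for illustration and are not part of the original answer; it also assumes the tag content contains no newline.

```python
import re

# Hypothetical delimiter set (assumption): the same characters as in the answer.
DELIMS = r".,()\-"

# group 1: delimiters right after <sec>
# group 2: the remaining content (lazy, so trailing delimiters fall into group 3)
# group 3: delimiters right before </sec>
_SEC = re.compile(r"<sec>([" + DELIMS + r"]*)(.*?)([" + DELIMS + r"]*)</sec>")

def move_delims_outside(text):
    """Move leading/trailing delimiters out of each <sec>...</sec> pair."""
    return _SEC.sub(r"\1<sec>\2</sec>\3", text)

print(move_delims_outside("hi to a woman (named <sec>Mary)</sec>"))
# hi to a woman (named <sec>Mary</sec>)
print(move_delims_outside("<sec>(Mary),</sec>"))
# (<sec>Mary</sec>),
```

Delimiters in the middle of the content (for example the dots in `J.R. Smith`) are left where they are, because only the runs touching the tags are captured by groups 1 and 3.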
OR
This works for any tag:
>>> s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'(<(\S+?\b)[^>]*>)([^.,()-]*)([.,()-])(<\/\2>)', r'\1\3\5\4', s)
'<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'
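To make the group numbering in the any-tag pattern easier to follow, the same regex can be written with `re.VERBOSE` and one comment per group. This is only a restatement of the pattern above; the names `ANY_TAG` and `move_trailing_delim` are hypothetical.

```python
import re

ANY_TAG = re.compile(r"""
    (<(\S+?\b)[^>]*>)   # group 1: whole opening tag, group 2: tag name
    ([^.,()-]*)         # group 3: content that contains no delimiter
    ([.,()-])           # group 4: one delimiter just before the closing tag
    (</\2>)             # group 5: closing tag with the same name as group 2
""", re.VERBOSE)

def move_trailing_delim(text):
    # Re-emit the groups with the delimiter (4) after the closing tag (5).
    return ANY_TAG.sub(r"\1\3\5\4", text)

print(move_trailing_delim('<b class="x">bold,</b> text'))
# <b class="x">bold</b>, text
```

Because group 3 excludes every delimiter, content with a delimiter in the middle will not match at all, and only a single trailing delimiter is moved.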
Answer 1 (score: 2)
Another regex variant:
>>> import re
>>> s = "Nicely<sec>, John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'((?:<[^>]+>)?)( *[-.(),]+ *)((?:</[^>]+>)?)',r'\3\2\1',s)
#                           ^^         ^^
# move spaces with the punctuation
# remove that if not needed
'Nicely, <sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'
The idea is to swap opening tag ↔ punctuation, or punctuation ↔ closing tag.
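The two swaps can also be spelled out as two separate passes, which may be easier to read than the single combined pattern. This is a sketch of the same idea, not the answer's own code: the names `PUNCT`, `OPEN_THEN_PUNCT`, `PUNCT_THEN_CLOSE`, and `swap_tags_and_punct` are invented here, and it assumes an opening tag is any tag whose first character after `<` is not `/`.

```python
import re

# Same punctuation set as in the answer; surrounding spaces travel with it.
PUNCT = r" *[-.(),]+ *"

# Pass 1: opening tag followed by punctuation -> punctuation, then opening tag
OPEN_THEN_PUNCT = re.compile(r"(<[^/>][^>]*>)(" + PUNCT + r")")
# Pass 2: punctuation followed by closing tag -> closing tag, then punctuation
PUNCT_THEN_CLOSE = re.compile(r"(" + PUNCT + r")(</[^>]+>)")

def swap_tags_and_punct(text):
    text = OPEN_THEN_PUNCT.sub(r"\2\1", text)
    text = PUNCT_THEN_CLOSE.sub(r"\2\1", text)
    return text

s = "Nicely<sec>, John</sec> said hi to a woman (named <sec>Mary)</sec>"
print(swap_tags_and_punct(s))
# Nicely, <sec>John</sec> said hi to a woman (named <sec>Mary</sec>)
```

Splitting the passes keeps each pattern anchored on a tag, so punctuation that is not adjacent to any tag is never touched.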