Question

I'm trying to use a regular expression to process some text.

The text I'm working with is:

text="abcd<table class='navbox-columns-table'>The seating</tr>\n</table>fghi<table class='navbox-columns-table'>Going Down</tr>\n</table>"

I want to remove all text that matches this regex:

<table class=.+?>(.+?)</table>

I'm trying to do this with re.sub:

re.sub(r'<table class=.+?>(.+?)</table>', '1234', text)

but I don't get the output I want.

The output I need is:

"abcdfghi"

The regex seems to be correct, because findall() gives me the right output:

re.findall('<table class=.+?>(.+?)</table>', text, re.DOTALL)

Output: ['The seating</tr>\n', 'Going Down</tr>\n']
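A minimal sketch of why the two calls behave differently, reusing the question's `text` and pattern: findall() was called with `re.DOTALL`, but re.sub() was not, so in the sub call `.` cannot match the `\n` before `</table>`, nothing matches, and the text comes back unchanged. Note also that re.sub's fourth positional parameter is `count`, not `flags`, so the flag is passed as `flags=re.DOTALL`. The replacement here is `""` rather than the question's `'1234'`, since the desired output drops the tables entirely.

```python
import re

text = ("abcd<table class='navbox-columns-table'>The seating</tr>\n</table>"
        "fghi<table class='navbox-columns-table'>Going Down</tr>\n</table>")
pattern = r"<table class=.+?>(.+?)</table>"

# Without DOTALL, '.' cannot cross the '\n' before '</table>', so nothing matches
# and re.sub returns the text unchanged.
without_dotall = re.findall(pattern, text)
unchanged = re.sub(pattern, "", text)

# With DOTALL, '.' also matches '\n'. re.sub's 4th positional argument is count,
# so the flag is passed by keyword.
with_dotall = re.findall(pattern, text, re.DOTALL)
cleaned = re.sub(pattern, "", text, flags=re.DOTALL)

print(without_dotall)     # []
print(unchanged == text)  # True
print(with_dotall)        # ['The seating</tr>\n', 'Going Down</tr>\n']
print(cleaned)            # abcdfghi
```

With `'1234'` as the replacement and `flags=re.DOTALL`, the result would be `'abcd1234fghi1234'`.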

Answer 1

You need to include the DOTALL modifier (?s) so that the dot in the regex also matches newline characters.

>>> text="abcd<table class='navbox-columns-table'>The seating</tr>\n</table>fghi<table class='navbox-columns-table'>Going Down</tr>\n</table>"
>>> re.sub(r'(?s)<table class=.+?>(.+?)</table>', '', text)
'abcdfghi'
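For comparison, a small sketch showing that the inline `(?s)` flag is equivalent to passing `re.DOTALL` when compiling the pattern. The name `TABLE_RE` is just an illustrative choice; the capture group is dropped in the compiled version because re.sub does not use it here.

```python
import re

text = ("abcd<table class='navbox-columns-table'>The seating</tr>\n</table>"
        "fghi<table class='navbox-columns-table'>Going Down</tr>\n</table>")

# Inline form from the answer: (?s) at the very start of the pattern enables DOTALL.
inline = re.sub(r"(?s)<table class=.+?>(.+?)</table>", "", text)

# Equivalent form: compile once with re.DOTALL and reuse the pattern object.
TABLE_RE = re.compile(r"<table class=.+?>.+?</table>", re.DOTALL)
compiled = TABLE_RE.sub("", text)

print(inline)    # abcdfghi
print(compiled)  # abcdfghi
```

Compiling is mainly useful when the same pattern is applied to many strings.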

Answer 2

re.sub(r"(?s)<table[^>]*class=\'.+?\'[^>]*>.*?</table>", r"", text)
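A quick check of this pattern against the question's `text`. Because `[^>]*` may appear on either side of `class='...'`, it also matches opening tags that carry other attributes, which the question's `<table class=` prefix does not. The `sample` string below is hypothetical markup used only to illustrate that difference.

```python
import re

text = ("abcd<table class='navbox-columns-table'>The seating</tr>\n</table>"
        "fghi<table class='navbox-columns-table'>Going Down</tr>\n</table>")

# [^>]* allows other attributes before or after class='...' inside the opening tag;
# .*? (with (?s)) lazily spans the table body, newlines included.
pattern = r"(?s)<table[^>]*class=\'.+?\'[^>]*>.*?</table>"
cleaned = re.sub(pattern, "", text)

# Hypothetical sample: an id attribute before class= defeats the question's
# "<table class=" prefix, but this pattern still matches the whole table.
sample = "x<table id='t1' class='wide' border='1'>row\n</table>y"
cleaned_sample = re.sub(pattern, "", sample)

print(cleaned)         # abcdfghi
print(cleaned_sample)  # xy
```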
