Question

我试图找出如何用<a href....>TEXT</a>标签替换所有<p>TEXT</p>标签。

我开始寻找<a href...>和</a>的模式，因此我可以相应地替换它们。不幸的是，它似乎与最接近的字符串不匹配。

>>> s = '<td class="tt"><a href="#">Alert types</a></td>&#13;<td class="info">Vibration</td>&#13;      </tr><tr><td class="tt"><a href="#">Sound</a>'

>>> re.sub('<a h.*>','<p>',s)

返回

'<td class="tt"><p>'

而不是：

 '<td class="tt"><p>Alert types</a></td>&#13;<td class="info">Vibration</td>&#13;      </tr><tr><td class="tt"><p>Sound</a>'

您知道如何使其与.*之间最接近的字符串匹配吗？

Answer 1

使用以下方法：

s = '<td class="tt"><a href="#">Alert types</a></td>&#13;<td class="info">Vibration</td>&#13;      </tr><tr><td class="tt"><a href="#">Sound</a>'
replaced = re.sub(r'<a[^>]+?>([\w\W]+?)<\/a>', r'<p>\1</p>', s)

print(replaced)

输出：

<td class="tt"><p>Alert types</p></td>&#13;<td class="info">Vibration</td>&#13;      </tr><tr><td class="tt"><p>Sound</p>

Answer 2

不确定使用正则表达式是否是一个好主意。但如果您更喜欢正则表达式，那么它就是：

re.sub('<a [^>]*>([^<]*)</a>','<p>\\1</p>',s)

使用([^<]*)它会捕获a标记之间的文字，而替代它正在使用该群组作为\\1

Answer 3

这应该有效。

搜索者：

(<.+?>)(.+)(<.+?>)

输入：

<a href="#">Sound</a>

替换为：

<p>$2</p>

输出：

<p>Sound</p>

Python代码：

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(<.+?>)(.+)(<.+?>)"

test_str = "<a href=\"#\">Sound</a>"

subst = "<p>$2</p>"

# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE)

if result:
    print (result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

请参阅：https://regex101.com/r/j4OsbX/1

如何用另一种标签替换一种标签（<a ...="">..</a> =＆gt; <p> .. </p>）

3 个答案: