Question

How can I replace links in HTML with their anchor text (Python)?

For example, given this input:

 <p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>

I want the result to keep the p tag (only the link tags should be removed):

<p>
Hello link text1 and link text2 ! 
</p>

Answer 1

You can do this with a simple regular expression and the re.sub function:

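import re

text = '<p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>'
# Match an opening <a ...> tag or a closing </a> tag.
# \b after "a" keeps other tags that start with "a" (<abbr>, <article>, ...) from matching.
pattern = r'</?a\b[^>]*>'

result = re.sub(pattern, "", text)

print(result)
# <p> Hello link text1 and link text2 ! </p>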

This code replaces every <a ...> opening tag and every </a> closing tag with an empty string, leaving the link text in place.
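A minimal sketch of the same regex approach packaged for reuse, assuming the markup is simple (no ">" inside attribute values, no links inside comments or scripts). The helper name strip_links is hypothetical; it compiles the pattern once and adds re.IGNORECASE so upper-case tags such as <A HREF="..."> are also removed.

```python
import re

# Hypothetical helper pattern: an opening <a ...> or closing </a> tag, any case.
# \b stops it from matching other tags that begin with "a", e.g. <abbr> or <article>.
LINK_TAG = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)


def strip_links(html):
    """Hypothetical helper: remove link tags from html but keep the link text."""
    return LINK_TAG.sub('', html)


print(strip_links('<p>See <A HREF="x.html">docs</A> and <abbr>HTML</abbr></p>'))
# <p>See docs and <abbr>HTML</abbr></p>
```

Because a regex does not actually parse HTML, an attribute value containing ">" (for example title="a>b") would cut the match short; a parser-based approach avoids that.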

Answer 2

This looks like a perfect use case for BeautifulSoup's unwrap() method:

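from bs4 import BeautifulSoup

data = '''<p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>'''
# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(data, 'html.parser')
p_tag = soup.find('p')
# unwrap() replaces each <a> tag with its own contents (the link text)
for a_tag in p_tag.find_all('a'):
    a_tag.unwrap()
print(p_tag)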

This gives:

<p> Hello link text1 and link text2 ! </p>

Answer 3

You could use an HTML parser library such as BeautifulSoup. I'm not sure, but you may find some help here
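If installing a third-party parser is not an option, one possibility is Python's built-in html.parser module. The sketch below is an illustration, not a drop-in replacement for BeautifulSoup: LinkStripper and remove_link_tags are hypothetical names, and it assumes plain content markup. It copies every tag except <a> and </a> back out, re-escapes text, and silently drops comments and doctypes (no handlers are defined for them); the contents of <script> or <style> would also be escaped, so it is not suited to pages that contain them.

```python
from html import escape
from html.parser import HTMLParser


class LinkStripper(HTMLParser):
    """Re-emits the parsed HTML, skipping <a> and </a> but keeping the link text."""

    def __init__(self):
        # convert_charrefs=True hands handle_data already-decoded text (&amp; -> &)
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # tag names arrive lower-cased; get_starttag_text() is the original markup
        if tag != 'a':
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/>: copy them through once, with no end tag
        if tag != 'a':
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != 'a':
            self.parts.append(f'</{tag}>')

    def handle_data(self, data):
        # re-escape text so characters such as & and < stay valid HTML
        self.parts.append(escape(data, quote=False))


def remove_link_tags(html):
    """Hypothetical helper: return html with all <a> tags removed, text kept."""
    parser = LinkStripper()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)


text = '<p> Hello <a href="http://example.com">link text1</a> and <a href="http://example.com">link text2</a> ! </p>'
print(remove_link_tags(text))
# <p> Hello link text1 and link text2 ! </p>
```

Overriding handle_startendtag matters here: HTMLParser's default implementation calls handle_starttag and then handle_endtag, which would turn <br/> into <br/></br>.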
