Python: Search and replace all img tags in an HTML string

Date: 2019-02-28 08:21:35

Tags: python python-3.x

I'm looking for a way in Python to find all img tags in an HTML string and rewrite them.

I have an HTML string:

"<h1>H1 Tag</h1>\n<p>foo <img alt=\"alt\" src=\"image_2.jpg\"> bar</p>\n<p>11</p>\n<h2>H2 Tag</h2>\n<p>ads\nad\nad\nad</p>\n<h3>Imsd</h3>\n<p><img alt=\"alt\" src=\"image_3.jpg\">"

I want to prefix the src of every img tag in the HTML string with the base URL https://domman.com, so that I get this result:

"<h1>H1 Tag</h1>\n<p>foo <img alt=\"alt\" src=\"https://domman.com/image_2.jpg\"> bar</p>\n<p>11</p>\n<h2>H2 Tag</h2>\n<p>ads\nad\nad\nad</p>\n<h3>Imsd</h3>\n<p><img alt=\"alt\" src=\"https://domman.com/image_3.jpg\">"

3 answers:

Answer 0 (score: 3)

You can use BeautifulSoup to prepend the base URL to the src of every img tag:

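from bs4 import BeautifulSoup

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(html_str, "html.parser")
# src=True only yields <img> tags that actually have a src attribute
for img in soup.find_all('img', src=True):
    img['src'] = 'https://domman.com/' + img['src']
html_str = str(soup)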
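One caveat with the string concatenation above: a src that already starts with "/" becomes https://domman.com//image.jpg, and an src that is already absolute gets mangled. A minimal sketch, assuming relative srcs should be resolved the way a browser resolves them, swaps the concatenation for the standard library's urllib.parse.urljoin (the helper name absolute_src is hypothetical, just for illustration):

```python
from urllib.parse import urljoin

# Base URL from the question; the trailing slash matters for urljoin (see note below)
BASE_URL = "https://domman.com/"


def absolute_src(src, base_url=BASE_URL):
    """Resolve an <img> src against base_url instead of concatenating strings."""
    return urljoin(base_url, src)


# In the BeautifulSoup loop this would replace the concatenation:
#     img['src'] = absolute_src(img['src'])
print(absolute_src("image_2.jpg"))                    # https://domman.com/image_2.jpg
print(absolute_src("/image_3.jpg"))                   # https://domman.com/image_3.jpg
print(absolute_src("https://cdn.example.org/x.png"))  # https://cdn.example.org/x.png
```

Keep the trailing slash when the base URL has a path: urljoin("https://domman.com/img", "a.jpg") resolves to https://domman.com/a.jpg, dropping img.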

Answer 1 (score: 2)

# Rewrites src=" on every tag, not only <img>; str.replace returns a new string
html_str = html_str.replace('src="', 'src="https://domman.com/')
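Because the plain replacement touches every src=" in the document (scripts, iframes and so on), a dependency-free middle ground is a regular expression limited to img tags. This is a sketch, not an HTML parser: it assumes quoted src values, no ">" inside attribute values, and it could also match a data-src attribute that appears before src. The names IMG_SRC and prefix_img_src are hypothetical. Already-absolute URLs are left alone because urljoin returns them unchanged:

```python
import re
from urllib.parse import urljoin

# Group 1: "<img ... src=" up to the opening quote; group 2: the quote; group 3: the URL.
# Only <img> tags match, so src attributes on <script> or <iframe> are untouched.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def prefix_img_src(html_str, base_url):
    """Resolve every <img src> in html_str against base_url (regex sketch)."""
    def repl(match):
        url = urljoin(base_url, match.group(3))
        # Re-emit the prefix and the original quote character around the new URL
        return f"{match.group(1)}{match.group(2)}{url}{match.group(2)}"

    return IMG_SRC.sub(repl, html_str)


html_str = '<p>foo <img alt="alt" src="image_2.jpg"> bar</p><script src="app.js"></script>'
print(prefix_img_src(html_str, "https://domman.com/"))
```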

Answer 2 (score: 1)

import lxml.html

root = lxml.html.fromstring(
    '<h1>H1 Tag</h1>\n<p>foo <img alt="alt" src="image_2.jpg"> bar</p><p>11</p>\n'
    '<h2>H2 Tag</h2>\n<p>ads\nad\nad\nad</p>\n'
    '<h3>Imsd</h3>\n<p><img alt="alt" src="image_3.jpg">'
)
# [@src] skips <img> tags without a src; fromstring() wraps this fragment in a <div>
for img in root.xpath("//img[@src]"):
    img.set("src", "https://domman.com/" + img.get("src"))
with open("page.html", "wb") as f:
    # tostring() returns bytes, hence the binary file mode
    f.write(lxml.html.tostring(root))

That's it.
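If neither BeautifulSoup nor lxml is available, the same idea can be sketched with the standard library's html.parser.HTMLParser: copy every token through unchanged and rebuild only the img start tags with an absolute src. This is a sketch under assumptions, not a full serializer: rebuilt img tags use double-quoted attributes, end-tag names come out lower-cased, rarer constructs such as processing instructions are not echoed, and the names ImgSrcRewriter and rewrite_img_src are hypothetical:

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcRewriter(HTMLParser):
    """Echo the input HTML, rebuilding only <img> tags with an absolute src."""

    def __init__(self, base_url):
        # convert_charrefs=False: entities such as &amp; arrive via handle_entityref
        # and can be written back verbatim instead of being decoded into text.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _img_tag(self, attrs, self_closing):
        parts = ["img"]
        for name, value in attrs:
            if value is None:  # bare attribute such as "ismap"
                parts.append(name)
                continue
            if name == "src":
                value = urljoin(self.base_url, value)
            # HTMLParser unescapes attribute values, so escape them again on output
            parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.out.append(self._img_tag(attrs, self_closing=False))
        else:
            self.out.append(self.get_starttag_text())  # copied verbatim

    def handle_startendtag(self, tag, attrs):
        if tag == "img":
            self.out.append(self._img_tag(attrs, self_closing=True))
        else:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_img_src(html_str, base_url):
    parser = ImgSrcRewriter(base_url)
    parser.feed(html_str)
    parser.close()
    return "".join(parser.out)


html_str = '<h1>H1 Tag</h1>\n<p>foo <img alt="alt" src="image_2.jpg"> bar</p>'
print(rewrite_img_src(html_str, "https://domman.com/"))
```

Subclassing HTMLParser keeps every non-img token byte-for-byte where possible, so only the img tags change shape in the output.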