Question

I am trying to download images from this page. I wrote the following Python script:

import requests
import subprocess
from bs4 import BeautifulSoup

request = requests.get("http://ottofrello.dk/malerierstor.htm")
content = request.content
soup = BeautifulSoup(content, "html.parser")
element = soup.find_all("img")
for img in element:
    print(img.get('src'))

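However, I only get the image file names, not the full paths. On the site, when I inspect the HTML and hover over an image name, the full link is shown. Is there a way to parse this link with BeautifulSoup?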

Answer 1

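The image URIs in the page are relative, so the src attributes contain only the part that comes after the host name.

You can use the urljoin function from the urllib.parse module to build an absolute URL for each image.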

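from urllib.parse import urljoin

page_url = "http://ottofrello.dk/malerierstor.htm"
request = requests.get(page_url)

# ... (parse the response and collect the <img> tags as in the question)

for img in element:
    # Resolve each relative src against the URL of the page it came from.
    image_url = urljoin(page_url, img.get('src'))
    print(image_url)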
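
To see how urljoin treats the different forms a src attribute can take, here is a minimal sketch that needs only the standard library. The helper name resolve_sources and the sample src values are assumptions made up for illustration; they are not taken from the real page, and the list simply stands in for the values that img.get('src') would return.

```python
from urllib.parse import urljoin

PAGE_URL = "http://ottofrello.dk/malerierstor.htm"


def resolve_sources(page_url, sources):
    """Turn the src values found on a page into absolute URLs."""
    absolute = []
    for src in sources:
        if not src:
            # img.get('src') returns None when a tag has no src attribute.
            continue
        absolute.append(urljoin(page_url, src))
    return absolute


# Hypothetical src values, chosen to cover the common cases.
sample_sources = [
    "example1.jpg",                     # bare file name
    "images/example2.jpg",              # path relative to the page
    "/images/example3.jpg",             # path relative to the site root
    "http://example.com/example4.jpg",  # already absolute, left unchanged
    None,                               # tag without a src attribute
]

for url in resolve_sources(PAGE_URL, sample_sources):
    print(url)
```

An already absolute src is returned unchanged, so the same loop can be applied to every img tag without checking which form it uses.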

Answer 2

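As I understand it, you want the absolute image paths rather than the relative ones you are getting now. The only thing I changed is your print statement.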

import requests
import subprocess
from bs4 import BeautifulSoup

request = requests.get("http://ottofrello.dk/malerierstor.htm")
content = request.content
soup = BeautifulSoup(content, "html.parser")
element = soup.find_all("img")
for img in element:
    print('http://ottofrello.dk/' + img.get('src'))
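
Prefixing the host name works here because the page sits at the site root and the src values are bare file names. As a hedged illustration of where the two approaches diverge, the sketch below uses a hypothetical page in a subdirectory (the example.com URL and the file name are assumptions, not part of the original question) and compares plain concatenation with urljoin.

```python
from urllib.parse import urljoin

# Hypothetical page that does not live at the site root.
page_url = "http://example.com/gallery/index.htm"
src = "photo.jpg"

# Prefixing the host name drops the page's directory.
by_prefix = "http://example.com/" + src

# urljoin resolves the src against the page's own location.
by_urljoin = urljoin(page_url, src)

print(by_prefix)   # http://example.com/photo.jpg
print(by_urljoin)  # http://example.com/gallery/photo.jpg
```

For pages like the one in the question the two results are the same; the difference only shows up once the page or the src value contains a directory component.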

Downloading images with BeautifulSoup when the full image link is only shown on hovering over the src attribute
