How to get the absolute path of an image on a website

Time: 2016-10-18 14:03:38

Tags: python-3.x beautifulsoup python-requests relative-path absolute-path

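In Firefox, you can right-click an image and choose "Copy Image Location". This gives you the absolute image path even when the image's src attribute contains only a relative path. Is it possible to obtain this absolute path programmatically? Where is it stored?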

I am using Python 3, with requests to fetch the site and Beautiful Soup to parse the HTML.

1 answer:

Answer 0 (score: 0)

Simple solution

from bs4 import BeautifulSoup
from requests import get

url = 'https://example.com/'
response = get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# a set prevents duplicates; has_attr() skips <img> tags that have no src
# (hasattr(img, 'src') is always True on a bs4 Tag, so it filters nothing)
images = {img['src'] for img in soup.find_all('img') if img.has_attr('src')}

for img in images:
    print(img)
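
The snippet above needs network access and prints the src values exactly as they appear in the HTML, so relative paths stay relative. As a minimal offline sketch of the same extraction step, the following uses only the standard library's html.parser on an inline HTML string. The class name ImgSrcCollector, the function collect_img_sources, and the sample markup are made up for illustration and are not part of the original answer.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the raw src value of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []  # unique src values, in document order

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        src = dict(attrs).get('src')
        # Skip <img> tags without a src, and skip duplicates
        if src and src not in self.sources:
            self.sources.append(src)


def collect_img_sources(html):
    collector = ImgSrcCollector()
    collector.feed(html)
    return collector.sources


# Assumed sample markup: two distinct images, one duplicate, one tag without src
sample_html = '''
<img src="/static/logo.png">
<img src="photos/cat.jpg">
<img alt="no source here">
<img src="/static/logo.png">
'''
print(collect_img_sources(sample_html))
```

The printed values are still the raw attribute values; none of them is an absolute URL yet.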

Extended solution

If the images use relative paths (or an external host, a CDN, etc.), the code below can resolve most of them.

Note: this does not work with local URIs (file:///temp/web/img1.png).

This code uses the validators package, so install it first: pip install validators

from bs4 import BeautifulSoup
from requests import get
from urllib.parse import urljoin
import validators

url = 'https://example.com/'
response = get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# has_attr() skips <img> tags that have no src attribute
images = {img['src'] for img in soup.find_all('img') if img.has_attr('src')}

list_of_img_paths = []

for img in images:
    if not validators.url(img):  # If the src is NOT already a valid absolute URL
        # Here we can assume we are dealing with a relative path.
        # urljoin resolves it against the page URL; os.path functions
        # are meant for filesystem paths and would mangle 'https://'.
        formatted_url = urljoin(url, img)
        list_of_img_paths.append(formatted_url)  # add to list
    else:
        list_of_img_paths.append(img)
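
On the question of where the absolute path is stored: it is not stored in the HTML at all. The browser computes it from the address of the page and the src value. As a rough sketch of that resolution step in isolation, the following uses urljoin from the standard library's urllib.parse. The helper name to_absolute, the page address, and the sample src values are assumptions chosen for illustration.

```python
from urllib.parse import urljoin


def to_absolute(page_url, src):
    """Resolve one src value against the URL of the page it came from."""
    return urljoin(page_url, src)


page_url = 'https://example.com/blog/post.html'  # assumed page address

examples = [
    'img/a.png',                        # relative to the page's directory
    '/static/b.png',                    # relative to the site root
    '../c.png',                         # one directory up
    '//cdn.example.net/d.png',          # scheme-relative, inherits https
    'https://other.example.org/e.png',  # already absolute, left unchanged
]

for src in examples:
    print(src, '->', to_absolute(page_url, src))
```

Two caveats apply to real pages. If the request was redirected, response.url from requests holds the final address and is the safer base to resolve against. If the page contains a base tag with an href attribute, browsers resolve relative paths against that value instead of the page address.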