
Question

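I have built a scraper that currently parses image links and, by default, saves the downloaded images into the Python working directory. The only thing I want to do now is choose a folder on the desktop to save those images in, but I can't get it to work. This is what I have tried: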

import requests
import os.path
import urllib.request
from lxml import html

def Startpoint():
    url = "https://www.aliexpress.com/"
    response = requests.get(url)
    tree = html.fromstring(response.text)
    titles = tree.xpath('//div[@class="item-inner"]')
    for title in titles:
        Pics="https:" + title.xpath('.//span[@class="pic"]//img/@src')[0]
        endpoint(Pics)

def endpoint(images):
    sdir = (r'C:\Users\ar\Desktop\mth')
    testfile = urllib.request.URLopener()
    xx = testfile.retrieve(images, images.split('/')[-1])
    filename=os.path.join(sdir,xx)
    print(filename)

Startpoint()

When executed, the code above throws an error that says: "join() argument must be str or bytes, not 'tuple'"

Answer 1

You can download images with Python's urllib. The official documentation is available under "urllib documentation for python 2.7". If you want to use Python 3, follow "urllib for python 3" instead.
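As a minimal sketch of how this could look for the question above: `urllib.request.urlretrieve` returns a `(filename, headers)` tuple, which is why passing its result to `os.path.join` fails. One way around it is to build the destination path first and hand it to `urlretrieve`. The helper names `local_path_for` and `save_image` are made up for illustration, and the folder in the usage note is only an example.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def local_path_for(url, folder):
    """Build the destination path from the last segment of the URL path."""
    name = os.path.basename(urlsplit(url).path)
    return os.path.join(folder, name)


def save_image(url, folder):
    """Download url into folder and return the path of the saved file."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    destination = local_path_for(url, folder)
    # urlretrieve returns a (filename, headers) tuple, so unpack it
    saved_path, _headers = urllib.request.urlretrieve(url, destination)
    return saved_path
```

In the scraper this would be called as `save_image(Pics, r'C:\Users\ar\Desktop\mth')` in place of `endpoint(Pics)`, assuming that folder is the intended target.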

Answer 2

You can use urllib.request together with BytesIO from io and Image from PIL (provided you have the direct image URL).

from PIL import Image
from io import BytesIO
import urllib.request

def download_image(url):
    req = urllib.request.Request(url)
    response = urllib.request.urlopen(req)
    content = response.read()
    img = Image.open(BytesIO(content))
    img.filename = url
    return img

Answer 3

The images are now loaded dynamically, so I would like to update this post:

import os
from selenium import webdriver
import urllib.request
from lxml.html import fromstring

url = "https://www.aliexpress.com/"

def get_data(link):

    driver.get(link)
    tree = fromstring(driver.page_source)
    for title in tree.xpath('//li[@class="item"]'):
        pics = "https:" + title.xpath('.//*[contains(@class,"img-wrapper")]//img/@src')[0]
        os.chdir(r"C:\Users\WCS\Desktop\test")
        urllib.request.urlretrieve(pics, pics.split('/')[-1])

if __name__ == '__main__':
    driver = webdriver.Chrome()
    get_data(url)
    driver.quit()

Answer 4

Here is code for downloading an HTML file from the web:

import random
import urllib.request
def download(url):
    # random number used as the file name
    name = random.randrange(1, 1000)
    full_name = str(name) + ".html"  # convert to str so it can be concatenated
    urllib.request.urlretrieve(url, full_name)  # performs the download

download("any url")  # placeholder: pass a real URL here

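This code downloads any HTML file from the internet; you only need to pass the link to the function.

In your case, you said you have already retrieved the image links from the page, so you can change the extension from ".html" to a matching type. The catch is that the images may have different extensions, such as ".jpg", ".png", and so on.

So what you can do is use if/else to match the end of the link against a string and then assign the extension accordingly.

Here is an example for illustration: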

import random
import urllib.request

def download(url):
    # random number used as the file name
    name = random.randrange(1, 1000)
    if url.lower().endswith(".png"):
        full_name = str(name) + ".png"  # extension matching .png
    elif url.lower().endswith(".jpg"):
        full_name = str(name) + ".jpg"  # extension matching .jpg
    else:
        raise ValueError("unsupported image extension: " + url)
    urllib.request.urlretrieve(url, full_name)  # performs the download

download("any url")  # placeholder: pass a real image URL here

You can chain more if/else branches for other extension types. If this helped in your case, give it a thumbs up.
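As an alternative to a growing if/else chain, the extension can usually be read straight from the link. The sketch below is one possible approach, not part of the answer above: the helper name `extension_of` and the `.jpg` fallback are assumptions chosen for illustration.

```python
import os
from urllib.parse import urlsplit


def extension_of(url, default=".jpg"):
    """Return the lowercased file extension found in the URL path.

    Falls back to `default` (an assumed choice) when the path has none.
    """
    path = urlsplit(url).path  # drops any ?query or #fragment part
    ext = os.path.splitext(path)[1].lower()
    return ext or default
```

The file name could then be built as `str(name) + extension_of(url)`. Note that random names between 1 and 999 can collide, so a later download may overwrite an earlier file.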

Unable to save downloaded images into a folder on the desktop using Python
