How to download multiple files using a for loop

Time: 2020-01-26 16:39:26

Tags: python web-scraping

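I'm stuck on what should be a fairly simple problem, but I'm a beginner, so the cause isn't obvious to me. I'm trying to download images from a website and name each file dynamically. I think I'm either overwriting the same file over and over, or only downloading the last file (America's most popular sports shows). It works if I hard-code the file name or limit the download to a single file, but that obviously defeats the purpose. Otherwise I get this error: No such file or directory: 'C:\\My File Path\\Images\\John Wick: Chapter 1.jpg' Can someone point me in the right direction?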

driver = webdriver.Chrome(executable_path=r'C:\Program Files\chromedriver.exe')
driver.get("https://public.tableau.com/en-gb/gallery/?tab=viz-of-the-day&type=viz-of-the-day")
wait = WebDriverWait(driver, 10)

vizzes = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".gallery-items-list div.gallery-list-item-container")))
for viz in vizzes:

    #name of the viz
    viz_name = viz.find_element_by_class_name("gallery-list-item-title-left").text

    #get image links
    images = viz.find_element_by_xpath(".//img[@data-test-id='galleryListItem-thumbnail-image']")
    image_link = images.get_attribute("src")

    #download images 
    myfile = requests.get(image_link)

    with open("C:\My File Path\Images" + "\\" + viz_name + ".jpg", "wb") as f:
            f.write(myfile.content)

time.sleep(5)

driver.close()

1 Answer:

Answer 0 (score: 2)

Certain characters are not allowed in file names. The problem is that a title can contain any character.

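On Windows, a file name cannot contain a colon (:) or a question mark (?), among other reserved characters, and the titles here contain them. Spaces are allowed, but I replace them as well. You need a function that converts a title into a name that can safely be used as a file name.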

Here is the function I use:

def valid_file_name(name):
    return name.replace(" ", "_").replace("?","").replace(":","")
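
The function above handles only three characters. As a rough sketch, assuming the target is Windows (where \ / : * ? " < > | are all reserved in file names), a regular expression can strip the whole set in one pass. The name `sanitize_file_name` and the "untitled" fallback are hypothetical and not part of the original answer.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
_RESERVED = re.compile(r'[\\/:*?"<>|\x00-\x1f]')

def sanitize_file_name(name):
    """Return a copy of name that should be usable as a Windows file name."""
    cleaned = _RESERVED.sub("", name)
    # Replace each run of whitespace with one underscore
    # (the original function replaces every single space instead).
    cleaned = re.sub(r"\s+", "_", cleaned.strip())
    # Windows also rejects names that end in a dot or a space.
    return cleaned.rstrip(". ") or "untitled"
```

With this version, a title such as "John Wick: Chapter 1" becomes "John_Wick_Chapter_1".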

Here is where I use it:

    with open("C:\\Users\\Matthew\\Pictures\\dumping" + "\\" + valid_file_name(viz_name) + ".jpg", "wb") as f:
            f.write(myfile.content)
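
Joining path pieces with "\\" by hand works but is easy to get wrong. A minimal sketch using the standard-library `pathlib` instead; `build_image_path` is a hypothetical helper, not part of the original answer. It also creates the folder first, on the assumption that a missing folder could raise the same "No such file or directory" error.

```python
from pathlib import Path

def valid_file_name(name):
    # Same function as in the answer.
    return name.replace(" ", "_").replace("?", "").replace(":", "")

def build_image_path(folder, title, suffix=".jpg"):
    """Return folder/<sanitized title><suffix>, creating folder if needed."""
    folder = Path(folder)
    # parents=True creates missing parent folders; exist_ok=True avoids
    # an error when the folder is already there.
    folder.mkdir(parents=True, exist_ok=True)
    return folder / (valid_file_name(title) + suffix)
```

Inside the loop, the result can be passed straight to `open`, for example `open(build_image_path(r"C:\Users\Matthew\Pictures\dumping", viz_name), "wb")`.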

Below is the complete code, which works for me. Make sure to change the image folder to the one you want to use.

from selenium import webdriver
import requests
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

def valid_file_name(name):
    return name.replace(" ", "_").replace("?","").replace(":","")

driver = webdriver.Chrome()
driver.get("https://public.tableau.com/en-gb/gallery/?tab=viz-of-the-day&type=viz-of-the-day")
wait = WebDriverWait(driver, 15)

vizzes = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".gallery-items-list div.gallery-list-item-container")))
for viz in vizzes:

    # name of the viz
    # find_element(By..., ...) works in Selenium 3 and 4; the
    # find_element_by_* helpers were removed in Selenium 4.3
    viz_name = viz.find_element(By.CLASS_NAME, "gallery-list-item-title-left").text

    # get the image link
    images = viz.find_element(By.XPATH, ".//img[@data-test-id='galleryListItem-thumbnail-image']")
    image_link = images.get_attribute("src")

    #download images
    myfile = requests.get(image_link)

    print(valid_file_name(viz_name))
    with open("C:\\Users\\Matthew\\Pictures\\dumping" + "\\" + valid_file_name(viz_name) + ".jpg", "wb") as f:
            f.write(myfile.content)

time.sleep(5)

driver.close()
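
The question also raised the worry that files overwrite one another. With the loop above, that can still happen if two titles reduce to the same file name. A small sketch, assuming a numeric suffix is an acceptable way to tell them apart; `unique_name` and the `seen` set are hypothetical and not part of the original answer.

```python
def unique_name(base, seen):
    """Return base, or base_2, base_3, ... if base was already handed out.

    seen is a set of names returned so far; it is updated in place.
    """
    candidate = base
    counter = 2
    while candidate in seen:
        candidate = f"{base}_{counter}"
        counter += 1
    seen.add(candidate)
    return candidate
```

To use it, create `seen = set()` before the loop and call `unique_name(valid_file_name(viz_name), seen)` where the file name is built.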