Question

I'm trying to download some images from a website (say, the first 10). The problem is that I don't know how HTML works.

What I have so far:

from selenium import webdriver
import time

driver = webdriver.Chrome(r"C:\web_driver\chromedriver")
url = "https://9gag.com/"
driver.get(url)

time.sleep(5)
driver.find_element_by_xpath("/html/body/div[7]/div[1]/div[2]/div/div[3]/button[2]/span").click()

images = driver.find_elements_by_tag_name('img')
list = []
for image in images:
    print(image.get_attribute('src'))
    list.append(image.get_attribute('src'))

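I want to download the images in the center of the page, but this program only retrieves the images in the left sidebar. My attempt at fixing this was: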

from selenium import webdriver
import time

driver = webdriver.Chrome(r"C:\web_driver\chromedriver")
url = "https://9gag.com/"
driver.get(url)

time.sleep(5)


# this part is to close the cookies pop up
driver.find_element_by_xpath("/html/body/div[7]/div[1]/div[2]/div/div[3]/button[2]/span").click()

images = driver.find_element_by_class_name("page").get_attribute("img")

list = []
for image in images:
    print(image.get_attribute('src'))
    # list.append(image.get_attribute('src'))
    # print("list:", list)
    time.sleep(1)

But I get the following error:

Traceback (most recent call last):
  File "C:/Users/asus/PycharmProjects/project1/36.py", line 14, in <module>
    for image in images:
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1

Answer 1

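The <div class=page> element has no img attribute, so get_attribute("img") returns None. You have to look for <img> tags instead.
find_element_by_* returns only a single element. To get a list of elements, you have to use find_elements_by_*. Iterating over the None returned by get_attribute is what raises the TypeError.
To get the images from the posts, you have to locate the images inside each post. Try the following XPath: //div[contains(@id,'stream-')]//div[@class='post-container']//picture/img
Keep in mind that the GIFs are not images and are not inside <img> tags, so this approach only gets you the still images.

Try this: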

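from selenium.webdriver.common.by import By

# find_elements (plural) returns a list; the By locator form works in Selenium 3 and 4.
images = driver.find_elements(By.XPATH, "//div[contains(@id,'stream-')]//div[@class='post-container']//picture/img")
image_urls = []  # avoid the name "list", which would shadow the built-in
for image in images:
    print(image.get_attribute('src'))
    image_urls.append(image.get_attribute('src'))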

This puts the src of every image it finds into a list.
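
The snippet above only collects URLs; to actually save the first 10 images, something along these lines might work. This is a minimal sketch, not part of the original answer: the helper name download_images, the image_NN file naming, the .jpg fallback extension, and the User-Agent header are all assumptions made for illustration, and it uses only the standard library.

```python
import os
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def download_images(urls, dest_dir, limit=10):
    """Save up to `limit` of the given image URLs into dest_dir; return the saved paths.

    Hypothetical helper for illustration, not part of Selenium.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    # Drop None/empty entries first: an <img> without a src attribute yields None.
    usable = [url for url in urls if url][:limit]
    for index, url in enumerate(usable):
        # Reuse the extension from the URL path; fall back to .jpg (an assumption).
        ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
        path = os.path.join(dest_dir, f"image_{index:02d}{ext}")
        # Some sites reject requests that lack a browser-like User-Agent.
        request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(request) as response, open(path, "wb") as out:
            out.write(response.read())
        saved.append(path)
    return saved
```

Passing the list of src values collected above together with a folder name, for example download_images(urls, "downloads"), would write image_00, image_01, and so on into that folder. Only images already present in the page when the list was built are covered; posts that load as you scroll would need the page scrolled first.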
