I'm trying to take a screenshot of every article section on this website. I managed to locate the element below.
<div id="post-4474417" class="post-box " data-permalink="https://hypebeast.com/2019/1/ten-best-sneakers-paris-fashion-week-fall-winter-2019-runway-shows" data-title="The 10 Best Sneakers From Paris Fashion Week's FW19 Runways">
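However, when I try to use element.text to name the screenshots, every file gets the same name, taken from the last element on the page. Yet when I print(item), it gives me all the different titles. What am I doing wrong here?
Output of print(item):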
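- The 10 Best Sneakers From Paris Fashion Week's FW19 Runways
- sacai Debuts New Nike Sneakers During Its FW19 Paris Fashion Show
- sacai's Whimsical SS19 Collection Includes Nike Collaborations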
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from PIL import Image
from io import BytesIO
import os
import time
from random import randint
from time import sleep
import requests
from bs4 import BeautifulSoup as bs
driver = webdriver.Chrome('/Users/Documents/python/Selenium/bin/chromedriver')
driver.get('https://hypebeast.com/search?s=nike+sacai+fashion')
time.sleep(1)
products = [element for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='post-box ']")))]
element_item = [element.text for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//h2/span")))]
for item in element_item:
    print(item)
i = 1
for product in products:
    location = product.location_once_scrolled_into_view
    size = product.size
    png = driver.get_screenshot_as_png()
    im = Image.open(BytesIO(png))
    left = location['x']
    top = location['y']
    right = location['x'] + size['width']
    bottom = location['y'] + size['height']
    im = im.crop((left, top, right, bottom)).save(str(i)+"_"+item+".png")
    i=i+1
    if not product :
        pass
    sleep(randint(1,2))
driver.quit()
Answer 0 (score: 2)
im = im.crop((left, top, right, bottom)).save(str(i)+"_"+item+".png")
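This line always uses the last title. The earlier loop, for item in element_item, has already finished by the time the screenshot loop runs, so item still holds the final value it was assigned and never changes inside the second loop.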
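The behaviour can be reproduced without Selenium. The sketch below uses made-up placeholder lists (`titles` and `products` are hypothetical stand-ins for `element_item` and the located elements) to show that a loop variable stays bound to its last value once the loop ends:

```python
# Hypothetical stand-ins for element_item and products in the question.
titles = ["first title", "second title", "third title"]
products = ["product-a", "product-b", "product-c"]

# After this loop finishes, `item` remains bound to the last title.
for item in titles:
    pass

# Reusing `item` in a second loop therefore repeats that last title.
filenames = []
for i, product in enumerate(products, start=1):
    filenames.append(f"{i}_{item}.png")

print(filenames)
```

Every generated name ends with the third title, which matches what the question describes.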
Since element_item is already a list holding all the titles, you can index into that list inside the second for loop instead.
i = 1
for product in products:
    im = im.crop((left, top, right, bottom)).save(str(i)+"_"+ element_item[i-1] +".png")
    i=i+1
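As an alternative to manual indexing, the two lists could be paired with zip and numbered with enumerate. This is only a sketch under the assumption that products and titles appear in the same order on the page; `screenshot_names` is a hypothetical helper, not part of Selenium or PIL:

```python
def screenshot_names(products, titles):
    """Pair each product with its own title and build one file name per pair.

    Hypothetical helper: assumes products and titles are in matching order.
    zip stops at the shorter list, so unmatched items are skipped.
    """
    names = []
    for i, (product, title) in enumerate(zip(products, titles), start=1):
        names.append((product, f"{i}_{title}.png"))
    return names


# Placeholder values standing in for the located elements and their titles.
pairs = screenshot_names(["product-a", "product-b"], ["x", "y", "z"])
print(pairs)
```

In the screenshot loop this would be used as `for product, filename in screenshot_names(products, element_item):`, passing `filename` to `save`. Because zip truncates to the shorter list, a mismatch between the number of products and titles skips the extra items rather than raising an IndexError.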