Using Selenium to name screenshots after article titles on a website

Date: 2019-04-12 05:39:33

Tags: python selenium

I am trying to take a cropped screenshot of every article on this website. I managed to locate the element shown below.

<div id="post-4474417" class="post-box    " data-permalink="https://hypebeast.com/2019/1/ten-best-sneakers-paris-fashion-week-fall-winter-2019-runway-shows" data-title="The 10 Best Sneakers From Paris Fashion Week's FW19 Runways">

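However, when I try to name each screenshot using element.text, every file gets the same name, taken from the last element on the page. Yet when I call print(item), I get all the different titles. What am I doing wrong here?

Output of print(item):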

  
      
  • The 10 Best Sneakers From Paris Fashion Week's FW19 Runways
  • sacai debuts new Nike sneakers during its FW19 Paris show
  • sacai's whimsical SS19 collection includes Nike collaborations

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from PIL import Image
from io import BytesIO
import os
import time
from random import randint
from time import sleep
import requests
from bs4 import BeautifulSoup as bs

driver = webdriver.Chrome('/Users/Documents/python/Selenium/bin/chromedriver')
driver.get('https://hypebeast.com/search?s=nike+sacai+fashion')
time.sleep(1)
products = [element for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='post-box    ']")))]
element_item = [element.text for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//h2/span")))]
for item in element_item:
    print(item)

i = 1
for product in products:
    location = product.location_once_scrolled_into_view

    size = product.size
    png = driver.get_screenshot_as_png() 
    im = Image.open(BytesIO(png)) 

    left = location['x']
    top = location['y']
    right = location['x'] + size['width']
    bottom = location['y'] + size['height']
    im = im.crop((left, top, right, bottom)).save(str(i)+"_"+item+".png")
    i=i+1
    if not product :
        pass

sleep(randint(1,2))

driver.quit()

1 Answer:

Answer 0 (score: 2)

im = im.crop((left, top, right, bottom)).save(str(i)+"_"+item+".png")

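This always uses the last title. By the time the second loop runs, the first loop (the one that prints the titles) has already finished, so item is still bound to the last element of element_item and is never updated again.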
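
The effect can be reproduced without Selenium. The following is only a minimal sketch: the titles and products lists are made-up stand-ins for the scraped values, not data from the site.

```python
# Hypothetical stand-ins for the scraped titles and page elements.
titles = ["first title", "second title", "third title"]
products = ["product A", "product B", "product C"]

for item in titles:
    pass  # once this loop ends, item stays bound to the last title

stale_names = []
for i, product in enumerate(products, start=1):
    # item is never reassigned in this loop, so every name repeats the last title
    stale_names.append(str(i) + "_" + item + ".png")

print(stale_names)
```

Every generated name ends with the third title, which matches the behavior described in the question.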

Since element_item already holds all the titles as a list, you can index into that list inside the second for loop.

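i = 1
for product in products:
    # ... take the screenshot and compute left, top, right, bottom as in the question ...
    # save() returns None, so its result is not assigned back to im
    im.crop((left, top, right, bottom)).save(str(i) + "_" + element_item[i-1] + ".png")
    i = i + 1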
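
As a variation on the same idea, the index can come from enumerate instead of a manual counter. The sketch below only builds the file names, so it runs without a browser. safe_name and screenshot_names are hypothetical helpers introduced here for illustration, and the character replacement is an assumption about what a title might contain, not something the original answer covers.

```python
import re


def safe_name(title):
    # Hypothetical helper: replace characters that are awkward in file names.
    return re.sub(r'[\\/:*?"<>|]+', "_", title).strip()


def screenshot_names(titles):
    # Number from 1 and pair each index with its own title.
    return [f"{i}_{safe_name(t)}.png" for i, t in enumerate(titles, start=1)]


names = screenshot_names(["The 10 Best Sneakers", "sacai/Nike Debut"])
print(names)
```

In the question's loop, the same pairing could be written as for i, (product, title) in enumerate(zip(products, element_item), start=1). Note that zip stops at the shorter list, so this assumes the two XPath queries return one title per post box.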