How do I scrape images from a website using Python Selenium?

Time: 2018-03-31 21:50:28

Tags: python selenium

The following Python code prints the HTML page source of mcmaster.com:

from selenium import webdriver

# Open the site in Firefox and dump the rendered HTML.
driver = webdriver.Firefox()
driver.get("https://www.mcmaster.com")
print(driver.page_source)

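However, the images are nowhere to be found in that HTML. After reading the Stack Overflow question (Selenium-and-iframe-in-html), I was able to switch into the iframes with the following code: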

driver.switch_to.frame(driver.find_element_by_id("ResultsIFrame"))
print(driver.page_source)
#>> <html><head></head><body></body></html>
driver.switch_to.frame(driver.find_element_by_id("MainIFrame"))
print(driver.page_source)
#>> <html><head></head><body></body></html>
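A detail worth checking in the snippet above: `switch_to.frame` looks up its target inside whichever frame is currently selected, so the second call runs from inside `ResultsIFrame`. The following is a minimal sketch that returns to the top-level document before entering each frame. It assumes both iframes are siblings in the top-level page, which the question does not confirm, and `frame_sources` is a hypothetical helper name. An empty `<body>` can also simply mean the frame has not finished loading yet.

```python
def frame_sources(driver, frame_ids):
    """Return {frame_id: page_source}, entering each frame from the top-level page.

    Assumes every id in frame_ids belongs to an iframe that sits directly in
    the top-level document (an assumption, not confirmed by the question).
    """
    sources = {}
    try:
        for frame_id in frame_ids:
            # Leave whatever frame is currently selected before the next lookup.
            driver.switch_to.default_content()
            # switch_to.frame accepts a frame name or id given as a string.
            driver.switch_to.frame(frame_id)
            sources[frame_id] = driver.page_source
    finally:
        # Always hand the driver back pointing at the top-level document.
        driver.switch_to.default_content()
    return sources
```

With a live driver this could be called as `frame_sources(driver, ["ResultsIFrame", "MainIFrame"])`.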

Is there another way to get the images and/or image attributes from this site?

I am only using this site as an example case.

2 Answers:

Answer 0: (score: 1)

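To get the image attributes from this site, you have to induce an explicit wait with WebDriverWait so that the elements are rendered before you read them. You can use the following code block: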

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# executable_path is the Selenium 3 way to locate chromedriver; Selenium 4 takes a Service object instead.
driver = webdriver.Chrome(executable_path=r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
driver.get("https://www.mcmaster.com")
# Wait up to 20 seconds until every link in the "Fastening & Joining" category is visible.
items = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='catg Fastening-Joining']//ul//li//a")))
for i in items:
    print("Item %s link is %s." % (i.text, i.get_attribute("href")))

Console output:

Item Screws & Bolts link is https://www.mcmaster.com/#Screws.
Item Threaded Rods & Studs link is https://www.mcmaster.com/#Threaded-Rods.
Item Eyebolts link is https://www.mcmaster.com/#Eyebolts.
Item U-Bolts link is https://www.mcmaster.com/#U-Bolts.
Item Nuts link is https://www.mcmaster.com/#Nuts.
Item Washers link is https://www.mcmaster.com/#Standard-Washers.
Item Shims link is https://www.mcmaster.com/#Shims.
Item Helical & Threaded Inserts link is https://www.mcmaster.com/#Threaded-Inserts.
Item Spacers & Standoffs link is https://www.mcmaster.com/#Spacers.
Item Pins link is https://www.mcmaster.com/#Pins.
Item Anchors link is https://www.mcmaster.com/#Standard-Anchors.
Item Nails link is https://www.mcmaster.com/#Nails.
Item Nailers link is https://www.mcmaster.com/#Nailers.
Item Rivets link is https://www.mcmaster.com/#Rivets.
Item Rivet Tools link is https://www.mcmaster.com/#Rivet-Installation-Tools.
Item Staples link is https://www.mcmaster.com/#Staples.
Item Staplers link is https://www.mcmaster.com/#Staplers.
Item Key Stock link is https://www.mcmaster.com/#Machine-Keys.
Item Retaining Rings link is https://www.mcmaster.com/#Retaining-Rings.
Item Cable Ties link is https://www.mcmaster.com/#Cable-Ties.
Item Lanyards link is https://www.mcmaster.com/#Lanyards.
Item Magnets link is https://www.mcmaster.com/#Magnets.
Item Adhesives link is https://www.mcmaster.com/#Adhesives.
Item Tape link is https://www.mcmaster.com/#Fastening-Tape.
Item Hook & Loop link is https://www.mcmaster.com/#Hook-and-Loop.
Item Electrodes & Wire link is https://www.mcmaster.com/#Standard-Welding-Electrodes.
Item Welders link is https://www.mcmaster.com/#Welders.
Item Gas Regulators link is https://www.mcmaster.com/#Welding-Gas-Regulators.
Item Welding Gloves link is https://www.mcmaster.com/#Welding-Gloves.
Item Welding Helmets & Glasses link is https://www.mcmaster.com/#Welding-Eye-Protectors.
Item Protective Screens link is https://www.mcmaster.com/#Protective-Screens.
Item Brazing Alloys link is https://www.mcmaster.com/#Brazing-Supplies.
Item Torches link is https://www.mcmaster.com/#Torches.
Item Solder link is https://www.mcmaster.com/#Solder.
Item Soldering Irons link is https://www.mcmaster.com/#Soldering-Irons.
Item Melting Pots link is https://www.mcmaster.com/#Melting-Pots.
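The output above lists link text and `href` values. For the `<img>` tags themselves, one option is to parse the rendered HTML once the wait has finished. The sketch below uses only the standard library; `ImageCollector` and `extract_images` are hypothetical names, and it assumes the images appear as ordinary `<img>` elements in the page source, which may not hold for this site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the src and alt attributes of every <img> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src:
            self.images.append({
                # Resolve relative paths against the page URL.
                "src": urljoin(self.base_url, src),
                "alt": attributes.get("alt") or "",
            })


def extract_images(html, base_url):
    """Return a list of {"src", "alt"} dicts for the <img> tags in html."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.images
```

With a live session this could be called as `extract_images(driver.page_source, driver.current_url)`. Images applied through CSS backgrounds would not show up here, because they are not `<img>` elements.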

Answer 1: (score: 0)

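I haven't used Selenium much, but I do my scraping by digging into the page's code. As for the images, I was able to find them here, where I also found a sample. You should be able to find everything there. If you have something similar that you want to scrape, you can take the same approach.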

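It seems that each category has one large image, which the JS code then simply splits up and places into the links. So if you want the individual images, you will probably need some code of your own to cut them out (I'm not 100% sure, since I only glanced at the code).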
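If the large per-category image really is a sprite sheet positioned with CSS, cutting out one tile comes down to turning its offset into a pixel box. The sketch below is a hedged illustration of that arithmetic only: it assumes the offsets are given in pixels as a `background-position` value (which could be read with Selenium's `value_of_css_property("background-position")`) and that the tile size is known. `sprite_box` is a hypothetical helper, and nothing here is confirmed to match how mcmaster.com lays out its images.

```python
import re


def sprite_box(background_position, width, height):
    """Convert a CSS background-position such as "-120px -40px" into a
    (left, upper, right, lower) pixel box inside the sprite sheet."""
    match = re.fullmatch(
        r"\s*(-?\d+)(?:px)?\s+(-?\d+)(?:px)?\s*", background_position
    )
    if match is None:
        raise ValueError("expected two pixel offsets, got %r" % background_position)
    # CSS shifts the sheet by the (usually negative) offset, so the visible
    # tile starts at the negated offset.
    left = -int(match.group(1))
    upper = -int(match.group(2))
    return (left, upper, left + width, upper + height)
```

A box in this order is what Pillow's `Image.crop` takes, so a downloaded sheet could be cut with `sheet.crop(sprite_box("-120px -40px", 60, 60))`.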