Selenium: scraping content inside an iframe

Date: 2021-02-16 09:09:27

Tags: python selenium xpath iframe css-selectors

I want to read the toner levels from the web pages of the various printers in my office.

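The problem is that the page is made up of several frames, and the remaining one is generated by JavaScript, so I can't read it even with Selenium.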

Here is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.expected_conditions import (
    presence_of_element_located)
from selenium.webdriver.support.wait import WebDriverWait

def get_comment_count(driver, url):
    driver.get(url)
    wait = WebDriverWait(driver, 3)
    e = driver.find_elements_by_xpath("/html/frameset/frame")
    driver.switch_to_frame(e[0])
    toner_iframe = driver.find_elements_by_xpath('//*[@id="contain"]')
    # iframe_url = toner_iframe.get_attribute('src')
    #driver.switch_to_frame(toner_iframe)
    driver.switch_to.frame(toner_iframe)
    print(toner_iframe)
    
url = "https://pritner_web_page"

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')

driver = webdriver.Chrome(options=options)

get_comment_count(driver,url)

I also tried this:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')

driver = webdriver.Chrome(options=options)
driver.get("http://printer_web_page")


WebDriverWait(driver,5).until(EC.frame_to_be_available_and_switch_to_it((By.ID,'wlmframe')))

WebDriverWait(driver,5).until(EC.frame_to_be_available_and_switch_to_it((By.ID,'toner')))
page_source=driver.page_source
print(page_source)

This is the page in the DOM inspector. The frames are all dynamic and are generated by JavaScript, as follows:

[Screenshot: the page's DOM in the browser inspector]

The code above is just one of several different attempts I made to get into the frames, none of which worked.
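Because these frames are created by JavaScript after the initial page load, a lookup issued immediately after driver.get() can run before they exist. Explicit waits handle this by polling. As a rough illustration only, the stdlib-only sketch below mimics the polling loop behind Selenium's WebDriverWait.until(); `wait_until` and the simulated `frame_is_present` check are hypothetical names, and this is not Selenium's actual implementation (the real WebDriverWait, for example, polls every 0.5 s by default and raises its own TimeoutException):

```python
import time


def wait_until(condition, timeout=5.0, poll_frequency=0.5, ignored_exceptions=()):
    """Poll condition() until it returns a truthy value, then return that value.

    Hypothetical, simplified stand-in for WebDriverWait.until(), not Selenium's
    own code. Exceptions listed in ignored_exceptions count as "not ready yet";
    any other exception propagates immediately.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored_exceptions:
            pass  # e.g. the frame is not in the DOM yet: keep polling
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Simulated JavaScript-built page: the frame "appears" only on the third check.
calls = {"n": 0}


def frame_is_present():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LookupError("frame not created yet")
    return "toner"


print(wait_until(frame_is_present, timeout=2.0, poll_frequency=0.01,
                 ignored_exceptions=(LookupError,)))  # → toner
```

In real Selenium code, EC.frame_to_be_available_and_switch_to_it(...) plays the role of the condition: on each poll it tries to locate the frame and switch into it.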

1 answer:

Answer 0 (score: 0)

The element is inside nested <frame> / <iframe> elements, so you have to:

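  • Use WebDriverWait to wait for the parent frame to become available and switch to it.

  • Use WebDriverWait to wait for the child frame to become available and switch to it.

  • Use WebDriverWait to wait for the desired element to become clickable.

  • You can use either of the following Locator Strategies: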

    • Using CSS_SELECTOR:

      driver.get("http://printer_web_page")
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"frame[name='wlmframe']")))
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#toner[name='toner']")))
      
    • Using XPATH:

      driver.get("http://printer_web_page")
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//frame[@name='wlmframe']")))
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@id='toner' and @name='toner']")))
      
  • Note: you have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
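The CSS and XPath locators above describe the same two frames. To make the correspondence explicit, here is a small stdlib-only sketch with a hypothetical helper, `frame_locators`, that builds both forms from a tag name plus an optional id and name. It assumes plain id/name values that need no escaping. The strings "css selector" and "xpath" are the values of Selenium's By.CSS_SELECTOR and By.XPATH constants, so the returned tuples can be passed to EC.frame_to_be_available_and_switch_to_it() unchanged:

```python
def frame_locators(tag, frame_id=None, name=None):
    """Return equivalent (CSS selector, XPath) locators for a frame element.

    Hypothetical helper for illustration. Assumes frame_id and name are
    plain identifiers that need no CSS/XPath escaping.
    """
    # CSS: tag, then #id, then a [name='...'] attribute selector
    css = tag
    if frame_id:
        css += f"#{frame_id}"
    if name:
        css += f"[name='{name}']"

    # XPath: //tag[@id='...' and @name='...']
    predicates = []
    if frame_id:
        predicates.append(f"@id='{frame_id}'")
    if name:
        predicates.append(f"@name='{name}'")
    xpath = f"//{tag}"
    if predicates:
        xpath += "[" + " and ".join(predicates) + "]"

    # "css selector" / "xpath" equal By.CSS_SELECTOR / By.XPATH in Selenium
    return ("css selector", css), ("xpath", xpath)


# The two frames from the answer, outer frame first
outer_css, outer_xpath = frame_locators("frame", name="wlmframe")
inner_css, inner_xpath = frame_locators("iframe", frame_id="toner", name="toner")
print(outer_css, inner_css)
print(outer_xpath, inner_xpath)

# With Selenium (not executed here), switch in order, outer frame first:
# for locator in (outer_css, inner_css):
#     WebDriverWait(driver, 20).until(
#         EC.frame_to_be_available_and_switch_to_it(locator))
```

Once you are done inside the frames, driver.switch_to.default_content() returns to the top-level document, and driver.switch_to.parent_frame() goes up one level.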
    

References

You can find some relevant discussions in: