Scraping a data table inside an iframe inserted by JavaScript

Posted: 2019-09-27 15:39:22

Tags: python python-3.x selenium selenium-webdriver web-scraping

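I'm trying to scrape a website where the data table sits inside a dynamically loaded iframe. The URL never changes, and I've already used Selenium to navigate to the table. Once I get there, however, Selenium still can't find the iframe with the ID "theiframe". I can see the iframe with Inspect Element, but my script can't find it.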

I tried locating the iframe by XPath ("//iframe[@id='theiframe']") and by CSS selector / ID ("theiframe") so I could scrape it, but I still get a message saying the element cannot be found.

1 answer:

Answer 0 (score: 1)

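The page contains nested iframes, so you need to switch into both of them, outer frame first:

Frame 1: ID = 'interiorFrame'

Frame 2: ID = 'theiframe'

Use WebDriverWait with frame_to_be_available_and_switch_to_it() for each of the two frames. Selenium only searches the document of the frame it is currently switched into, which is why locating 'theiframe' from the top-level page fails.

Try the code below.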

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://www.1line.williams.com/Transco/index.html")

# Hover over the top-level menu item so its submenu becomes visible
step1 = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//div[4]/ul[1]/li[5]/a[1]")))
ActionChains(driver).move_to_element(step1).perform()

# Click the submenu entry that opens the page containing the table
# (a fresh ActionChains, so the earlier hover is not queued again)
step2 = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//div[4]/ul[1]/li[5]/ul[1]/li[1]/a[1]")))
ActionChains(driver).move_to_element(step2).click(step2).perform()

# Switch into the outer frame first, then into the nested frame inside it
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID, 'interiorFrame')))
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID, 'theiframe')))

# page_source now returns the HTML of the innermost frame, not the top-level page
page_source = driver.page_source
print(page_source)
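Once the driver is switched into 'theiframe', page_source holds that frame's HTML, but the table rows still have to be pulled out of it. A minimal sketch using only the standard library's html.parser follows. It assumes the data sits in an ordinary <table> made of <tr>/<th>/<td> cells (the actual markup of the Transco page is not shown in this thread), and TableRowParser / extract_rows are hypothetical helper names. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _close_cell(self):
        # Store the current cell with its whitespace collapsed
        if self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        # Finish any open cell, then store the current row
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    """Return a list of rows (lists of cell text) found in the HTML."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    parser._close_row()  # flush a row left open at end of input
    return parser.rows


sample = ("<table><tr><th>Date</th><th>Volume</th></tr>"
          "<tr><td>2019-09-27</td><td> 100 </td></tr></table>")
print(extract_rows(sample))
```

With the driver still switched into the inner frame, rows = extract_rows(driver.page_source) would give one list of cell strings per table row.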