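I am trying to scrape dynamic content from a blog with Selenium, but it always returns the unrendered JavaScript.
To test this behavior, I tried waiting until the iframe was fully loaded and then printing its content. That prints fine, but when I switch back to the parent frame, it shows only the unrendered JavaScript.
I am looking for a way to print the fully rendered HTML content.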
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
driver = webdriver.Chrome("path to chrome driver")
driver.get('http://justgivemechocolateandnobodygetshurt.blogspot.com/')
WebDriverWait(driver, 40).until(expected_conditions.frame_to_be_available_and_switch_to_it((By.ID, "navbar-iframe")))
# The rendered iframe HTML is printed.
content = driver.page_source
print(content)
# After switching back to the parent frame, the unrendered JavaScript is printed again.
driver.switch_to.parent_frame()
content = driver.page_source
print(content)
Answer 0 (score: 3)
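The problem is that .page_source works only in the current context. There is the notion of a "current top-level browsing context". This means that if you call it on the default content, you will not get the inner HTML of the child iframe elements. For that, you have to switch to the context of the frame and call .page_source there.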
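In other words, to get the complete HTML of a page including the page sources of its iframes, you have to switch into the iframe contexts one by one and get each source separately.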
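The frame-by-frame approach described above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name collect_page_sources is hypothetical, it only visits direct child iframes (not nested ones), and it passes the locator strategy as the plain string "tag name", which is the value of Selenium's By.TAG_NAME, so the snippet itself needs no imports.

```python
def collect_page_sources(driver):
    """Return the top-level page source followed by the source of each
    direct child iframe. `driver` is assumed to be a Selenium WebDriver."""
    # Start from the top-level document so the result is predictable.
    driver.switch_to.default_content()
    sources = [driver.page_source]

    # "tag name" is the string value of selenium's By.TAG_NAME strategy.
    frames = driver.find_elements("tag name", "iframe")
    for frame in frames:
        # page_source only reflects the current context, so switch in first.
        driver.switch_to.frame(frame)
        sources.append(driver.page_source)
        # Go back to the top-level document before visiting the next frame.
        driver.switch_to.default_content()

    return sources
```

With the driver from the question, calling collect_page_sources(driver) after driver.get(...) would return a list whose first item is the parent page source and whose remaining items are the iframe sources, one per iframe found.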
Old answer:
I would wait for at least one blog entry to load before getting the page_source:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 40)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".entry-content")))
print(driver.page_source)