Selenium cannot retrieve dynamically generated HTML

Asked: 2019-11-12 05:48:30

Tags: javascript python selenium web-scraping selenium-chromedriver

I want to retrieve the SVG content that this HTML code generates dynamically:

index.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>BmHtmlGenerator</title>
</head>
<body>
<div id="svgContainer"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/bodymovin/5.4.3/lottie.min.js"></script>
<script>
    let svgContainer =  window.bodymovin.loadAnimation({
        container: document.getElementById('svgContainer'),
        renderer: 'svg',
        loop: false,
        autoplay: false,
        path: 'https://labs.nearpod.com/bodymovin/demo/markus/isometric/markus2.json',

    });
</script>
</body>
</html>

After reading many posts online, I concluded that the best approach is to use Selenium / Chromedriver together with Django, like this:

from bs4 import BeautifulSoup
from django.conf import settings
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from contextlib import closing
from selenium.webdriver import Chrome # pip install selenium
from selenium.webdriver.support.ui import WebDriverWait


url = "myurl/index.html"
options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument('--disable-dev-shm-usage')
browser = webdriver.Chrome(executable_path='/usr/local/bin/chromedriver', options=options, service_args=['--ignore-ssl-errors=true', '--ssl-protocol=any'])

max_wait = 30
browser.set_page_load_timeout(max_wait)
browser.set_script_timeout(max_wait)

browser.get(url)
browser.implicitly_wait(30)
print(browser.page_source)
browser.close()
browser.quit()

But it doesn't work: print always outputs the original HTML source rather than the generated HTML.
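
A note on the snippet above: `implicitly_wait` only sets how long element lookups such as `find_element` keep retrying; it does not pause the script. `page_source` is therefore read right after `get` returns, which may be before lottie has fetched the JSON and inserted the `<svg>`. Below is a minimal sketch of one way around this, assuming the animation does eventually render: poll the live DOM until an `<svg>` shows up inside `#svgContainer`, then return its `outerHTML`. The helper name `wait_for_svg_markup` and its parameters are hypothetical. It relies only on `execute_script`, so any Selenium driver object should do.

```python
import time


def wait_for_svg_markup(driver, container_id="svgContainer", timeout=30.0, poll=0.5):
    """Poll the live DOM until an <svg> exists inside the container; return its markup.

    `driver` is any object exposing Selenium's `execute_script(script, *args)`.
    Raises TimeoutError if no <svg> appears within `timeout` seconds.
    """
    # Runs in the page: look up the container, then the first <svg> inside it.
    script = (
        "var c = document.getElementById(arguments[0]);"
        "var s = c ? c.querySelector('svg') : null;"
        "return s ? s.outerHTML : null;"
    )
    deadline = time.monotonic() + timeout
    while True:
        markup = driver.execute_script(script, container_id)
        if markup:
            return markup
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "no <svg> found inside #%s after %.1fs" % (container_id, timeout)
            )
        time.sleep(poll)
```

With the `browser` object from the question, this could be called as `svg = wait_for_svg_markup(browser)` after `browser.get(url)` and before `browser.quit()`.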

I also tried:

wait = WebDriverWait(browser, timeout=30).until(lambda x: x.find_element_by_tag_name('svg'))
print(wait)
page_source = browser.page_source
print(page_source)

But it always throws this error:

  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/django/core/handlers/exception.py", line 35, in inner
    response = get_response(request)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/django/core/handlers/base.py", line 128, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/django/core/handlers/base.py", line 126, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/django/views/generic/base.py", line 69, in view
    return self.dispatch(request, *args, **kwargs)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/rest_framework/views.py", line 494, in dispatch
    response = self.handle_exception(exc)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/rest_framework/views.py", line 454, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/rest_framework/views.py", line 491, in dispatch
    response = handler(request, *args, **kwargs)
  File "/vagrant/src/meshine_project/meshine_api/views.py", line 468, in post
    bm.create_file()
  File "/vagrant/src/meshine_project/meshine_api/HtmlFileGenerator/BmJsonGenerator.py", line 41, in create_file
    wait = WebDriverWait(browser, timeout=20).until(lambda x: x.find_element_by_tag_name('svg'))
  File "/home/vagrant/.virtualenvs/meshine_project/lib/python3.5/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 
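
The `TimeoutException` means no `<svg>` element appeared within the wait. That would be consistent with the animation JSON never loading (for example a blocked or failed request from the headless browser), though the traceback alone does not confirm it. One way to check is to read the browser console after the wait fails. A sketch, assuming Chrome was started with the `goog:loggingPrefs` capability set to `{'browser': 'ALL'}` so that `get_log('browser')` returns console entries; the helper name `severe_console_messages` is hypothetical.

```python
def severe_console_messages(driver):
    """Return the text of browser console entries logged at SEVERE level.

    `driver` is any object exposing Selenium's `get_log(log_type)`, which
    returns a list of dicts with keys such as 'level' and 'message'.
    """
    entries = driver.get_log("browser")
    return [e.get("message", "") for e in entries if e.get("level") == "SEVERE"]
```

Printing the result inside an `except TimeoutException:` block around the wait would show whether the page reported a failed network request or a script error.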

Please help!

I simply tried to follow what is suggested here and here, but nothing works.

0 answers:

No answers