Question

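I'm scrolling through the Google Play Store and the reviews for an app, which is specified by the URL of the app's page. Selenium then finds the reviews and scrolls down to load all of them. The scrolling part works: without the headless option I can watch Selenium reach the end of the page. What does not work is saving the HTML content for further analysis.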

Based on other answers, I tried different methods to save the source code.

innerHTML = DRIVER.execute_script("return document.body.innerHTML")

or

innerHTML = DRIVER.page_source

Both lead to the same error message and exception.

My code for scrolling through the page and loading all reviews:

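import time

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.firefox.options import Options

# START_URLS (the app page URLs) and logger are defined elsewhere (not shown).
# The calls below use the Selenium 3 style API.

SCROLL_PAUSE_TIME = 5  # seconds to wait for newly loaded reviews
options = Options()
options.headless = True
FP = webdriver.FirefoxProfile()
FP.set_preference("intl.accept_languages", "de")  # request the German page

for url in START_URLS:
    DRIVER = None
    try:
        DRIVER = webdriver.Firefox(options=options, firefox_profile=FP)
        DRIVER.get(url)
        time.sleep(SCROLL_PAUSE_TIME)
        app_name = DRIVER.find_element_by_xpath('//h1[@itemprop="name"]').get_attribute('innerText')
        # Open the full review list ("Alle Bewertungen lesen" = "Read all reviews")
        all_reviews_button = DRIVER.find_element_by_xpath('//span[text()="Alle Bewertungen lesen"]')
        all_reviews_button.click()
        time.sleep(SCROLL_PAUSE_TIME)
        last_height = DRIVER.execute_script("return document.body.scrollHeight")
        while True:
            # Scroll to the bottom to trigger loading of more reviews
            DRIVER.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            try:
                # "Mehr anzeigen" = "Show more"; the button is not always present
                DRIVER.find_element_by_xpath('//span[text()="Mehr anzeigen"]').click()
            except WebDriverException:
                pass
            time.sleep(SCROLL_PAUSE_TIME)
            new_height = DRIVER.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                # Page height no longer grows: the end of the list was reached
                logger.info('Durchlauf erfolgreich')
                innerHTML = DRIVER.execute_script("return document.body.innerHTML")
                with open(app_name + '.html', 'w', encoding='utf-8') as out:
                    out.write(innerHTML)  # was out.write(html), an undefined name
                break
            last_height = new_height

    except Exception as e:
        logger.error('Exception occurred', exc_info=True)
    finally:
        # Quit only if the browser was actually started
        if DRIVER is not None:
            DRIVER.quit()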

Last part of the log file, showing the infinite scroll reaching the end of the page and the save then failing:

10.09.19 16:12:00 - INFO - Durchlauf erfolgreich
10.09.19 16:12:13 - ERROR - Exception occurred
Traceback (most recent call last):
  File "scraper.py", line 57, in <module>
    innerHTML = DRIVER.execute_script("return document.body.innerHTML")
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: [Exception... "Failure"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: chrome://marionette/content/proxy.js :: sendReply_ :: line 275"  data: no]

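I want to save the page as a file and, in the next step, parse the HTML to extract the reviews. However, the saving part fails for larger pages. If I break out of the while loop after, say, 100 steps and save the page, it works fine.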

Answer 1

NS_ERROR_FAILURE (0x80004005)

This is the general-purpose error code, returned for every failure to which no more specific error code applies.

But this error message...

selenium.common.exceptions.WebDriverException: Message: [Exception... "Failure"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: chrome://marionette/content/proxy.js :: sendReply_ :: line 275"  data: no]

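...implies that Marionette threw an error while attempting to read/store/copy the page_source.

The relevant HTML DOM would have helped us debug the issue better. However, the problem appears to be that the page_source is extremely large and exceeds the maximum size Marionette can handle; the string you are dealing with may simply be too big.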
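One way to probe this hypothesis without pulling the whole document through Marionette is to ask the browser only for the length of the serialized DOM. The following is a minimal sketch, assuming a Selenium-style driver object that exposes `execute_script`; the helper name `dom_size_chars` is made up for illustration, and the size at which the failure starts is not stated anywhere in the question.

```python
def dom_size_chars(driver):
    """Return the length of document.body.innerHTML as reported by the browser.

    Only a single integer crosses the WebDriver connection, so the call
    stays small even when the page itself is very large.
    """
    return int(driver.execute_script("return document.body.innerHTML.length;"))
```

Logging this value inside the scroll loop, for example `logger.info('DOM size: %d', dom_size_chars(DRIVER))`, would show how large the page has grown by the time the save fails.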

Solution

A quick solution is to avoid assigning the page source to a variable and instead print it, to find out where the actual issue lies.

print(DRIVER.execute_script("return document.body.innerHTML"))

or

print(DRIVER.page_source)
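If the failure really is tied to the size of a single reply, another option (not part of the answer above, and not verified against this error) is to fetch the markup in slices and write each slice to disk as it arrives. The sketch below assumes a Selenium-style `execute_script` and a DOM that no longer changes once scrolling has finished; `save_body_html_in_chunks` and `chunk_size` are hypothetical names chosen for illustration.

```python
# JavaScript run in the page: returns [slice, next_start, total_length].
_SLICE_JS = """
var html = document.body.innerHTML;
var start = arguments[0];
var end = Math.min(start + arguments[1], html.length);
// Extend the slice by one unit if it would end on a high surrogate,
// so that a character outside the BMP (e.g. an emoji) is not cut in half.
var last = html.charCodeAt(end - 1);
if (end < html.length && last >= 0xD800 && last <= 0xDBFF) { end += 1; }
return [html.substring(start, end), end, html.length];
"""


def save_body_html_in_chunks(driver, path, chunk_size=1000000):
    """Write document.body.innerHTML to `path` slice by slice.

    The browser re-serializes innerHTML for every slice, so the DOM is
    assumed to be stable while this runs. Returns the total length of
    the markup as reported by the browser (UTF-16 code units).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start, total = 0, 1  # total is a placeholder until the first reply
    with open(path, "w", encoding="utf-8") as out:
        while start < total:
            chunk, start, total = driver.execute_script(_SLICE_JS, start, chunk_size)
            out.write(chunk)
    return total
```

In the question's loop this would stand in for the `innerHTML = ...` assignment and the `with open(...)` block, for example `save_body_html_in_chunks(DRIVER, app_name + '.html')`. Smaller slices mean more round trips, so `chunk_size` is a trade-off to tune rather than a known-good value.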

Outro

You can find a couple of related discussions in:

WebDriverException: Message: [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" while saving a large html file using Selenium Python
