I am using Selenium with JavaScript-enabled HtmlUnit to read websites from Python. Unfortunately, I run into problems with sites whose JavaScript is not the cleanest. For example:
from selenium import webdriver

try:
    browser = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNITWITHJS)
    browser.get('https://www.ebay.com/')
    browser.close()
    print('success')
except Exception as e:
    print(e)
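This raises an error; it looks as though the JavaScript error is being passed through the WebDriver to Python. Note that this does not happen with the Chrome, Firefox, or IE drivers.
Exception e: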
TypeError: Cannot read property "classList" from undefined (script in https://www.ebay.com/ from (46, 26) to (73, 78)#70)
Stacktrace:
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError (ScriptRuntime.java:4130)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError (ScriptRuntime.java:4108)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError (ScriptRuntime.java:4141)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2 (ScriptRuntime.java:4160)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.undefReadError (ScriptRuntime.java:4173)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getObjectProp (ScriptRuntime.java:1528)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop (Interpreter.java:1245)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret (Interpreter.java:815)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call (InterpretedFunction.java:111)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.iterativeMethod (NativeArray.java:1671)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.execIdCall (NativeArray.java:353)
at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call (IdFunctionObject.java:101)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop (Interpreter.java:1484)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret (Interpreter.java:815)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call (InterpretedFunction.java:111)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.iterativeMethod (NativeArray.java:1671)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.execIdCall (NativeArray.java:353)
at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call (IdFunctionObject.java:101)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop (Interpreter.java:1484)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret (Interpreter.java:815)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call (InterpretedFunction.java:111)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall (ContextFactory.java:417)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall (HtmlUnitContextFactory.java:325)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall (ScriptRuntime.java:3424)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec (InterpretedFunction.java:122)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun (JavaScriptEngine.java:781)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run (JavaScriptEngine.java:895)
at net.sourceforge.htmlunit.corejs.javascript.Context.call (Context.java:599)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call (ContextFactory.java:527)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute (JavaScriptEngine.java:790)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute (JavaScriptEngine.java:766)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute (JavaScriptEngine.java:757)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScript (HtmlPage.java:920)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded (HtmlScript.java:316)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded (HtmlScript.java:396)
at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute (HtmlScript.java:246)
at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage (HtmlScript.java:267)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement (HTMLParser.java:805)
at org.apache.xerces.parsers.AbstractSAXParser.endElement (None:-1)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement (HTMLParser.java:761)
at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement (HTMLTagBalancer.java:1236)
at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement (HTMLTagBalancer.java:1136)
at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement (DefaultFilter.java:226)
at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement (NamespaceBinder.java:345)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement (HTMLScanner.java:3178)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan (HTMLScanner.java:2141)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument (HTMLScanner.java:945)
at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse (HTMLConfiguration.java:521)
at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse (HTMLConfiguration.java:472)
at org.apache.xerces.parsers.XMLParser.parse (None:-1)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse (HTMLParser.java:1004)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse (HTMLParser.java:253)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml (HTMLParser.java:195)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage (DefaultPageCreator.java:267)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage (DefaultPageCreator.java:158)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto (WebClient.java:524)
at com.gargoylesoftware.htmlunit.WebClient.getPage (WebClient.java:398)
at com.gargoylesoftware.htmlunit.WebClient.getPage (WebClient.java:315)
at org.openqa.selenium.htmlunit.HtmlUnitDriver.get (HtmlUnitDriver.java:670)
at org.openqa.selenium.htmlunit.HtmlUnitDriver.lambda$get$8 (HtmlUnitDriver.java:657)
at org.openqa.selenium.htmlunit.HtmlUnitDriver.lambda$runAsync$0 (HtmlUnitDriver.java:414)
at java.lang.Thread.run (None:-1)
I found the following for Java:
WebClient client = new WebClient();
client.getOptions().setThrowExceptionOnScriptError(false);
I can't figure out how to achieve this in Python. Any suggestions?
Answer 0 (score: 1)
It looks like implementing a custom error handler solves the problem, for example:
from selenium import webdriver
from selenium.webdriver.remote.errorhandler import ErrorHandler


class MyHandler(ErrorHandler):
    def check_response(self, response):
        try:
            super(MyHandler, self).check_response(response)
        except Exception:
            # Swallow every error reported by the remote end, including page script errors.
            pass


try:
    browser = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNITWITHJS)
    # Replace the default handler so that script errors no longer raise in Python.
    browser.error_handler = MyHandler()
    browser.get('https://www.ebay.com/')
    browser.close()
    print('success')
except Exception as e:
    print(e)
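Note that the handler above silences every error the remote end reports, not only JavaScript errors raised by the page. A more selective variant might look like the sketch below. It is only an illustration under stated assumptions: the names `SCRIPT_ERROR_MARKERS`, `looks_like_script_error`, `make_lenient_handler`, and `LenientHandler` are hypothetical, and the check is a heuristic that searches the exception text for the HtmlUnit package names seen in the stack trace from the question.

```python
import logging

logger = logging.getLogger(__name__)

# Assumption: these package names are copied from the stack trace in the
# question; other HtmlUnit versions may report different ones.
SCRIPT_ERROR_MARKERS = (
    "net.sourceforge.htmlunit.corejs.javascript",
    "com.gargoylesoftware.htmlunit.javascript",
)


def looks_like_script_error(exc):
    """Heuristic: does the exception text mention HtmlUnit's JavaScript engine?"""
    text = str(exc)
    return any(marker in text for marker in SCRIPT_ERROR_MARKERS)


def make_lenient_handler():
    """Build a handler that ignores page script errors and re-raises the rest."""
    # Imported lazily so the helper above can be used without Selenium installed.
    from selenium.webdriver.remote.errorhandler import ErrorHandler

    class LenientHandler(ErrorHandler):
        def check_response(self, response):
            try:
                super().check_response(response)
            except Exception as exc:
                if not looks_like_script_error(exc):
                    raise  # unrelated failure: keep it visible
                logger.warning("Ignoring page script error: %s", exc)

    return LenientHandler()
```

Assigning `browser.error_handler = make_lenient_handler()` in place of `MyHandler()` should then keep unrelated failures, such as a refused connection, visible while still letting pages with broken scripts load.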