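I am currently trying to access a web page from Java code using HtmlUnit. The page has a button that opens a new page when clicked. When I try to click it, however, the exception shown below is thrown at runtime. As far as I can tell, it has to do with an illegal escape sequence in the page's HTML code.
Here is my code so far: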
try (WebClient client = new WebClient(BrowserVersion.CHROME)) {
    client.getOptions().setCssEnabled(false);
    WebRequest webRequest = new WebRequest(url);
    webRequest.setCharset("utf-8");
    HtmlPage entrypage = client.getPage(webRequest);
    HtmlInput dwnld = (HtmlInput) entrypage.getElementById("btn_download");
    long millis = System.currentTimeMillis();
    while (System.currentTimeMillis() <= millis + 11000) {
        // Do nothing, just wait 11 seconds
    }
    if (dwnld != null) {
        System.out.println("Found btn_download");
        dwnld.click();
    }
} catch (FailingHttpStatusCodeException | IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Any ideas?
Here is the exception:
java.util.regex.PatternSyntaxException: Illegal octal escape sequence near index 2
\0+$
  ^
at java.util.regex.Pattern.error(Pattern.java:1955)
at java.util.regex.Pattern.o(Pattern.java:3192)
at java.util.regex.Pattern.escape(Pattern.java:2300)
at java.util.regex.Pattern.atom(Pattern.java:2198)
at java.util.regex.Pattern.sequence(Pattern.java:2079)
at java.util.regex.Pattern.expr(Pattern.java:1996)
at java.util.regex.Pattern.compile(Pattern.java:1696)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1054)
at com.gargoylesoftware.htmlunit.javascript.regexp.HtmlUnitRegExpProxy.doAction(HtmlUnitRegExpProxy.java:102)
at com.gargoylesoftware.htmlunit.javascript.regexp.HtmlUnitRegExpProxy.action(HtmlUnitRegExpProxy.java:74)
at net.sourceforge.htmlunit.corejs.javascript.NativeString.execIdCall(NativeString.java:455)
at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call(IdFunctionObject.java:89)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1531)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:724)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:832)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:733)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:708)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:982)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:351)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:411)
at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:276)
at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:290)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:793)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:751)
at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1017)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:248)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:194)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:471)
at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2110)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:875)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.processPostponedActions(JavaScriptEngine.java:962)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1327)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1270)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1218)
at src.Hosts$1.exctractFileLinkFrom(Hosts.java:44)
at src.TestMain.main(TestMain.java:10)
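Answer 0 (score: 0)
The error here probably does not lie in the framework's HTML parsing. My guess is that it is not the HtmlUnit framework itself that fails on the illegal escape sequence, but possibly its logger.
I did not set out to solve the problem this way, but when I changed the logger level to SEVERE to clean up my console output, the exception was no longer thrown.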
Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.SEVERE);
Logger.getLogger("org.apache.http").setLevel(java.util.logging.Level.SEVERE);
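A self-contained sketch of the same logger change, using only java.util.logging. It assumes HtmlUnit's log output ends up in java.util.logging (which is my understanding of the default when no other logging backend is on the classpath, but I have not verified it for this setup). The class name `QuietHtmlUnitLogging` and its helper methods are made up for illustration. The loggers are kept in static fields because the LogManager only holds loggers weakly, so a level set on a logger nobody references can be lost after garbage collection.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper class, not part of HtmlUnit.
class QuietHtmlUnitLogging {

    // Strong references keep the configured loggers (and their levels) alive.
    private static final Logger HTMLUNIT_LOG = Logger.getLogger("com.gargoylesoftware.htmlunit");
    private static final Logger HTTP_LOG = Logger.getLogger("org.apache.http");

    // Raise the threshold so that only SEVERE records pass these loggers.
    static void silenceWarnings() {
        HTMLUNIT_LOG.setLevel(Level.SEVERE);
        HTTP_LOG.setLevel(Level.SEVERE);
    }

    // Reports whether a record of the given level would pass the named logger.
    static boolean isEnabled(String loggerName, String levelName) {
        return Logger.getLogger(loggerName).isLoggable(Level.parse(levelName));
    }

    public static void main(String[] args) {
        silenceWarnings();
        // Child loggers without their own level inherit it from the parent.
        String child = "com.gargoylesoftware.htmlunit.javascript.regexp.HtmlUnitRegExpProxy";
        System.out.println("WARNING enabled: " + isEnabled(child, "WARNING"));
        System.out.println("SEVERE enabled:  " + isEnabled(child, "SEVERE"));
    }
}
```

If the stack trace was only a logged warning rather than an exception reaching my code, this would explain why it no longer shows up, but that is an assumption on my part.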
Is my guess correct, or is this just a coincidence?
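For reference, the message itself can be reproduced with plain java.util.regex, without HtmlUnit. The stack trace shows `HtmlUnitRegExpProxy` handing the pattern `\0+$` to `Pattern.compile`. In JavaScript, `\0` stands for the NUL character, whereas Java's regex syntax expects at least one octal digit after `\0`. The sketch below only demonstrates that difference. The class name `OctalEscapeDemo` and the helper `compileError` are made up for illustration, and the sketch says nothing about what HtmlUnit does after the compile fails.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of HtmlUnit.
class OctalEscapeDemo {

    // Returns null if java.util.regex accepts the pattern,
    // otherwise the description of the syntax error.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // The JavaScript-style pattern from the stack trace: rejected by Java.
        System.out.println(compileError("\\0+$"));
        // A Java-compatible spelling of "one or more NUL characters at the end".
        System.out.println(compileError("\\x00+$"));
        // Stripping trailing NUL characters with the Java-compatible pattern.
        String padded = "abc" + '\0' + '\0';
        System.out.println(padded.replaceAll("\\x00+$", "").length());
    }
}
```

So the pattern comes from a regular expression in the page's JavaScript rather than from an escape sequence in the HTML markup.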