PatternSyntaxException when trying to access a website with HtmlUnit

Time: 2015-05-14 12:32:24

Tags: java html htmlunit

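I am currently trying to access a web page from Java code using HtmlUnit. The page has a button that opens a new page when clicked. But when I try to click it, the program throws the exception shown below. As far as I can tell, it has to do with an illegal escape sequence in the page's HTML code.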

Here is my code so far:

try(WebClient client = new WebClient(BrowserVersion.CHROME)){

    client.getOptions().setCssEnabled(false);
    WebRequest webRequest = new WebRequest(url);
    webRequest.setCharset("utf-8");
    HtmlPage entrypage = client.getPage(webRequest);
    HtmlInput dwnld = (HtmlInput) entrypage.getElementById("btn_download");

    long millis =  System.currentTimeMillis();

    while (System.currentTimeMillis() <= millis+11000) {
        //Do nothing, just wait 11 seconds
    }

    if (dwnld != null) {
        System.out.println("Found btn_download");
        dwnld.click();
    }


} catch (FailingHttpStatusCodeException | IOException e ) {
    // TODO Auto-generated catch block

    e.printStackTrace();
}

Any ideas?

Here is the exception:

java.util.regex.PatternSyntaxException: Illegal octal escape sequence near index 2
\0+$
  ^
    at java.util.regex.Pattern.error(Pattern.java:1955)
    at java.util.regex.Pattern.o(Pattern.java:3192)
    at java.util.regex.Pattern.escape(Pattern.java:2300)
    at java.util.regex.Pattern.atom(Pattern.java:2198)
    at java.util.regex.Pattern.sequence(Pattern.java:2079)
    at java.util.regex.Pattern.expr(Pattern.java:1996)
    at java.util.regex.Pattern.compile(Pattern.java:1696)
    at java.util.regex.Pattern.<init>(Pattern.java:1351)
    at java.util.regex.Pattern.compile(Pattern.java:1054)
    at com.gargoylesoftware.htmlunit.javascript.regexp.HtmlUnitRegExpProxy.doAction(HtmlUnitRegExpProxy.java:102)
    at com.gargoylesoftware.htmlunit.javascript.regexp.HtmlUnitRegExpProxy.action(HtmlUnitRegExpProxy.java:74)
    at net.sourceforge.htmlunit.corejs.javascript.NativeString.execIdCall(NativeString.java:455)
    at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call(IdFunctionObject.java:89)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1531)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:724)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:832)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:733)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:708)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:982)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:351)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:411)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:276)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:290)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:793)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:751)
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1017)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:248)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:194)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:471)
    at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2110)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:875)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.processPostponedActions(JavaScriptEngine.java:962)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1327)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1270)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1218)
    at src.Hosts$1.exctractFileLinkFrom(Hosts.java:44)
    at src.TestMain.main(TestMain.java:10)
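
The first lines of the trace can be reproduced without HtmlUnit. In java.util.regex, a `\0` escape must be followed by at least one octal digit, so the pattern `\0+$` from the trace is rejected, while JavaScript regular expressions accept `\0` as the NUL character. The sketch below is only an illustration of that difference. The class and method names are made up for the example, and the hex-escape form is one way to write the same match in Java, not something taken from the page's script.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class OctalEscapeDemo {

    /** Returns true if java.util.regex accepts the given regex source. */
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The pattern from the stack trace: "\0" followed by "+" is not a valid octal escape in Java.
        System.out.println(compiles("\\0+$"));   // false

        // The same intent (one or more trailing NUL characters) written with a hex escape.
        System.out.println(compiles("\\x00+$")); // true

        // Stripping trailing NUL characters with the Java-compatible form.
        System.out.println("abc\0\0".replaceAll("\\x00+$", "")); // abc
    }
}
```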

1 answer:

Answer 0 (score: 0)

Possible solution?

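The error here may not lie in the framework's HTML parsing. My guess is that it is not the HtmlUnit framework itself that fails to handle the illegal escape sequence, but possibly its logger.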

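I did not set out to solve the problem this way, but when I changed the logger level to SEVERE to clean up my console output, no such exception was thrown.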

Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.SEVERE);
Logger.getLogger("org.apache.http").setLevel(java.util.logging.Level.SEVERE);
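
For reference, a self-contained version of the two lines above might look like the following sketch. It assumes HtmlUnit's log output ends up in java.util.logging, which depends on the logging setup on the classpath, and the class name is hypothetical. The loggers are kept in static fields because java.util.logging holds loggers only weakly, so a level set on an otherwise unreferenced logger can be lost after garbage collection. Note that this changes only which records are logged; it does not change how HtmlUnit handles the pattern.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietHtmlUnitLogging {

    // Strong references so the configured levels survive garbage collection.
    private static final Logger HTMLUNIT_LOG = Logger.getLogger("com.gargoylesoftware.htmlunit");
    private static final Logger HTTP_LOG = Logger.getLogger("org.apache.http");

    /** Raises both loggers to SEVERE so that lower-level records (e.g. WARNING) are dropped. */
    public static void quiet() {
        HTMLUNIT_LOG.setLevel(Level.SEVERE);
        HTTP_LOG.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        quiet();
        System.out.println(HTMLUNIT_LOG.getLevel()); // SEVERE
    }
}
```

Calling `QuietHtmlUnitLogging.quiet()` once before creating the `WebClient` would have the same effect as the two lines in the answer.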

Is my guess correct, or is this just a coincidence?