HtmlUnit没有getPage

时间:2016-05-11 21:44:30

标签: java android htmlunit

使用HtmlUnit从互联网上抓取数据,我需要登录以下页面https://accounts.google.com/login

当使用“getPage()”方法时,我不断得到这个例外,我该如何解决?提前谢谢你

  Exception in thread "main" ======= EXCEPTION START ========
Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException]
com.gargoylesoftware.htmlunit.ScriptException: AssertionError: Assertion failed: No element found with className: signin-card (script in https://accounts.google.com/login?hl=es#identifier from (2653, 11) to (2753, 10)#2660)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:894)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:776)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:752)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:740)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:916)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:307)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:368)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:238)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:257)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:773)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:730)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1209)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1111)
    at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:207)
    at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:337)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3137)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2100)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:927)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:506)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:459)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:980)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:241)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:187)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:269)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:157)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:512)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:386)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:304)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:451)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:436)
    at prog.htmlUnit.Scrapeo.iniciaSesion(Scrapeo.java:74)
    at prog.htmlUnit.ProgramaPruebas.main(ProgramaPruebas.java:24)
Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: [object Object] (script in https://accounts.google.com/login?hl=es#identifier from (2653, 11) to (2753, 10)#2660)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1006)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3286)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:767)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:879)
    ... 35 more
JavaScriptException value = [object Object]
======= EXCEPTION END ========

抛出异常的部分就像这样简单:

public HtmlPage iniciaSesion(String correo, String pass) throws FailingHttpStatusCodeException, MalformedURLException, IOException{
        HtmlPage pagActual;
        HtmlTextInput cajaTexto;
        HtmlButton boton;

        pagActual= cliente.getPage("https://accounts.google.com/login?hl=es#identifier");

        return pagActual;

主程序只调用此方法并使用.asXml()方法,但在使用之前会抛出异常。

1 个答案:

答案 0 :(得分:4)

您需要在客户端上启用Javascript。 这段代码应该有效:

LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");

java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF); 
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

WebClient client = new WebClient(BrowserVersion.CHROME);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);

String url = "https://accounts.google.com/login";
final HtmlPage page = client.getPage(url);

System.out.println(page.asText());