Using HtmlUnit

Date: 2016-07-11 23:40:39

Tags: java parsing web-scraping htmlunit

I am trying to log in to a Google Account via HtmlUnit, but it keeps failing: I still end up on the login page. What am I doing wrong?

  1. Set the email
  2. Click the Next button
  3. Set the password
  4. Click the Sign in button
  5. Go to the Gmail page; it is still the login page (output below)

My sample code:

            WebClient client = new WebClient(BrowserVersion.CHROME);
            client.setHTMLParserListener(HTMLParserListener.LOG_REPORTER);
            client.setJavaScriptEngine(new JavaScriptEngine(client));
            client.getOptions().setJavaScriptEnabled(true);
            client.getCookieManager().setCookiesEnabled(true);
            client.getOptions().setThrowExceptionOnScriptError(false);
            client.getOptions().setThrowExceptionOnFailingStatusCode(false);
            client.setAjaxController(new NicelyResynchronizingAjaxController());
            client.getCache().setMaxSize(0);
            client.getOptions().setRedirectEnabled(true);
    
            String url = "https://accounts.google.com/login?hl=en#identifier";
            HtmlPage loginPage = client.getPage(url);
            client.waitForBackgroundJavaScript(1000000);
    
            HtmlForm loginForm = loginPage.getFirstByXPath("//form[@id='gaia_loginform']");
            List<HtmlInput> buttonInputs = loginForm.getInputsByValue("signIn");
            HtmlInput nextButton = Iterables.getFirst(buttonInputs, null);
            HtmlInput loginButton = Iterables.getLast(buttonInputs);
            Thread.sleep(2000);
    
            //setup email
            HtmlInput emailInput = loginForm.getInputByName("Email");
            emailInput.setValueAttribute(emailAddress);
            Thread.sleep(2000);
    
            //click next button
            nextButton.click();
            client.waitForBackgroundJavaScript(1000000);
            Thread.sleep(2000);
    
            //setup password
            HtmlInput passwordInput = loginForm.getInputByName("Passwd");
            passwordInput.setValueAttribute(password);
    
            //click login button
            loginButton.click();
            client.waitForBackgroundJavaScript(1000000);
            Thread.sleep(2000);
    
            HtmlPage gmailPage = client.getPage("https://mail.google.com/mail/u/0/#inbox");
            log.info(gmailPage.asText());
    

After all that, I get this output:

    2016-07-12 01:36:47 INFO  GoogleAccountClient:91 - Gmail
    
    One account. All of Google.
     Sign in to continue to Gmail
    
     Next Need help?
    
    Sign inchecked
    
    Create account
     One Google Account for everything Google
    
    About Google
     Privacy
     Terms
     Help
    
    ‪English (United States)‬
    
     identifier
    
Am I missing something obvious?

I also tried clicking the buttons via JavaScript:

    loginPage.executeJavaScript("document.getElementById('next').click()");
    loginPage.executeJavaScript("document.getElementById('signIn').click()");
    

1 answer:

Answer 0 (score: 0)

I think the mistake is in how you look up the "Next" and "Sign in" buttons.

I'm pretty sure you need to do this: List<HtmlInput> buttonInputs = loginForm.getInputsByName("signIn");

Instead, you wrote: List<HtmlInput> buttonInputs = loginForm.getInputsByValue("signIn");

That is not right, because both buttons have the same name, "signIn", while the "Next" button has the value "Next" and the "Sign in" button has the value "Sign in".
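To make the name-versus-value distinction concrete, here is a minimal plain-Java sketch that does not depend on HtmlUnit. The `Input` class and the `byName`/`byValue` helpers are hypothetical stand-ins for `HtmlForm.getInputsByName` and `HtmlForm.getInputsByValue`, and they assume exact attribute matching. The attribute values are the ones described above, and the ids are taken from the `getElementById` calls in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InputLookupSketch {

    /** Hypothetical stand-in for an HTML input element; not an HtmlUnit class. */
    public static final class Input {
        public final String id;
        public final String name;
        public final String value;

        public Input(String id, String name, String value) {
            this.id = id;
            this.name = name;
            this.value = value;
        }
    }

    /** The two submit buttons as described: same name, different values. */
    public static List<Input> loginFormInputs() {
        return Arrays.asList(
                new Input("next", "signIn", "Next"),
                new Input("signIn", "signIn", "Sign in"));
    }

    /** Models a lookup on the name attribute (exact match assumed). */
    public static List<Input> byName(List<Input> inputs, String name) {
        List<Input> matches = new ArrayList<>();
        for (Input input : inputs) {
            if (input.name.equals(name)) {
                matches.add(input);
            }
        }
        return matches;
    }

    /** Models a lookup on the value attribute (exact match assumed). */
    public static List<Input> byValue(List<Input> inputs, String value) {
        List<Input> matches = new ArrayList<>();
        for (Input input : inputs) {
            if (input.value.equals(value)) {
                matches.add(input);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Input> inputs = loginFormInputs();
        // Both buttons share the name "signIn", so the name lookup finds two.
        System.out.println("byName(\"signIn\"):  " + byName(inputs, "signIn").size());
        // No button has the value "signIn", so the value lookup finds none.
        System.out.println("byValue(\"signIn\"): " + byValue(inputs, "signIn").size());
    }
}
```

In this model the lookup by name returns both buttons and the lookup by value "signIn" returns none. In HtmlUnit itself, another way to tell the two buttons apart would probably be `loginForm.getInputByValue("Next")` and `loginForm.getInputByValue("Sign in")`, assuming the page still uses those values.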