Can't get through with HtmlUnit

Time: 2018-04-05 16:09:59

Tags: htmlunit

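I'm new to HtmlUnit (using version 2.30) and I work in Eclipse on a Mac. I'm trying to build a stock-data scraper by logging in to my Ameritrade account and manipulating a watch list I created there. Submitting the initial login form leads to a two-step security page that asks a challenge question. I don't know why or how the site decides to challenge me after the username/password. Is it because this looks like a new browser?

Anyway, I fill in the form on the second page, answer the challenge question, and submit. Instead of taking me to my account's home page, it takes me to the two-step security page again, asking the same challenge question. Here is the relevant code: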

    final int sleepMinSeconds = 1;
    final int sleepRandomSeconds = 4;
    final long javascriptTimeout = 10000;

    System.out.println("HtmlUnitTest");

    String applicationName = "Mozilla";
    String applicationVersion = "5.0 (Windows NT 6.3; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0";
    final String userAgent = applicationName + "/" + applicationVersion;
    BrowserVersion browserVersion = new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX_52)
          .setApplicationName(applicationName)
          .setApplicationVersion(applicationVersion)
          .setUserAgent(userAgent)
          .build();

    WebClient webClient = new WebClient(browserVersion);

    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.ALL); 
    java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(java.util.logging.Level.ALL);
    webClient.setIncorrectnessListener(new com.gargoylesoftware.htmlunit.IncorrectnessListener() {
        @Override public void notify(String arg0, Object arg1) {} 
    });
    webClient.setJavaScriptErrorListener(new com.gargoylesoftware.htmlunit.javascript.JavaScriptErrorListener() {
        @Override public void timeoutError(HtmlPage arg0, long arg1, long arg2) {}   
        @Override public void scriptException(final HtmlPage arg0, final com.gargoylesoftware.htmlunit.ScriptException arg1) {} 
        @Override public void malformedScriptURL(HtmlPage arg0, String arg1, java.net.MalformedURLException arg2) {}
        @Override public void loadScriptError(HtmlPage arg0, java.net.URL arg1, Exception arg2) {}
    });
    webClient.setCssErrorHandler(new com.gargoylesoftware.htmlunit.SilentCssErrorHandler());
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setDoNotTrackEnabled(true);
    webClient.getOptions().setActiveXNative(true);
    webClient.getOptions().setRedirectEnabled(true);
    webClient.getOptions().setPrintContentOnFailingStatusCode(true);
    webClient.getCookieManager().setCookiesEnabled(true);
    webClient.getOptions().setDownloadImages(true);

    String loginURL = "https://www.tdameritrade.com/home.page";
    System.out.println("Connecting to " + loginURL + " (" + webClient.getBrowserVersion() + ")");
    System.out.print("    Waiting to avoid being detected as a robot...");
    Thread.sleep((long)(Math.random()*sleepRandomSeconds) * 1000);
    System.out.print("    Done waiting.\n");

    HtmlPage page = webClient.getPage(loginURL);
    System.out.println("title text: " + page.getTitleText());

    System.out.print("    \nWaiting for Javascript to complete...");
    webClient.waitForBackgroundJavaScript(javascriptTimeout);
    System.out.println("\nOK");

    System.out.print("    Waiting to avoid being detected as a robot...");
    Thread.sleep((long)(sleepMinSeconds + Math.random()*sleepRandomSeconds) * 1000);
    System.out.print("    Done waiting.\n");

    System.out.println("Logging in...");
    HtmlForm form = page.getFormByName("form-login");
    HtmlTextInput useridField = form.getInputByName("tbUsername");
    HtmlPasswordInput passwordField = form.getInputByName("tbPassword");
    useridField.type("<userid>");
    passwordField.type("<password>");
    HtmlButton button = form.getButtonByName("");
    System.out.println("button value: " + button.getValueAttribute());
    // Did this to make sure I had right button, which was unnamed.
    // Value is "Log in", so I proceed.

    HtmlPage page2 = button.click();

    System.out.print("    \nWaiting for Javascript to complete...");
    webClient.waitForBackgroundJavaScript(javascriptTimeout);
    System.out.println("\nOK");

    System.out.print("    Waiting to avoid being detected as a robot...");
    Thread.sleep((long)(sleepMinSeconds + Math.random()*sleepRandomSeconds) * 1000);
    System.out.print("    Done waiting.\n");

    HtmlElement element = page2.getHtmlElementById("loginBlock");
    HtmlForm form2 = element.getEnclosingForm();
    HtmlPasswordInput challengeField = form2.getInputByName("challengeAnswer");
    if(page2.asXml().contains("boss")) {
            System.out.println("boss question...");
            challengeField.type("<answer to boss question>");
    }
    else if(page2.asXml().contains("street")) {
            System.out.println("street question...");
            challengeField.type("<answer to street question>");
    }
    else {
            System.out.println("What?");
    }
    HtmlCheckBoxInput checkBox = form2.getInputByName("rememberDevice");
    checkBox.setChecked(true);
    HtmlInput button2 = form2.getInputByName("mAction");
    System.out.println("button2 value: " + button2.getValueAttribute());
    // value here is "submit" - so I proceed

    HtmlPage page3 = button2.click();
    System.out.print("    \nWaiting for Javascript to complete...");
    webClient.waitForBackgroundJavaScript(javascriptTimeout);
    System.out.println("\nOK");

    webClient.close();

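In other words, page2 and page3 are the same: both are the two-step security page. I expected page3 to be my account's home page. (I confirmed this by writing each of them out as XML to separate files.) I'd really appreciate any help I can get. Thanks!

1 answer:

Answer 0: (score: 0)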

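OK, let's start with some comments on your code.

First: you build a custom browser version instead of using one of the default ones, and I'm not sure what you want to achieve with that. There is nothing wrong with it, but be aware that changing these browser settings has no effect on the browser's behavior (e.g. the supported JS features).

Second: if you are hunting for a bug or problem, I'm afraid disabling all the listeners is a bad idea. The output of those listeners might be valuable... As for all the options: why not start with the default settings? They are really close to a real browser.

Now some words about the login process:

  1. Trying to understand the login process of the real application would be a big help. All these "modern" web applications do a lot of strange things (async / JavaScript) to emulate a rich UI on a platform that is not that rich. A tool like Charles Web Proxy really helps to see the communication going on behind the scenes.

  2. A common problem with the HtmlPage page3 = button2.click(); API is that the click method returns the synchronous result of the click. If the button is one of these fancy Ajax buttons, that result is usually the page the button itself is on. You are already waiting for the Ajax calls to finish, but if an Ajax redirect loads a new page, the page you are holding will not change. In that case you have to do something like this after the wait call.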

    // there is an ajax redirect that loads a new page into this window
    page3 = (HtmlPage) page.getEnclosingWindow().getEnclosedPage();
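
If a single wait is not enough, the same idea can be turned into a small polling loop: keep re-reading the window's enclosed page until it is no longer the page you already had. The sketch below is only an illustration under assumptions. `PagePoller` and `pollUntil` are hypothetical names, not part of HtmlUnit, and the helper uses nothing but the JDK so it can be tried on its own.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper (not part of HtmlUnit): re-reads a value until it is accepted. */
final class PagePoller {

    private PagePoller() {}

    /**
     * Calls source.get() repeatedly until accepted.test(value) is true or the
     * timeout elapses. Returns the last value read, accepted or not.
     */
    static <T> T pollUntil(Supplier<T> source, Predicate<T> accepted,
                           long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        T value = source.get();
        // Compare via subtraction so the check stays correct if nanoTime wraps.
        while (!accepted.test(value) && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling early.
                Thread.currentThread().interrupt();
                return value;
            }
            value = source.get();
        }
        return value;
    }
}
```

With HtmlUnit on the classpath this could be called roughly as `PagePoller.pollUntil(() -> page2.getEnclosingWindow().getEnclosedPage(), p -> p != page2, 10000, 250)`, with the result cast to HtmlPage once you have checked that it really is one. Whether the site replaces the page at all is something only the proxy trace can tell you.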
    
Hope that helps...