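I'm new to HtmlUnit (using version 2.30), working in Eclipse on a Mac. I'm trying to build a stock data scraper by logging into my Ameritrade account and manipulating a watch list I created there. Submitting the first login form leads to a two-step security page that asks a challenge question. I don't know why or how the site decides to challenge me beyond my username and password. Is it because HtmlUnit looks like a new browser?
Either way, I fill in the form on the second page, answer the challenge question, and submit. Instead of taking me to my account's home page, it takes me back to the two-step security page and asks the same challenge question again. Here is the relevant code: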
final int sleepMinSeconds = 1;
final int sleepRandomSeconds = 4;
final long javascriptTimeout = 10000;
System.out.println("HtmlUnitTest");
String applicationName = "Mozilla";
String applicationVersion = "5.0 (Windows NT 6.3; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0";
final String userAgent = applicationName + "/" + applicationVersion;
BrowserVersion browserVersion = new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX_52)
.setApplicationName(applicationName)
.setApplicationVersion(applicationVersion)
.setUserAgent(userAgent)
.build();
WebClient webClient = new WebClient(browserVersion);
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.ALL);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(java.util.logging.Level.ALL);
webClient.setIncorrectnessListener(new com.gargoylesoftware.htmlunit.IncorrectnessListener() {
@Override public void notify(String arg0, Object arg1) {}
});
webClient.setJavaScriptErrorListener(new com.gargoylesoftware.htmlunit.javascript.JavaScriptErrorListener() {
@Override public void timeoutError(HtmlPage arg0, long arg1, long arg2) {}
@Override public void scriptException(final HtmlPage arg0, final com.gargoylesoftware.htmlunit.ScriptException arg1) {}
@Override public void malformedScriptURL(HtmlPage arg0, String arg1, java.net.MalformedURLException arg2) {}
@Override public void loadScriptError(HtmlPage arg0, java.net.URL arg1, Exception arg2) {}
});
webClient.setCssErrorHandler(new com.gargoylesoftware.htmlunit.SilentCssErrorHandler());
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setDoNotTrackEnabled(true);
webClient.getOptions().setActiveXNative(true);
webClient.getOptions().setRedirectEnabled(true);
webClient.getOptions().setPrintContentOnFailingStatusCode(true);
webClient.getCookieManager().setCookiesEnabled(true);
webClient.getOptions().setDownloadImages(true);
String loginURL = "https://www.tdameritrade.com/home.page";
System.out.println("Connecting to " + loginURL + " (" + webClient.getBrowserVersion() + ")");
System.out.print(" Waiting to avoid being detected as a robot...");
Thread.sleep((long)(Math.random()*sleepRandomSeconds) * 1000);
System.out.print(" Done waiting.\n");
HtmlPage page = webClient.getPage(loginURL);
System.out.println("title text: " + page.getTitleText());
System.out.print(" \nWaiting for Javascript to complete...");
webClient.waitForBackgroundJavaScript(javascriptTimeout);
System.out.println("\nOK");
System.out.print(" Waiting to avoid being detected as a robot...");
Thread.sleep((long)(sleepMinSeconds + Math.random()*sleepRandomSeconds) * 1000);
System.out.print(" Done waiting.\n");
System.out.println("Logging in...");
HtmlForm form = page.getFormByName("form-login");
HtmlTextInput useridField = form.getInputByName("tbUsername");
HtmlPasswordInput passwordField = form.getInputByName("tbPassword");
useridField.type("<userid>");
passwordField.type("<password>");
HtmlButton button = form.getButtonByName("");
System.out.println("button value: " + button.getValueAttribute());
// Did this to make sure I had right button, which was unnamed.
// Value is "Log in", so I proceed.
HtmlPage page2 = button.click();
System.out.print(" \nWaiting for Javascript to complete...");
webClient.waitForBackgroundJavaScript(javascriptTimeout);
System.out.println("\nOK");
System.out.print(" Waiting to avoid being detected as a robot...");
Thread.sleep((long)(sleepMinSeconds + Math.random()*sleepRandomSeconds) * 1000);
System.out.print(" Done waiting.\n");
HtmlElement element = page2.getHtmlElementById("loginBlock");
HtmlForm form2 = element.getEnclosingForm();
HtmlPasswordInput challengeField = form2.getInputByName("challengeAnswer");
if(page2.asXml().contains("boss")) {
System.out.println("boss question...");
challengeField.type("<answer to boss question>");
}
else if(page2.asXml().contains("street")) {
System.out.println("street question...");
challengeField.type("<answer to street question>");
}
else {
System.out.println("What?");
}
HtmlCheckBoxInput checkBox = form2.getInputByName("rememberDevice");
checkBox.setChecked(true);
HtmlInput button2 = form2.getInputByName("mAction");
System.out.println("button2 value: " + button2.getValueAttribute());
// value here is "submit" - so I proceed
HtmlPage page3 = button2.click();
System.out.print(" \nWaiting for Javascript to complete...");
webClient.waitForBackgroundJavaScript(javascriptTimeout);
System.out.println("\nOK");
webClient.close();
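In other words, page2 and page3 are the same: both are the two-step security page. I expected page3 to be my account's home page. (I confirmed this by writing each page out as XML to separate files.) I would greatly appreciate any help I can get. Thanks!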
Answer 0 (score: 0)
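OK, let's start with a few comments on your code.
First: you build a custom browser version instead of using one of the built-in defaults, and I'm not sure what you want to achieve with that. There is nothing wrong with it, but be aware that changing these settings has no effect on the simulated browser's behavior (for example, which JavaScript features are supported).
Second: if you are hunting for a bug or problem, disabling all the listeners is a bad idea, I'm afraid. Their output may be worth reading. As for all the options: why not start from the default settings, which are already very close to a real browser?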
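To make the "start from the defaults" advice concrete, here is a minimal sketch assuming HtmlUnit 2.30 on the classpath. The class and method names (DefaultClientSketch, newDefaultClient) are made up for illustration. It creates a client from a built-in browser version, leaves the default listeners in place, and prints a few options so you can see what is already enabled before changing anything.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebClientOptions;

public class DefaultClientSketch {

    // Hypothetical helper: a client with a built-in browser version and
    // no listeners replaced, so warnings and script errors stay visible.
    static WebClient newDefaultClient() {
        return new WebClient(BrowserVersion.FIREFOX_52);
    }

    public static void main(String[] args) {
        try (WebClient webClient = newDefaultClient()) {
            WebClientOptions options = webClient.getOptions();
            // Print the defaults rather than setting them again by hand.
            System.out.println("JavaScript enabled:     " + options.isJavaScriptEnabled());
            System.out.println("Redirects enabled:      " + options.isRedirectEnabled());
            System.out.println("Throw on script error:  " + options.isThrowExceptionOnScriptError());
            System.out.println("Throw on failing status: " + options.isThrowExceptionOnFailingStatusCode());
            System.out.println("Cookies enabled:        "
                    + webClient.getCookieManager().isCookiesEnabled());
        }
    }
}
```

If a script error on the real site stops the run, relax that single option (for example setThrowExceptionOnScriptError(false)) and keep the listeners, so the error still shows up in the log.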
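Now a few words about the login process:
Understanding the login process of the real application would help a lot. All these "modern" web applications do a lot of strange things (async requests, JavaScript) to simulate a rich UI on a platform that is not that rich. Tools like the Charles web proxy really help to understand the communication going on behind the scenes.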
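One way to compare HtmlUnit's traffic with a real browser's is to send both through the same debugging proxy. The sketch below is an assumption-laden example, not part of the original setup: it assumes a proxy such as Charles is listening on localhost:8888 (adjust to your configuration), and the class and method names are made up. Because such proxies re-sign HTTPS traffic with their own certificate, the sketch also turns off certificate checks, which is only acceptable while debugging.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.ProxyConfig;
import com.gargoylesoftware.htmlunit.WebClient;

public class ProxySketch {

    // Hypothetical helper: routes all requests of the client through a
    // local debugging proxy so they can be inspected there.
    static WebClient newProxiedClient(String proxyHost, int proxyPort) {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52);
        webClient.getOptions().setProxyConfig(new ProxyConfig(proxyHost, proxyPort));
        // Debugging only: accept the proxy's self-signed HTTPS certificate.
        webClient.getOptions().setUseInsecureSSL(true);
        return webClient;
    }

    public static void main(String[] args) {
        // Assumed host and port; use whatever your proxy is configured for.
        try (WebClient webClient = newProxiedClient("localhost", 8888)) {
            ProxyConfig proxy = webClient.getOptions().getProxyConfig();
            System.out.println("Proxy: " + proxy.getProxyHost() + ":" + proxy.getProxyPort());
        }
    }
}
```

Recording one login from a regular browser and one from HtmlUnit through the same proxy makes it easier to spot a request, header, or cookie that only the real browser sends.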
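A common problem with this API, HtmlPage page3 = button2.click();, is that the click method returns the synchronous result of the click. If the button is one of these fancy Ajax buttons, that is usually the page containing the button itself. You are already waiting for the Ajax calls to finish, but if an Ajax call redirects to a new page, your page3 variable will still refer to the old page. In that case you have to do something like this after the wait call: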
// there is an ajax redirect that loads a new page into this window
page3 = (HtmlPage) page.getEnclosingWindow().getEnclosedPage();
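The same idea as a self-contained sketch that runs offline. Everything here is illustrative: the class and method names, the example.test URLs, and the two mock pages are assumptions, and the delayed window.location change merely stands in for whatever the real site does after the challenge form is submitted. It assumes HtmlUnit 2.30 on the classpath and uses its MockWebConnection instead of a real server.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebWindow;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.net.URL;

public class AjaxRedirectSketch {

    // Hypothetical helper: click, wait for background JavaScript, then ask
    // the window which page it holds now instead of trusting click()'s result.
    static Page clickAndResolve(HtmlElement element, long timeoutMillis) throws Exception {
        WebWindow window = element.getPage().getEnclosingWindow();
        element.click(); // synchronous result only; may still be the old page
        window.getWebClient().waitForBackgroundJavaScript(timeoutMillis);
        return window.getEnclosedPage();
    }

    // Runs the scenario against mock pages and returns the final URL.
    static String demo() throws Exception {
        URL start = new URL("http://example.test/challenge");
        URL home = new URL("http://example.test/home");

        MockWebConnection connection = new MockWebConnection();
        // The button navigates from a timer, i.e. after click() has returned.
        connection.setResponse(start,
                "<html><head><title>Challenge</title></head><body>"
                + "<button id='go' type='button' "
                + "onclick=\"setTimeout(function(){ window.location = '/home'; }, 100)\">"
                + "Submit</button></body></html>");
        connection.setResponse(home,
                "<html><head><title>Home</title></head><body></body></html>");

        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52)) {
            webClient.setWebConnection(connection);
            HtmlPage page = webClient.getPage(start);
            HtmlElement button = page.getHtmlElementById("go");
            Page after = clickAndResolve(button, 5000);
            return after.getUrl().toString();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("URL after click and wait: " + demo());
    }
}
```

In the question's code, the equivalent would be to click button2, call waitForBackgroundJavaScript, and only then read the page from the window. Whether that yields the account home page still depends on what the site actually does with the challenge answer.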
Hope that helps...