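HtmlUnit is an excellent Java library that lets you fill in and submit web forms programmatically. I am currently maintaining an old system written in ASP, and I am required to fill in one particular web form by hand every month. Because I keep forgetting to do it, I am trying to automate the whole task. The form retrieves the data collected within a given month. Here is what I have coded so far: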
WebClient client = new WebClient();
HtmlPage page = client.getPage("http://urlOfTheWebsite.com/search.aspx");
HtmlForm form = page.getFormByName("aspnetForm");
HtmlSelect frMonth = form.getSelectByName("ctl00$cphContent$ddlStartMonth");
HtmlSelect frDay = form.getSelectByName("ctl00$cphContent$ddlStartDay");
HtmlSelect frYear = form.getSelectByName("ctl00$cphContent$ddlStartYear");
HtmlSelect toMonth = form.getSelectByName("ctl00$cphContent$ddlEndMonth");
HtmlSelect toDay = form.getSelectByName("ctl00$cphContent$ddlEndDay");
HtmlSelect toYear = form.getSelectByName("ctl00$cphContent$ddlEndYear");
HtmlCheckBoxInput games = form.getInputByName("ctl00$cphContent$chkListLottoGame$0");
HtmlSubmitInput submit = form.getInputByName("ctl00$cphContent$btnSearch");
frMonth.setSelectedAttribute("1", true);
frDay.setSelectedAttribute("1", true);
frYear.setSelectedAttribute("2012", true);
toMonth.setSelectedAttribute("1", true);
toDay.setSelectedAttribute("31", true);
toYear.setSelectedAttribute("2012", true);
games.setChecked(true);
submit.click();
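After click(), I need to wait for the same page to finish reloading, because somewhere on it there is a table that displays my search results. Once the page has finished loading, I need to download it as an HTML file (much like "Save Page As..." in your favorite browser), because I will scrape the data to compute totals, and I have already written that part with the Jsoup library.
My questions are:
1. How do I programmatically wait for a web page to finish loading in HtmlUnit?
2. How do I programmatically download the resulting web page as an HTML file?
I have looked through the HtmlUnit documentation and could not find a class that does what I need.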
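Answer 0 (score: 6)
Try one of the following methods, both of which take a value in milliseconds:
webClient.waitForBackgroundJavaScript(timeoutMillis) or
webClient.waitForBackgroundJavaScriptStartingBefore(delayMillis)
I think you also need to specify the browser version; by default it uses IE. You can find more information in this question: HTMLUnit doesn't wait for Javascript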
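If waiting for background JavaScript alone is not enough, a common fallback is to poll for a condition with a timeout. The sketch below is plain Java and does not use any HtmlUnit API; `WaitUtil` and `waitUntil` are hypothetical names introduced for illustration. With HtmlUnit, the condition would typically be a check on the page returned by click(), for example whether the results table is present.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of HtmlUnit: polls a condition until it holds or a timeout expires. */
final class WaitUtil {

    private WaitUtil() {}

    /**
     * Returns true as soon as the condition holds, or false once timeoutMillis
     * has elapsed. The condition is always checked at least once.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Stand-in condition for the demo: becomes true roughly 300 ms after start.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 300, 2_000, 50);
        System.out.println("ready = " + ready);
    }
}
```

The timeout keeps the monthly job from hanging forever if the results never appear; the caller can then decide whether to retry or fail.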
Answer 1 (score: 0)
How do I programmatically download the resulting web page as an HTML file
Try asXml(). Something like:
page = submit.click();
String htmlContent = page.asXml();
File htmlFile = new File("C:/index.html");
// PrintWriter has no (File, boolean) constructor; pass the charset name instead
PrintWriter pw = new PrintWriter(htmlFile, "UTF-8");
pw.print(htmlContent);
pw.close();
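As an alternative to PrintWriter, the string returned by asXml() can be written with java.nio, which makes the encoding explicit. This is only a sketch: `PageSaver` and `save` are hypothetical names, the markup in main is a made-up stand-in for page.asXml(), and the output file name is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of HtmlUnit: writes an HTML string to disk as UTF-8. */
final class PageSaver {

    private PageSaver() {}

    /** Creates missing parent directories, then writes the HTML and returns the target path. */
    static Path save(String html, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // By default Files.write creates the file or truncates an existing one.
        return Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for page.asXml(); this markup is invented for the demo.
        String html = "<html><body><table><tr><td>42</td></tr></table></body></html>";
        Path out = save(html, Paths.get("index.html"));
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

The saved file can then be handed to Jsoup for scraping in the same way as a page saved from a browser.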
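Answer 2 (score: 0)
This example may help you. After clicking, you need to wait for the page to load; most of the time it is a dynamic page that uses JavaScript and the like. The overridden methods are all left empty so that you are not flooded with console messages. You can implement whichever of them you need.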
public static void main(String[] args) throws IOException {
    WebClient webClient = gethtmlUnitClient();
    final HtmlPage page = webClient.getPage("YOUR PAGE");
    webClient.waitForBackgroundJavaScript(60000);
    System.out.println(page);
}

static public WebClient gethtmlUnitClient() {
    WebClient webClient;
    LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log",
            "org.apache.commons.logging.impl.NoOpLog");
    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
    java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
    webClient = new WebClient(BrowserVersion.CHROME);
    webClient.setIncorrectnessListener(new IncorrectnessListener() {
        @Override
        public void notify(String arg0, Object arg1) {
        }
    });
    webClient.setCssErrorHandler(new ErrorHandler() {
        @Override
        public void warning(CSSParseException arg0) throws CSSException {
            // TODO Auto-generated method stub
        }

        @Override
        public void fatalError(CSSParseException arg0) throws CSSException {
            // TODO Auto-generated method stub
        }

        @Override
        public void error(CSSParseException arg0) throws CSSException {
            // TODO Auto-generated method stub
        }
    });
    webClient.setJavaScriptErrorListener(new JavaScriptErrorListener() {
        @Override
        public void timeoutError(HtmlPage arg0, long arg1, long arg2) {
            // TODO Auto-generated method stub
        }

        @Override
        public void scriptException(HtmlPage arg0, ScriptException arg1) {
            // TODO Auto-generated method stub
        }

        @Override
        public void malformedScriptURL(HtmlPage arg0, String arg1, MalformedURLException arg2) {
            // TODO Auto-generated method stub
        }

        @Override
        public void loadScriptError(HtmlPage arg0, URL arg1, Exception arg2) {
            // TODO Auto-generated method stub
        }
    });
    webClient.setHTMLParserListener(new HTMLParserListener() {
        @Override
        public void warning(String arg0, URL arg1, String arg2, int arg3, int arg4, String arg5) {
            // TODO Auto-generated method stub
        }

        @Override
        public void error(String arg0, URL arg1, String arg2, int arg3, int arg4, String arg5) {
            // TODO Auto-generated method stub
        }
    });
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    return webClient;
}