Website: http://www.sunat.gob.pe/cl-ti-itmrconsruc/jcrS00Alias. It has 2 frames,
one containing the form that is submitted via POST: http://www.sunat.gob.pe/cl-ti-itmrconsruc/frameCriterioBusqueda.jsp
and another frame that displays the result: http://www.sunat.gob.pe/cl-ti-itmrconsruc/frameResultadoBusqueda.html
Running my test in Apache NetBeans gives me this error message:
--- exec-maven-plugin:1.5.0:exec (default-cli) @ htmunit ---
Sep 30, 2019 8:39:15 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
ADVERTENCIA: Obsolete content type encountered: 'application/x-javascript'.
Sep 30, 2019 8:39:21 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
ADVERTENCIA: Obsolete content type encountered: 'application/x-javascript'.
Sep 30, 2019 8:39:21 PM com.gargoylesoftware.htmlunit.javascript.DefaultJavaScriptErrorListener scriptException
GRAVE: Error during JavaScript execution
======= EXCEPTION START ========
EcmaError: lineNumber=[0] column=[0] lineSource=[function () {] name=[TypeError] sourceName=[onload event for HtmlBody[] in http://www.sunat.gob.pe/cl-ti-itmrconsruc/jcrS00Alias] message=[TypeError: Cannot call method "goRefresh" of undefined]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot call method "goRefresh" of undefined
My progress so far:
public static void main(String[] args) {
    // TODO code application logic here
    try {
        String url = "http://www.sunat.gob.pe/cl-ti-itmrconsruc/frameCriterioBusqueda.jsp";
        final WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setCssEnabled(false);
        webClient.getCookieManager().setCookiesEnabled(true);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        HtmlPage htmlpage = webClient.getPage(url);
        //webClient.waitForBackgroundJavaScript(10000);
        //CookieManager coo = webClient.getCookieManager();
        //Cookie cookie = coo.getCookie("TS01c75c6f");
        //System.out.println(cookie.getValue());
        HtmlForm htmlForm = htmlpage.getElementByName("mainForm");
        //htmlForm.setActionAttribute("jcrS00Alias");
        HtmlTextInput input1 = htmlForm.getInputByName("search1");
        HtmlTextInput input2 = htmlForm.getInputByName("codigo");
        input1.setText("10468790497");
        HtmlHiddenInput hidden = (HtmlHiddenInput) htmlForm.getInputByName("accion");
        hidden.setValueAttribute("consPorRuc");
        HtmlImage image = htmlpage.<HtmlImage>getFirstByXPath("//img[@src='captcha?accion=image']");
        ImageReader img = image.getImageReader();
        BufferedImage buf = img.read(0);
        // Show image
        ImageIcon icon = new ImageIcon(buf);
        String codigo = JOptionPane.showInputDialog(null, icon, "Captcha image", JOptionPane.PLAIN_MESSAGE);
        input2.setText(codigo);
        HtmlButton boton = (HtmlButton) htmlpage.createElement("button");
        boton.setAttribute("type", "submit");
        htmlForm.appendChild(boton);
        htmlpage = boton.click();
        System.out.println(htmlpage.asXml().toString());
    } catch (Exception ex) {
        System.out.println(ex.getMessage());
    }
}
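I expect the query to return a successful result.
Answer 0 (score: 0)
Let's start with some general points:
When setting up the client, set only the options you really need. By default it behaves like a real browser, so there is no need to, for example, enable cookies.
Some of the warnings can be ignored.
Cannot call method "goRefresh" of undefined
This happens because the JS in the frame (in your case, the result frame document) tries to call a function from another frame to refresh the captcha. But you did not load the whole frameset, so this cannot work as intended.
To get the result, you have to read the content of the other frame.
This code seems to work: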
// work with the whole frameset
String url = "http://www.sunat.gob.pe/cl-ti-itmrconsruc/jcrS00Alias";
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
    // do not stop in case of js errors
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    HtmlPage frameset = webClient.getPage(url);
    HtmlPage searchPage = (HtmlPage) frameset.getFrameByName("leftFrame").getEnclosedPage();
    HtmlForm htmlForm = searchPage.getElementByName("mainForm");
    // set search field
    HtmlTextInput input1 = htmlForm.getInputByName("search1");
    input1.setText("10468790497");
    // process captcha
    HtmlImage image = searchPage.<HtmlImage>getFirstByXPath("//img[@src='captcha?accion=image']");
    ImageReader img = image.getImageReader();
    BufferedImage buf = img.read(0);
    ImageIcon icon = new ImageIcon(buf);
    String codigo = JOptionPane.showInputDialog(null, icon, "Captcha image", JOptionPane.PLAIN_MESSAGE);
    HtmlTextInput input2 = htmlForm.getInputByName("codigo");
    input2.setText(codigo);
    // click the button
    HtmlElement boton = htmlForm.getElementsByAttribute("input", "value", "Buscar").get(0);
    boton.click();
    // and get the result
    HtmlPage resultPage = (HtmlPage) frameset.getFrameByName("mainFrame").getEnclosedPage();
    System.out.println(resultPage.asText());
}
However, you need the latest HtmlUnit snapshot to run this without JS errors; a strange bug that was introduced ten years ago has now been fixed.
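One practical note on the captcha step: showing the image with JOptionPane needs a display, so it will not work on a headless machine. As a rough sketch, assuming you would rather write the captcha to a file and open it yourself, something like the following could replace the dialog. The class name CaptchaFiles and the method saveCaptcha are made up for this illustration and are not part of HtmlUnit; only standard JDK classes are used.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.imageio.ImageIO;

// Hypothetical helper (not part of HtmlUnit): stores a captcha image on disk.
class CaptchaFiles {

    /** Writes the image to a new temporary PNG file and returns the file's path. */
    static Path saveCaptcha(BufferedImage captcha) throws IOException {
        Path file = Files.createTempFile("captcha-", ".png");
        // ImageIO.write returns false if no registered writer can handle the image
        if (!ImageIO.write(captcha, "png", file.toFile())) {
            throw new IOException("No PNG writer available for this image");
        }
        return file;
    }
}
```

You would pass it the BufferedImage from img.read(0), print the returned path, and then read the typed code from the console (for example with a java.util.Scanner on System.in) instead of from the dialog.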
Hope that helps.