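I am trying to download a .zip file using htmlunit-2.13.
In a browser, the file can be downloaded as follows:
- http://www.bmfbovespa.com.br/consulta-isin/BuscaCodigosIsin.aspx?Idioma=pt-br
- Click the "Download de Arquivos" link
- Then click the 'Banco de Dados Completo' link
The following Java code saves an HTML file instead of the .zip file: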
public class Teste {
    public static void main(String args[])
    {
        try
        {
            LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
            java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.ALL);
            HtmlPage page = null;
            String url = "http://www.bmfbovespa.com.br/consulta-isin/BuscaCodigosIsin.aspx?Idioma=pt-br";
            @SuppressWarnings("deprecation")
            WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_10);
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            webClient.getOptions().setJavaScriptEnabled(true);
            WebRequest webRequest = new WebRequest(new URL(url));
            webRequest.setCharset("UTF-8");
            page = webClient.getPage( webRequest );
            HtmlAnchor anchor1 = (HtmlAnchor) page.getElementById("ctl00_contentPlaceHolderConteudo_hplCompleto");
            HtmlPage page2 = anchor1.click();
            InputStream is = page2.getWebResponse().getContentAsStream();
            FileOutputStream output = new FileOutputStream("/tmp/isin.zip");
            IOUtils.copy(is, output);
            output.close();
            System.out.println("New file created!");
        }
        catch ( FailingHttpStatusCodeException e1 )
        {
            System.out.println( "FailingHttpStatusCodeException thrown:" + e1.getMessage() );
            e1.printStackTrace();
        }
        catch ( MalformedURLException e1 )
        {
            System.out.println( "MalformedURLException thrown:" + e1.getMessage() );
            e1.printStackTrace();
        }
        catch ( IOException e1 )
        {
            System.out.println( "IOException thrown:" + e1.getMessage() );
            e1.printStackTrace();
        }
        catch( Exception e )
        {
            System.out.println( "General exception thrown:" + e.getMessage() );
            e.printStackTrace();
        }
    }
}
Answer 0 (score: 0)
I ran into a similar problem when trying to fetch a zip from a page like this. Here is my solution:
WebClient wc = new WebClient(BrowserVersion.CHROME);
HtmlPage p = wc.getPage(url);
// I believe clicking the element returns an UnexpectedPage
// because the content type is 'application/zip'
UnexpectedPage up = p.getElementById(buttonId).click();
InputStream in = up.getInputStream();
...
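Since the original symptom was an HTML page being written to disk under a .zip name, it may help to verify what was actually saved. Below is a minimal sketch, using only the JDK, of writing an InputStream (such as the one obtained above) to a file and checking whether it starts with the ZIP local file header signature "PK\003\004". The class and method names (ZipDownloadCheck, saveAndCheck, looksLikeZip) are hypothetical and only for illustration; they are not part of HtmlUnit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of HtmlUnit: saves a stream and checks the ZIP signature.
public class ZipDownloadCheck {

    // Signature of the first local file header in a non-empty ZIP archive: 'P' 'K' 0x03 0x04.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Returns true if the given leading bytes start with the ZIP signature.
    public static boolean looksLikeZip(byte[] head) {
        if (head.length < ZIP_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < ZIP_MAGIC.length; i++) {
            if (head[i] != ZIP_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Copies the stream to target (overwriting it), then reports whether the saved
    // file begins with the ZIP signature. The input stream is closed afterwards.
    public static boolean saveAndCheck(InputStream in, Path target) throws IOException {
        try (InputStream body = in) {
            Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        }
        byte[] head = new byte[ZIP_MAGIC.length];
        int read = 0;
        try (InputStream saved = Files.newInputStream(target)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (read < head.length) {
                int n = saved.read(head, read, head.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
        }
        return read == head.length && looksLikeZip(head);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real response body: an HTML page instead of a zip.
        byte[] html = "<html><body>error</body></html>".getBytes(StandardCharsets.UTF_8);
        Path target = Files.createTempFile("isin", ".zip");
        boolean ok = saveAndCheck(new ByteArrayInputStream(html), target);
        System.out.println(ok ? "Saved a ZIP archive" : "Saved file is not a ZIP archive");
        Files.deleteIfExists(target);
    }
}
```

With the snippet above, the call would presumably be something like saveAndCheck(in, Paths.get("/tmp/isin.zip")). If the check fails, the click most likely returned an ordinary page rather than the download.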