Downloading a .zip file

Time: 2014-10-31 01:47:33

Tags: htmlunit

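I am trying to download a .zip file using htmlunit-2.13.

The file can be downloaded from the website as follows:
- Go to http://www.bmfbovespa.com.br/consulta-isin/BuscaCodigosIsin.aspx?Idioma=pt-br
- Click the "Download de Arquivos" link, then the 'Banco de Dados Completo' link

The following Java code saves an HTML file instead of the .zip file: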

public class Teste {

    public static void main(String args[])
    {
        try
        {
            LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
            java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.ALL);

            HtmlPage page = null;
            String url = "http://www.bmfbovespa.com.br/consulta-isin/BuscaCodigosIsin.aspx?Idioma=pt-br";

            @SuppressWarnings("deprecation")
            WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_10);
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            webClient.getOptions().setJavaScriptEnabled(true);

            WebRequest webRequest = new WebRequest(new URL(url));
            webRequest.setCharset("UTF-8");

            page = webClient.getPage( webRequest );

            HtmlAnchor anchor1 = (HtmlAnchor) page.getElementById("ctl00_contentPlaceHolderConteudo_hplCompleto");
            HtmlPage page2 = anchor1.click();

            InputStream is = page2.getWebResponse().getContentAsStream();
            FileOutputStream output = new FileOutputStream("/tmp/isin.zip");

            IOUtils.copy(is, output);
            output.close();

            System.out.println("New file created!");            
        }
        catch ( FailingHttpStatusCodeException e1 )
        {
            System.out.println( "FailingHttpStatusCodeException thrown:" + e1.getMessage() );
            e1.printStackTrace();

        }
        catch ( MalformedURLException e1 )
        {
            System.out.println( "MalformedURLException thrown:" + e1.getMessage() );
            e1.printStackTrace();

        }
        catch ( IOException e1 )
        {
            System.out.println( "IOException thrown:" + e1.getMessage() );
            e1.printStackTrace();

        }
        catch( Exception e )
        {
            System.out.println( "General exception thrown:" + e.getMessage() );
            e.printStackTrace();

        }
    }

}

1 answer:

Answer 0 (score: 0)

I ran into a similar problem when trying to get a zip from a page like this. Here is my solution:

WebClient wc = new WebClient(BrowserVersion.CHROME);

HtmlPage p = wc.getPage(url);

// I believe clicking the element returns an UnexpectedPage 
// because the content type is 'application/zip'

UnexpectedPage up = p.getElementById(buttonId).click();

InputStream in = up.getInputStream();

...
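
What to do with the stream afterwards is left out above. As a minimal sketch (not the answerer's code), the stream could be written to disk and then checked for the ZIP signature, which would catch the original problem of an HTML page being saved under a .zip name. The class name `ZipSaver` and the methods `save` and `looksLikeZip` are hypothetical names made up for this illustration; only the JDK is used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of HtmlUnit.
final class ZipSaver {

    // Copies the whole stream to target (overwriting it) and closes the stream.
    // Returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream src = in) {
            return Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Returns true if the file starts with the ZIP local file header
    // signature "PK\003\004". Assumption: the archive is non-empty
    // (an empty archive starts with a different signature).
    static boolean looksLikeZip(Path file) throws IOException {
        byte[] head = new byte[4];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                read += n;
            }
        }
        return head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data: an HTML body, as in the failing case from the question.
        Path tmp = Files.createTempFile("isin", ".zip");
        byte[] html = "<html></html>".getBytes(StandardCharsets.US_ASCII);
        save(new ByteArrayInputStream(html), tmp);
        System.out.println("looks like zip: " + looksLikeZip(tmp));
        Files.delete(tmp);
    }
}
```

With the snippet above, the call would be something like `ZipSaver.save(up.getInputStream(), Paths.get("/tmp/isin.zip"))`, followed by `ZipSaver.looksLikeZip(...)` on the same path to confirm that a real archive was received.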