Java - Download an HTTPS page

Date: 2016-01-12 19:38:22

Tags: java https

I'm trying to download a web page's content with this code, but what I get is different from what Firefox shows.

URL url = new URL("https://jumpseller.cl/support/webpayplus/");
InputStream is = url.openStream();
Files.copy(is, Paths.get("/tmp/asdfasdf"), StandardCopyOption.REPLACE_EXISTING);

When I open /tmp/asdfasdf, it isn't the page's HTML source, just binary bytes with no readable text. In Firefox, however, I can see both the page and its source.

How can I get the actual web page?

2 answers:

Answer 0 (score: 0)

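You need to check the response headers. The page is compressed: the Content-Encoding header has the value gzip, so the bytes you saved are the gzip-compressed body and have to be decompressed.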

Try this:

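import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

URL url = new URL("https://jumpseller.cl/support/webpayplus/");
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();
// The server sends the body gzip-compressed (Content-Encoding: gzip), so unwrap it
if ("gzip".equals(conn.getContentEncoding())) {
    is = new GZIPInputStream(is);
}
try (InputStream in = is) { // make sure the stream is closed after copying
    Files.copy(in, Paths.get("/tmp/asdfasdf"), StandardCopyOption.REPLACE_EXISTING);
}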
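If you would rather not rely on the Content-Encoding header, you can also look at the data itself: gzip data always starts with the magic bytes 0x1f 0x8b (RFC 1952). Below is a minimal sketch of that idea. The class and method names (GzipSniff, looksGzipped, decodeIfGzip) are hypothetical, and readAllBytes assumes Java 9+. The demo compresses a string in memory so that it runs without network access.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSniff {

    // True if the data starts with the gzip magic bytes 0x1f 0x8b (RFC 1952)
    static boolean looksGzipped(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    // Decompresses gzip data; returns anything else unchanged
    static byte[] decodeIfGzip(byte[] data) throws IOException {
        if (!looksGzipped(data)) {
            return data;
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes(); // Java 9+
        }
    }

    // Demo helper: gzip-compresses a byte array in memory
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(html);

        System.out.println(looksGzipped(compressed)); // true
        System.out.println(looksGzipped(html));       // false
        System.out.println(new String(decodeIfGzip(compressed), StandardCharsets.UTF_8));
    }
}
```

The same check works on a file you already downloaded, for example `decodeIfGzip(Files.readAllBytes(Paths.get("/tmp/asdfasdf")))`, which would recover the HTML from the file produced by the original code.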

Answer 1 (score: 0)

Use the HtmlUnit library with this code:

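    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import java.util.logging.Level;

    try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
        java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setUseInsecureSSL(true); // skips certificate checks; avoid in production
        HtmlPage page = webClient.getPage("https://jumpseller.cl/support/webpayplus/");
        webClient.waitForBackgroundJavaScript(5 * 1000); // after loading, give page scripts up to 5 s
        String stringToSave = page.asXml(); // the full HTML markup as a string; save it to a file if needed
        // no explicit close() needed: try-with-resources closes webClient
    }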
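For saving that string to a file, as the comment suggests, one option is a small helper like the sketch below. `saveHtml` is a hypothetical name, and UTF-8 is an assumed encoding, not something the answer specifies. `Files.write` with no options creates the file or truncates an existing one, which matches the `REPLACE_EXISTING` behaviour of the question's code. The demo writes to a temporary file rather than /tmp/asdfasdf so it runs on any OS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveHtml {

    // Writes the HTML string to the target as UTF-8, replacing any existing content
    static void saveHtml(String html, Path target) throws IOException {
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the value returned by page.asXml()
        String stringToSave = "<html><body>hello</body></html>";
        Path target = Files.createTempFile("page", ".html");
        saveHtml(stringToSave, target);
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

With HtmlUnit you would call `saveHtml(page.asXml(), Paths.get("/tmp/asdfasdf"))` inside the try block.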