How do I download a file that is embedded in a website?

Time: 2015-03-20 23:54:20

Tags: java httpurlconnection

At the moment I can only download files whose URL has this form:

  

https://jdbc.postgresql.org/download/postgresql-8.1-415.jdbc2.jar

But how do I download a file whose name does not appear in the URL? Take the Skype URL, for example:

  

http://www.skype.com/sv/download-skype/skype-for-mac/downloading/

As you can see, I cannot download the file using

filePath.substring(filePath.lastIndexOf("/") + 1);
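To make the limitation concrete, here is a minimal sketch of the last-path-segment approach. The class name `UrlFileNames` and the method `fromUrl` are hypothetical names chosen for illustration, and unlike the one-liner above it reads the path through `java.net.URI` so that a query string is ignored.

```java
import java.net.URI;

class UrlFileNames {
    // Returns the text after the last '/' of the URL path,
    // or an empty string when the path ends with '/' or is missing.
    static String fromUrl(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return "";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // A direct file URL yields a usable name.
        System.out.println(fromUrl("https://jdbc.postgresql.org/download/postgresql-8.1-415.jdbc2.jar"));
        // A page URL ending in '/' yields an empty name, shown here between brackets.
        System.out.println("[" + fromUrl("http://www.skype.com/sv/download-skype/skype-for-mac/downloading/") + "]");
    }
}
```

For the PostgreSQL URL this gives the jar name, while the Skype page URL ends in a slash and therefore gives an empty string, which is why the file name has to come from somewhere else.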

Is there another way? Using Firebug, I did find the file that is embedded in the page, which is

  

http://www.skype.com/go/getskype-macosx.dmg

My question is: can I navigate to that page programmatically and get at this file?
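One thing worth checking, as an assumption the question does not confirm, is whether a link like the one found with Firebug answers with an HTTP redirect to the real file. `HttpURLConnection` does not follow redirects that switch between http and https, so a download routine that only accepts status 200 would give up. The sketch below follows redirects by hand; `RedirectResolver` and its method names are hypothetical, and it assumes the server accepts HEAD requests.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

class RedirectResolver {
    // True for the HTTP status codes that point to another URL via a Location header.
    static boolean isRedirect(int code) {
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    // Resolves a possibly relative Location value against the URL that returned it.
    static String resolveLocation(String currentUrl, String location) {
        return URI.create(currentUrl).resolve(location).toString();
    }

    // Follows at most maxHops redirects and returns the last URL reached.
    static String finalUrl(String startUrl, int maxHops) throws IOException {
        String current = startUrl;
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(current).openConnection();
            conn.setInstanceFollowRedirects(false); // redirects are handled in this loop
            conn.setRequestMethod("HEAD");          // ask for headers only, not the body
            try {
                int code = conn.getResponseCode();
                String location = conn.getHeaderField("Location");
                if (!isRedirect(code) || location == null) {
                    return current; // not a redirect, so this is the final URL
                }
                current = resolveLocation(current, location);
            } finally {
                conn.disconnect();
            }
        }
        return current;
    }
}
```

The URL returned by `finalUrl` could then be passed to an ordinary download method, and its last path segment may also give a better file name than the original link.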

Here is the code that downloads files correctly:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// BUFFER_SIZE and getDesiredPath() are assumed to be defined elsewhere in the same class.
public static void fileDownload(String urlFile) throws IOException {
    URL url = new URL(urlFile);
    HttpURLConnection httpURLConnection = (HttpURLConnection) url.openConnection();
    try {
        int responseCode = httpURLConnection.getResponseCode();
        if (responseCode == HttpURLConnection.HTTP_OK) {
            String fileName = "";
            String disposition = httpURLConnection.getHeaderField("Content-Disposition");
            String contentType = httpURLConnection.getContentType();
            int contentLength = httpURLConnection.getContentLength();
            if (disposition != null) {
                int index = disposition.indexOf("filename=");
                if (index >= 0) {
                    // Take everything after "filename=" and remove any quote characters.
                    fileName = disposition.substring(index + "filename=".length()).replace("\"", "").trim();
                }
            }
            if (fileName.isEmpty()) {
                // No usable header value: fall back to the last segment of the URL.
                fileName = urlFile.substring(urlFile.lastIndexOf("/") + 1);
            }
            System.out.println("Content-type= " + contentType);
            System.out.println("Disposition= " + disposition);
            System.out.println("Content-length= " + contentLength);
            System.out.println("File name= " + fileName);
            String saveFilePath = getDesiredPath() + File.separator + fileName;
            // try-with-resources closes both streams even if the copy fails.
            try (InputStream inputStream = httpURLConnection.getInputStream();
                 FileOutputStream fileOutputStream = new FileOutputStream(saveFilePath)) {
                byte[] buffer = new byte[BUFFER_SIZE];
                int bytesRead;
                while ((bytesRead = inputStream.read(buffer)) != -1) {
                    fileOutputStream.write(buffer, 0, bytesRead);
                }
            }
            System.out.println("File downloaded");
        } else {
            System.out.println("No file to download. Server replied httpCode=" + responseCode);
        }
    } finally {
        httpURLConnection.disconnect();
    }
}
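The Content-Disposition handling is the fragile part of a download routine like this one: the filename value may or may not be quoted, and other parameters may precede or follow it. Below is a rough sketch of a more tolerant parser. `DispositionNames` is a hypothetical helper, it does not handle the extended `filename*=` form, and it would mis-split a quoted name that itself contains a semicolon.

```java
import java.util.Locale;

class DispositionNames {
    private static final String KEY = "filename=";

    // Extracts the filename parameter from a Content-Disposition header value.
    // Returns an empty string when the header is null or has no filename parameter.
    static String fileName(String disposition) {
        if (disposition == null) {
            return "";
        }
        // Parameters are separated by ';', e.g. attachment; filename="a.dmg"
        for (String part : disposition.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith(KEY)) {
                String value = p.substring(KEY.length()).trim();
                // Strip one pair of surrounding double quotes, if present.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return "";
    }
}
```

A caller can treat the empty string as "no name in the header" and fall back to the last segment of the URL.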

This is my first time working with file handling, and this code was actually taken from here

1 Answer:

Answer 0: (score: 0)

If the download link is embedded in the page, you can download the file.

The page's HTML will contain something like this:

. . .

<a href="Skype.exe">Download Skype</a>

. . .

To download the page and scan it for links, you can use jsoup.

The code could look something like this:

// Requires: org.jsoup.Jsoup, org.jsoup.nodes.Document,
//           org.jsoup.nodes.Element, org.jsoup.select.Elements

Document doc = Jsoup.connect("http://example.com/").get();
Elements anchors = doc.select("a[href]");

// Untested code

for (Element anchor : anchors)
{
    // absUrl resolves the href against the page URL,
    // so it also works when the href is not a full URL
    String downloadLink = anchor.absUrl("href");
    if (downloadLink.endsWith(".exe"))
    {
        // Download the file from downloadLink
    }
}
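The link-scanning step can also be tried without any library and without network access, which makes it easy to test. The sketch below is only a rough approximation: it finds `href` values with a regular expression, which is far less robust than a real HTML parser such as jsoup, and `LinkScanner`, `findDownloads`, and the sample URLs are hypothetical. It assumes the page URL ends with a path (for example a trailing slash) so that relative links resolve correctly.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkScanner {
    // Matches href="..." or href='...' inside <a> tags; group 2 is the link target.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the absolute URLs of all anchors whose target ends with the extension.
    static List<String> findDownloads(String html, String pageUrl, String extension) {
        URI base = URI.create(pageUrl);
        String wanted = extension.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            if (!href.toLowerCase(Locale.ROOT).endsWith(wanted)) {
                continue; // not the kind of file we are looking for
            }
            try {
                // Relative links are resolved against the page URL; absolute ones are kept.
                result.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not valid URIs (for example, ones containing spaces).
            }
        }
        return result;
    }
}
```

Each URL in the returned list could then be handed to a download method such as the one in the question.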