How can I distinguish PDF files from non-PDF files?

Asked: 2013-11-15 14:27:45

Tags: java pdf infinite-loop urlconnection

I use the following snippet to download PDF files (I got it from here; credit goes to Josh M):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

public final class FileDownloader {

    private FileDownloader(){}

    public static void main(String args[]) throws IOException{
        download("http://pdfobject.com/pdf/sample.pdf", new File("sample.pdf"));
    }

    public static void download(final String url, final File destination) throws IOException {
        final URLConnection connection = new URL(url).openConnection();
        // Time out after 60 seconds instead of blocking forever on a stalled server.
        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Mozilla/5.0");
        final byte[] buffer = new byte[2048];
        int read;
        // try-with-resources closes both streams even if the copy throws.
        try (InputStream input = connection.getInputStream();
             OutputStream output = new FileOutputStream(destination, false)) {
            while((read = input.read(buffer)) > -1)
                output.write(buffer, 0, read);
        }
    }
}

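It works perfectly with PDF files. However, when I run into a "bad file" (I don't know what that file's extension is), I seem to get stuck in an infinite loop at while((read = input.read(buffer)) > -1). How can I improve this snippet so that it discards any kind of inappropriate (non-PDF) file?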

1 Answer:

Answer 0 (score: 2)

There is an existing question about a similar problem: Infinite Loop in Input Stream

For a possible solution, see: Abort loop after fixed time
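
As a rough illustration of aborting the loop after a fixed time, the copy loop could carry its own deadline and size cap. This is only a sketch: the class name BoundedCopy and the parameters maxMillis and maxBytes are made up for this example and do not come from the linked answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of the original FileDownloader.
final class BoundedCopy {

    private BoundedCopy() {}

    // Copies input to output and returns the number of bytes copied.
    // Throws IOException once more than maxBytes were read or maxMillis elapsed.
    static long copy(InputStream input, OutputStream output,
                     long maxMillis, long maxBytes) throws IOException {
        final long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        final byte[] buffer = new byte[2048];
        long total = 0;
        int read;
        while ((read = input.read(buffer)) > -1) {
            total += read;
            if (total > maxBytes) {
                throw new IOException("Aborted: more than " + maxBytes + " bytes");
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline > 0) {
                throw new IOException("Aborted: took longer than " + maxMillis + " ms");
            }
            output.write(buffer, 0, read);
        }
        return total;
    }
}
```

The deadline is only checked between reads, so a single read that blocks is still bounded by the read timeout set on the connection, not by maxMillis. The size cap is there for the case where the server keeps sending data without ever ending the stream.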

You can also try setting a timeout on the connection: Java URLConnection Timeout
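
None of the links above checks whether the download is actually a PDF. One common approach, sketched here as an assumption and not as something the answer proposes, is to look at the first bytes of the stream: a PDF file starts with the ASCII signature %PDF-. The class name PdfSniffer and the method looksLikePdf are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the original FileDownloader.
final class PdfSniffer {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfSniffer() {}

    // Returns true if the stream begins with "%PDF-".
    // The stream is reset afterwards, so the caller can still read it from the start.
    static boolean looksLikePdf(BufferedInputStream input) throws IOException {
        input.mark(PDF_MAGIC.length);
        final byte[] head = new byte[PDF_MAGIC.length];
        int filled = 0;
        // A single read may return fewer bytes than requested, so loop until full or EOF.
        while (filled < head.length) {
            final int n = input.read(head, filled, head.length - filled);
            if (n < 0) {
                break;
            }
            filled += n;
        }
        input.reset();
        return filled == head.length && Arrays.equals(head, PDF_MAGIC);
    }
}
```

In download, this would mean wrapping connection.getInputStream() in a BufferedInputStream, calling looksLikePdf on it before opening the output file, and skipping the URL when it returns false. The check is strict: it rejects files that have extra bytes before the signature. connection.getContentType() can serve as an additional hint, but servers do not always report application/pdf correctly.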