How to detect the real file type of a URL?

Time: 2016-06-05 05:52:13

Tags: java android file

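A URL with a .jpg extension often turns out to point to a .gif or .mp4 file, and vice versa. Is there a way to determine exactly what type of file a URL contains without downloading the whole file?

Example: http://i.imgur.com/9b4bIW9.jpg

This has a .jpg extension, but it is actually a .gif.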

1 answer:

Answer 0 (score: 1)

Note: my solution requires:

compile 'com.google.guava:guava:19.0'

because it provides the ByteStreams.toByteArray function for reading an input stream into a byte array. You can, of course, read the input stream some other way.
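If you'd rather avoid the Guava dependency, here is a minimal sketch that uses only java.io. The class and method names (PrefixReader, readPrefix) are hypothetical, not part of any library. Unlike ByteStreams.toByteArray, it stops after maxBytes, so it also won't read the whole body if a server ignores the Range header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class PrefixReader {
    /**
     * Reads at most maxBytes from the stream using plain java.io (no Guava).
     * Stops early at end of stream, so the result may be shorter than maxBytes.
     */
    public static byte[] readPrefix(InputStream in, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes) {
            // read() may return fewer bytes than requested, so keep looping
            int n = in.read(buffer, total, maxBytes - total);
            if (n == -1) {
                break; // end of stream reached before maxBytes
            }
            total += n;
        }
        // Trim the buffer to the number of bytes actually read
        return Arrays.copyOf(buffer, total);
    }
}
```

In detectTypeOfFile() you would then call PrefixReader.readPrefix(connection.getInputStream(), 1) in place of the ByteStreams call.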

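Note: the StrictMode.ThreadPolicy lines are required because this example makes the request on the main thread; without them Android throws a NetworkOnMainThreadException. That shortcut is only acceptable for testing; in a real app, make the request on a background thread instead.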

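Basically, we open an HTTP connection but use the Range header to request only the first byte of the remote file, so the whole file is never downloaded. The resulting byte array is passed through the bytesToHex function to turn the raw byte into a hex string, and that first byte is then compared against known file signatures ("magic numbers"). Two caveats: this relies on the server honoring the Range header (it then replies 206 Partial Content, while a server that ignores it replies 200 with the full body), and a single byte is only a rough hint, since several formats share the same first byte.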

For the signatures of other file types, see: http://www.garykessler.net/library/file_sigs.html
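Because one byte is ambiguous (0x00, for example, starts many formats), a sturdier variant requests a few more bytes, e.g. with "Range: bytes=0-11", and compares full signatures. The sketch below assumes the header bytes are already in a byte[]; the names SignatureSniffer and detect are hypothetical, and the signature list covers only the formats handled in this answer.

```java
public class SignatureSniffer {
    // True if header holds the given signature bytes starting at offset.
    private static boolean matches(byte[] header, int offset, int... signature) {
        if (header.length < offset + signature.length) {
            return false; // not enough bytes to compare
        }
        for (int i = 0; i < signature.length; i++) {
            // Mask to 0..255 because Java bytes are signed
            if ((header[offset + i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    /** Guesses a MIME type from the first bytes of a file; returns null if unrecognized. */
    public static String detect(byte[] header) {
        if (matches(header, 0, 0xFF, 0xD8, 0xFF)) {
            return "image/jpeg";
        }
        if (matches(header, 0, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "image/png";
        }
        if (matches(header, 0, 'G', 'I', 'F', '8')) { // "GIF87a" or "GIF89a"
            return "image/gif";
        }
        if (matches(header, 0, 'I', 'I', 0x2A, 0x00) || matches(header, 0, 'M', 'M', 0x00, 0x2A)) {
            return "image/tiff";
        }
        // "ftyp" at offset 4 marks an ISO base media file. MP4 is the common case,
        // but 3GP, MOV and HEIC can carry it too (the brand at offset 8 tells them apart).
        if (matches(header, 4, 'f', 't', 'y', 'p')) {
            return "video/mp4";
        }
        return null;
    }
}
```

Returning null for anything unrecognized lets the caller fall back to another check, such as the file extension.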

Code:

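import android.os.Bundle;
import android.os.StrictMode;

import com.google.common.io.ByteStreams;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// The methods below go inside your Activity class.

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    // Allows network access on the main thread. Acceptable for a quick test only;
    // in a real app, run detectTypeOfFile() on a background thread instead.
    StrictMode.ThreadPolicy policy = new StrictMode.ThreadPolicy.Builder().permitAll().build();
    StrictMode.setThreadPolicy(policy);
    try {
        detectTypeOfFile();
    } catch (IOException e) {
        // getMessage() prints the error text; getStackTrace() would only print an array reference
        System.out.println("URL: CRASH: " + e.getMessage());
        e.printStackTrace();
    }
}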

private static final char[] hexArray = "0123456789ABCDEF".toCharArray();
public static String bytesToHex(byte[] bytes) {
    //http://stackoverflow.com/questions/9655181/how-to-convert-a-byte-array-to-a-hex-string-in-java
    char[] hexChars = new char[bytes.length * 2];
    for ( int j = 0; j < bytes.length; j++ ) {
        int v = bytes[j] & 0xFF;
        hexChars[j * 2] = hexArray[v >>> 4];
        hexChars[j * 2 + 1] = hexArray[v & 0x0F];
    }
    return new String(hexChars);
}

public void detectTypeOfFile() throws IOException {

    String[] urls = {
            "http://i.imgur.com/9b4bIW9.jpg",
            "http://i.imgur.com/f00y2uz.jpg",
            "http://i.imgur.com/9b4bIW9.mp4",
            "http://i.imgur.com/9b4bIW9.gif"
    };

    for (String address : urls) {
        URL url = new URL(address);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try {
            // Request only byte 0. Servers that honor Range reply 206 Partial Content;
            // limit() guards against servers that ignore it and send the whole file.
            connection.setRequestProperty("Range", "bytes=0-0");
            connection.connect();
            byte[] bytes = ByteStreams.toByteArray(ByteStreams.limit(connection.getInputStream(), 1));
            String firstByte = bytesToHex(bytes);
            System.out.println("URL: " + url + "  has first byte: " + firstByte);
            switch (firstByte) {
                //http://www.garykessler.net/library/file_sigs.html
                case "00":
                    // Weak guess: many formats start with 0x00 (MP4 has "ftyp" at offset 4)
                    System.out.println("URL: " + url + "  is of type: mp4");
                    break;
                case "FF":
                    System.out.println("URL: " + url + "  is of type: image/jpeg");
                    break;
                case "89":
                    System.out.println("URL: " + url + "  is of type: image/png");
                    break;
                case "47": // 'G' of "GIF8"
                    System.out.println("URL: " + url + "  is of type: image/gif");
                    break;
                case "49": // 'I' of "II" (little-endian TIFF)
                case "4D": // 'M' of "MM" (big-endian TIFF)
                    System.out.println("URL: " + url + "  is of type: image/tiff");
                    break;
                default:
                    System.out.println("URL: " + url + "  is of unknown type");
                    break;
            }
        } finally {
            // Release the connection even if reading fails
            connection.disconnect();
        }
    }
}

Output of the above:

06-05 01:51:47.022 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.jpg  has first byte: 47
06-05 01:51:47.022 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.jpg  is of type: image/gif
06-05 01:51:47.056 12554-12554/? I/System.out: URL: http://i.imgur.com/f00y2uz.jpg  has first byte: FF
06-05 01:51:47.056 12554-12554/? I/System.out: URL: http://i.imgur.com/f00y2uz.jpg  is of type: image/jpeg
06-05 01:51:47.091 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.mp4  has first byte: 00
06-05 01:51:47.091 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.mp4  is of type: mp4
06-05 01:51:47.124 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.gif  has first byte: 47
06-05 01:51:47.124 12554-12554/? I/System.out: URL: http://i.imgur.com/9b4bIW9.gif  is of type: image/gif