Java: detect whether a file is UTF-8 or ANSI

Time: 2015-01-13 19:48:48

Tags: java utf-8 ansi

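Is there a way in Java to detect whether a file is ANSI or UTF-8? The problem I have is that if someone creates a CSV file in Excel, it is UTF-8. If they create it with Notepad, it is ANSI.

I would like to know whether I can detect the type of the file and then handle it accordingly.

Thanks.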

1 answer:

Answer 0: (score: 1)

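You could try something like this. It relies on Excel including a byte order mark (BOM), which a quick search suggests it does although I could not verify it, and on Java treating the BOM as a specific character, "\uFEFF":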

FileInputStream fis = new FileInputStream(file);
BufferedReader br = new BufferedReader(new InputStreamReader(fis, "UTF-8"));

String line = br.readLine();
if (line != null && line.startsWith("\uFEFF")) {
    // BOM found: treat as UTF-8, throw away the BOM character and continue
    line = line.substring(1);
} else {
    // no BOM (or empty file): assume ANSI (Cp1252) and reopen
    br.close(); // also closes fis
    fis = new FileInputStream(file); // reopen from the start
    br = new BufferedReader(new InputStreamReader(fis, "Cp1252"));
    line = br.readLine();
}

// now line contains the first line (or null if the file is empty), and br.readLine() will get the next

More about the UTF-8 byte order mark and encoding detection at http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
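
One caveat with a BOM-only check is that UTF-8 files are often saved without a BOM, in which case the approach above would read them as Cp1252. A possible refinement, sketched below and not taken from the original answer, is to look for the BOM bytes first and otherwise try a strict UTF-8 decode, falling back to windows-1252 only when the bytes are not valid UTF-8. The class name `EncodingSniffer` and its methods are hypothetical, and the sketch assumes the file is small enough to read into memory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; class and method names are illustrative only.
class EncodingSniffer {

    // True if the data starts with the UTF-8 BOM bytes EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // True if the data decodes as UTF-8 with no malformed byte sequence.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Heuristic guess: UTF-8 if there is a BOM or the bytes are valid UTF-8,
    // otherwise windows-1252 (assumed here to be what "ANSI" means).
    static Charset guess(byte[] data) {
        if (hasUtf8Bom(data) || isValidUtf8(data)) {
            return StandardCharsets.UTF_8;
        }
        return Charset.forName("windows-1252");
    }

    // Reads the whole file with the guessed charset and drops a leading BOM.
    static String readText(Path path) throws IOException {
        byte[] data = Files.readAllBytes(path);
        String text = new String(data, guess(data));
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readText(Paths.get(args[0])));
    }
}
```

This is still only a heuristic. A pure ASCII file counts as valid UTF-8 here, which is harmless because both charsets decode ASCII identically, and a windows-1252 file whose bytes happen to form valid UTF-8 sequences would be misread, although that is unlikely for ordinary text.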