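Is there a way in Java to detect whether a file is ANSI or UTF-8? The problem I have is that if someone creates a CSV file in Excel, it is UTF-8, but if they create it in Notepad, it is ANSI.
I'd like to know whether I can detect which encoding the file uses and then handle it accordingly.
Thanks.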
Answer 0 (score: 1)
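You could try something like this. It relies on two things: that Excel writes a byte order mark (BOM) at the start of the file, which a quick search suggests it does although I can't verify it, and that Java decodes the UTF-8 BOM as the single character "\uFEFF".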
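import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// `file` is the java.io.File to read; the enclosing method must declare `throws IOException`
BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8));
String line = br.readLine();
if (line != null && line.startsWith("\uFEFF")) {
    // BOM found: it's UTF-8, so throw away the BOM character and continue
    line = line.substring(1);
} else {
    // no BOM (or empty file): assume ANSI, i.e. Cp1252 on Western-locale Windows, and reopen
    br.close(); // also closes the underlying FileInputStream
    br = new BufferedReader(
            new InputStreamReader(new FileInputStream(file), "Cp1252")); // reopen from the start
    line = br.readLine();
}
// now line contains the first line (null if the file is empty), and br.readLine() will get the next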
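A variation on the same idea, sketched here rather than taken from the answer: check the raw bytes for the UTF-8 BOM (EF BB BF) before any decoding happens, and push them back if they are not a BOM, so the file never has to be reopened. The class name `BomSniffer` is hypothetical, `readNBytes` requires Java 9 or later, and, like the code above, it simply assumes that a file without a BOM is Windows-1252 ("Cp1252").

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: picks UTF-8 if the stream starts with a UTF-8 BOM,
// otherwise assumes Windows-1252 ("ANSI" on Western-locale Windows).
final class BomSniffer {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a reader positioned after the BOM (if there was one).
    static BufferedReader open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = in.readNBytes(head, 0, head.length); // fewer than 3 only at end of stream

        Charset charset;
        if (n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM)) {
            charset = StandardCharsets.UTF_8;        // BOM consumed, not pushed back
        } else {
            if (n > 0) {
                in.unread(head, 0, n);               // not a BOM: restore the bytes
            }
            charset = Charset.forName("windows-1252"); // same charset as "Cp1252"
        }
        return new BufferedReader(new InputStreamReader(in, charset));
    }
}
```

For a file you would call it as `BomSniffer.open(new FileInputStream(file))`. Because the BOM is stripped at the byte level, no line ever starts with "\uFEFF", so the `substring(1)` step is no longer needed.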
There is more information about the UTF-8 byte order mark and about encoding detection at http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
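BOM detection only works when the program that wrote the file actually emitted a BOM. A common complementary heuristic, which is a different technique from the answer above and only a sketch: decode the whole file strictly as UTF-8 and fall back to Windows-1252 if any byte sequence is malformed. `Utf8Check`, `isValidUtf8` and `guess` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses the encoding by strict UTF-8 validation.
final class Utf8Check {
    // True if every byte sequence is well-formed UTF-8 (a UTF-8 BOM is also valid).
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of replacing
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // UTF-8 if the bytes validate, otherwise assume Windows-1252 ("ANSI").
    static Charset guess(byte[] bytes) {
        return isValidUtf8(bytes) ? StandardCharsets.UTF_8 : Charset.forName("windows-1252");
    }
}
```

Typical use is `Charset cs = Utf8Check.guess(Files.readAllBytes(path));`. A pure-ASCII file passes as UTF-8, which is harmless because ASCII bytes decode the same way in both charsets. It remains a heuristic: Windows-1252 text containing accented characters is almost never valid UTF-8, but that is not guaranteed.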