I have a file:
$ file utf8.sql
utf8.sql: UTF-8 Unicode text
It looks like this:
$ hexdump -C utf8.sql
00000000 61 73 64 66 61 73 64 66 f0 9d 93 ad 0a |asdfasdf.....|
0000000d
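As a sanity check that does not involve the detector, here is a minimal sketch that decodes the bytes from the hexdump above with a strict UTF-8 decoder from the JDK. The class name `Utf8Check` and the helper `isValidUtf8` are hypothetical names made up for this illustration, and the byte array is hard-coded from the dump rather than read from `utf8.sql`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Hypothetical helper: true if the bytes decode as strict UTF-8.
    // Malformed input is reported as an error instead of being replaced.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Bytes copied from the hexdump: "asdfasdf", one 4-byte sequence, newline.
    static byte[] sampleBytes() {
        return new byte[] {
            0x61, 0x73, 0x64, 0x66, 0x61, 0x73, 0x64, 0x66,
            (byte) 0xf0, (byte) 0x9d, (byte) 0x93, (byte) 0xad, 0x0a
        };
    }

    public static void main(String[] args) {
        byte[] data = sampleBytes();
        System.out.println("valid UTF-8: " + isValidUtf8(data));

        // The 4-byte sequence starts at char index 8, after "asdfasdf".
        String text = new String(data, StandardCharsets.UTF_8);
        System.out.printf("code point: U+%X%n", text.codePointAt(8));
    }
}
```

If this reports the bytes as valid, the file really does contain a well-formed multi-byte UTF-8 sequence, and the question is only about what the detector returns for it.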
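When I run org.mozilla.universalchardet.UniversalDetector on it,
it does not notice that the file contains UTF-8 characters. What is going on here?
P.S. This is the method that checks the encoding: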
protected String getEncoding(File file, Log log) {
    FileInputStream fis = null;
    try {
        fis = new FileInputStream(file);
        UniversalDetector detector = new UniversalDetector(null);
        byte[] buf = new byte[4096];
        int nread;
        while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }
        detector.dataEnd();
        String encoding = detector.getDetectedCharset();
        detector.reset();
        return encoding;
    } catch (Exception e) {
        log.warn("Unable to detect encoding for file: " + file + " due to: " + e);
    } finally {
        IOUtil.close(fis);
    }
    return null;
}