Question

I have a file:

$ file utf8.sql
utf8.sql: UTF-8 Unicode text

It looks like this:

$ hexdump -C utf8.sql
00000000  61 73 64 66 61 73 64 66  f0 9d 93 ad 0a           |asdfasdf.....|
0000000d
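
As a sanity check that does not depend on the detector, the bytes from the dump can be run through a strict UTF-8 decoder. This is only a minimal sketch: the class name Utf8Check and the method isValidUtf8 are hypothetical helpers made up for illustration, not part of juniversalchardet. It uses only the standard java.nio.charset API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // Decodes the whole buffer; throws on any malformed or truncated sequence.
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The 13 bytes shown in the hexdump above.
        byte[] data = { 0x61, 0x73, 0x64, 0x66, 0x61, 0x73, 0x64, 0x66,
                (byte) 0xf0, (byte) 0x9d, (byte) 0x93, (byte) 0xad, 0x0a };
        System.out.println(isValidUtf8(data)); // prints "true"

        // The four bytes f0 9d 93 ad form a single code point outside the BMP.
        String text = new String(data, StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(text.codePointAt(8))); // prints "1d4ed"
    }
}
```

So the file really is valid UTF-8, and its only non-ASCII content is one four-byte sequence (U+1D4ED).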

When I check it with org.mozilla.universalchardet.UniversalDetector, it does not notice that the file contains UTF-8 characters. What is going on here?

P.S. This is the method that checks the encoding:

protected String getEncoding(File file, Log log) {
    FileInputStream fis = null;
    try {
        fis = new FileInputStream(file);
        UniversalDetector detector = new UniversalDetector(null);
        byte[] buf = new byte[4096];
        int nread;
        // Feed the detector until end of file or until it has made up its mind.
        while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }
        // Signal end of input so the detector can make its final decision.
        detector.dataEnd();
        // null here means no charset was detected.
        String encoding = detector.getDetectedCharset();
        detector.reset();
        return encoding;
    } catch (Exception e) {
        log.warn("Unable to detect encoding for file: " + file + " due to: " + e);
    } finally {
        IOUtil.close(fis);
    }
    return null;
}

org.mozilla.universalchardet.UniversalDetector does not detect the encoding

0 answers: