Question

When I try to read thesaurus.txt, the first thing it reads is "ÿþ", even though the first entry in the file is "<pat>a cappella". What could be causing this?

    File file = new File("thesaurus.txt");
    Scanner scan;
    try {
        scan = new Scanner(file);
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
        scan = null;
    }
    String entry;
    ArrayList<String> thes = new ArrayList<String>();
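    // Guard against the null left behind when the file was not found
    while (scan != null && scan.hasNextLine())
    {
        entry = scan.nextLine();
        // Compare string contents, not object references
        if (!entry.isEmpty())
        {
            thes.add(entry);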
        }
    }
    return thes;

Answer 1

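Your input file is probably a UTF-16 (LE) file that starts with a byte order mark (BOM).

If you view this file as ISO 8859-1, you see these two characters: ÿþ. They have the character codes FF and FE, which is exactly the byte sequence you find at the start of a file with a UTF-16 (LE) BOM.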
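
To make the mapping concrete, here is a minimal sketch that decodes the two BOM bytes as ISO 8859-1. The class and method names (BomDemo, decodeAsLatin1) are made up for illustration and are not part of the original question. The code prints code points instead of the characters themselves so the result does not depend on the console encoding.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    // The UTF-16 little-endian byte order mark is the two bytes FF FE.
    static final byte[] UTF16_LE_BOM = {(byte) 0xFF, (byte) 0xFE};

    // Hypothetical helper: interpret raw bytes as ISO 8859-1 (one byte per character).
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String misread = decodeAsLatin1(UTF16_LE_BOM);
        // Print each character's code point: U+00FF is ÿ and U+00FE is þ.
        for (char c : misread.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```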

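You should specify the character encoding explicitly when reading the file instead of relying on the system's default character encoding: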

    scan = new Scanner(file, "UTF-16");
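
For a fuller picture, the following is a sketch of the whole read loop with the encoding specified, assuming the file really is UTF-16 with a BOM. The names ThesaurusReader, readEntries and writeUtf16LeWithBom are hypothetical; the second method exists only to create a small sample file for the demo. With the "UTF-16" charset, Java's decoder uses the BOM to pick the byte order and does not return it as part of the text.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ThesaurusReader {

    // Reads all non-empty lines from a UTF-16 encoded text file.
    static List<String> readEntries(File file) throws FileNotFoundException {
        List<String> entries = new ArrayList<>();
        // try-with-resources closes the Scanner (and the file) when done.
        try (Scanner scan = new Scanner(file, "UTF-16")) {
            while (scan.hasNextLine()) {
                String entry = scan.nextLine();
                if (!entry.isEmpty()) {
                    entries.add(entry);
                }
            }
        }
        return entries;
    }

    // Hypothetical demo helper: writes text as UTF-16LE preceded by the FF FE BOM.
    static void writeUtf16LeWithBom(File file, String text) throws IOException {
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] all = new byte[body.length + 2];
        all[0] = (byte) 0xFF;
        all[1] = (byte) 0xFE;
        System.arraycopy(body, 0, all, 2, body.length);
        Files.write(file.toPath(), all);
    }

    public static void main(String[] args) throws IOException {
        File sample = File.createTempFile("thesaurus", ".txt");
        sample.deleteOnExit();
        writeUtf16LeWithBom(sample, "<pat>a cappella\n\nabandon\n");
        // The first entry no longer starts with stray BOM characters.
        System.out.println(readEntries(sample));
    }
}
```

If the file turns out to be in a different encoding, the same pattern applies with that charset name passed to the Scanner constructor.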
