Question

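I have a file that I need to read and store, use its information to do some processing, and then write the results out to another file. The trouble is that the original file contains characters from Asian languages, such as 坂本龍一, 東京事変 and メリー (I guess they are Chinese, Japanese and Korean). I can see them correctly in Notepad++.

The problem is that when I read them and write them back out from Java, they get corrupted, and I see strange content in the output file such as ???????? or Ð–Ð°Ð½Ð½Ð° Ð‘Ð¸Ñ‡ÐµÐ²Ñ?ÐºÐ°Ñ? instead. I think something is wrong with the encoding, but I don't know which one to use or how to use it.

Can anyone help me? Here is my code: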

        String fileToRead = SONG_2M;
        Scanner scanner = new Scanner(new File(fileToRead), "UTF-8");

        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            String[] songData = line.split("\t");
            if (/* something */) {
                // save the string in the map
            }
        }
        scanner.close();

        saveFile("coded_artist_small2.txt");
    }

    public void saveFile(String fileToSave) throws FileNotFoundException, UnsupportedEncodingException {
        PrintWriter writer = new PrintWriter(fileToSave, "UTF-8");

        for (Entry<String, Integer> entry : artistsMap.entrySet()) {
            writer.println(entry.getKey() + DELIMITER + entry.getValue());
        }

        writer.close();
    }

Answer 1

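Your input file may not actually be encoded in UTF-8. (Note that UTF-8 is a variable-length encoding that uses one to four bytes per character; it is UTF-16 that uses two bytes for most characters.) For example, the character 坂 that you see is Unicode code point U+5742. Stored as big-endian UTF-16, it is the two bytes 0x57 0x42; if those bytes were read as ASCII, they would show up as the character 0x57 followed by 0x42, i.e. WB. Stored as UTF-8, the same character is the three bytes 0xE5 0x9D 0x82.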
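To make the byte-level difference concrete, here is a small sketch that prints the bytes of 坂 under two Unicode encodings and shows what the bytes 0x57 0x42 look like when read as ASCII. The class name `SakaBytes` and the `hex` helper are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

class SakaBytes {
    // Formats bytes as space-separated uppercase hex, e.g. "E5 9D 82".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+5742 written as an escape so the encoding of this source file does not matter.
        String saka = "\u5742";

        System.out.println(hex(saka.getBytes(StandardCharsets.UTF_8)));    // E5 9D 82
        System.out.println(hex(saka.getBytes(StandardCharsets.UTF_16BE))); // 57 42

        // The same two bytes interpreted as ASCII are two unrelated letters.
        byte[] raw = {0x57, 0x42};
        System.out.println(new String(raw, StandardCharsets.US_ASCII));    // WB
    }
}
```

Dumping the first few bytes of the real input file in the same way is one quick way to see which of these patterns it actually contains.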

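If you are not sure of the file's encoding, you could start by guessing that it is ASCII text. Try removing the encoding when setting up the Scanner, so that it reads with the platform's default charset instead of UTF-8; that is, write the second line of your code as: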

    Scanner scanner = new Scanner(new File(fileToRead));

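If you do know that the file is Unicode, keep in mind that there are several different Unicode encodings. For a more comprehensive Unicode reader, see this answer, which handles the various Unicode encodings.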
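As a rough idea of what such a reader does, the sketch below looks for a byte order mark (BOM) at the start of the data and picks the matching charset, falling back to a caller-supplied default when there is no BOM. It is not the code from the linked answer; the class `BomSniffer` and its methods are hypothetical names, and UTF-32 BOMs are deliberately ignored to keep it short.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {
    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Chooses a charset from the BOM, or returns the fallback if none is present.
    static Charset detect(byte[] data, Charset fallback) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE; // assumption: not a UTF-32LE file
        }
        return fallback;
    }

    // Decodes the bytes and drops the BOM character (U+FEFF) if it was decoded.
    static String decode(byte[] data, Charset fallback) {
        String text = new String(data, detect(data, fallback));
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // UTF-16BE BOM followed by the two bytes of U+5742.
        byte[] sample = {(byte) 0xFE, (byte) 0xFF, 0x57, 0x42};
        System.out.println(detect(sample, StandardCharsets.UTF_8)); // UTF-16BE
        System.out.println(decode(sample, StandardCharsets.UTF_8).equals("\u5742")); // true
    }
}
```

For a real file, the bytes could come from `Files.readAllBytes(...)`. A file without a BOM still has to be decoded with a guessed or known charset, which is why the fallback parameter exists.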

For your output, you need to decide how the file should be encoded: either some Unicode encoding (for example UTF-8) or ASCII.
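If you settle on UTF-8 for the output, one way to keep things consistent is to name the charset explicitly both when writing and when reading the file back. The sketch below assumes a map of artist names to counts, similar to `artistsMap` in the question; the class `Utf8RoundTrip`, its methods, and the temporary file are placeholders for illustration only.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Utf8RoundTrip {
    // Writes one "key<delimiter>value" line per entry, always encoded as UTF-8.
    static void save(Path target, Map<String, Integer> counts, String delimiter) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                writer.write(entry.getKey() + delimiter + entry.getValue());
                writer.newLine();
            }
        }
    }

    // Saves to a temporary file, reads it back as UTF-8, and returns the lines.
    static List<String> roundTrip(Map<String, Integer> counts, String delimiter) {
        try {
            Path tmp = Files.createTempFile("artists", ".txt");
            try {
                save(tmp, counts, delimiter);
                return Files.readAllLines(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 坂本龍一, written with escapes so the source file encoding does not matter.
        String artist = "\u5742\u672C\u9F8D\u4E00";
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put(artist, 3);

        List<String> lines = roundTrip(counts, "\t");
        System.out.println(lines.get(0).equals(artist + "\t3")); // true
    }
}
```

Even when the file is written correctly, it will still look corrupted in a viewer that opens it with a different encoding, so it is worth checking the encoding setting of the editor used to inspect the output.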
