Question

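I am trying to migrate a database and parse it into a new schema using Java. The problem is that some characters, Arabic characters in particular, get corrupted when the data is processed in Java.

Here is one of the affected lines from the countryToParse.sql file: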

(4, 'Afganistán', 1, 'Afgano', 'Afghanistan', 'AF', 'أفغانستان', 'Afghan', 'أفغاني');

After parsing, the corresponding line in countryParsed.sql looks like this:

(4, 'Afganistán', 1, 'Afgano', 'Afghanistan', 'AF', 'أ�?غانستان', 'Afghan', 'أ�?غاني');

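You can see how some of the Arabic characters got corrupted. If I open the files, I can verify that they are all UTF-8 encoded.

This is the Java code I am using. In the method writeToTextFile() I added three ways of writing the file as UTF-8 (needless to say, I get the same error with all three).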

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MainStackOverflow {

    public static void main(String[] args) throws IOException {
        String countryStr = new String(readTextFile("src/data/countryToParse.sql").getBytes(), "UTF-8");
        writeToTextFile("src/data/countryParsed.sql", countryStr);
    }

    public static String readTextFile(String fileName) throws IOException {
        String content = new String(Files.readAllBytes(Paths.get(fileName)));
        return content;
    }

    public static void writeToTextFile(String fileName, String content) throws IOException {

        /* Way 1 */
        Files.write(Paths.get(fileName), content.getBytes("UTF-8"), StandardOpenOption.CREATE);

        /* Way 2 */
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(fileName), "UTF-8"));
        try {
            out.write(content);
        } finally {
            out.close();
        }

        /* Way 3 */
        PrintWriter out1 = new PrintWriter(new File(fileName), "UTF-8");
        out1.write(content);
        out1.flush();
        out1.close();
    }
}

Answer 1

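You forgot to specify the encoding on this line, so the bytes are decoded with the platform default charset instead of UTF-8: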

String content = new String(Files.readAllBytes(Paths.get(fileName)));
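To see why the missing charset matters, here is a minimal sketch that replays the question's decode/encode chain in memory. It assumes the platform default charset was windows-1252, which the question does not state, and the class and method names (DefaultCharsetDemo, roundTripThrough) are made up for illustration. Some UTF-8 bytes have no mapping in a single-byte charset like that, so they are replaced during the first decode and cannot be recovered afterwards.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultCharsetDemo {

    // Replays the question's chain: UTF-8 bytes are decoded with some other
    // charset, re-encoded with that same charset, then decoded as UTF-8.
    static String roundTripThrough(String text, Charset assumedDefault) {
        byte[] fileBytes = text.getBytes(StandardCharsets.UTF_8); // bytes as stored on disk
        String misread = new String(fileBytes, assumedDefault);   // like new String(bytes)
        byte[] reEncoded = misread.getBytes(assumedDefault);      // like .getBytes()
        return new String(reEncoded, StandardCharsets.UTF_8);     // like new String(..., "UTF-8")
    }

    public static void main(String[] args) {
        // The Arabic word for "Afghanistan" from the question, written with
        // Unicode escapes so the source file's own encoding does not matter.
        String original = "\u0623\u0641\u063A\u0627\u0646\u0633\u062A\u0627\u0646";

        // Assumption: the asker's default charset; not stated in the question.
        Charset assumed = Charset.forName("windows-1252");

        String result = roundTripThrough(original, assumed);
        System.out.println("survived intact: " + original.equals(result));
        System.out.println("default charset here: " + Charset.defaultCharset());
    }
}
```

Passing StandardCharsets.UTF_8 as the second argument leaves the text unchanged, which is the effect of the fix.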

Try this:

public static void main(String[] args) throws IOException {

    String countryStr = new String(readTextFile("src/data/countryToParse.sql"), "UTF-8");
    writeToTextFile("src/data/countryParsed.sql", countryStr);
}

public static byte[] readTextFile(String fileName) throws IOException {
    return Files.readAllBytes(Paths.get(fileName));
}
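As a variation on the same fix, the sketch below passes the charset explicitly on both the read and the write side using StandardCharsets.UTF_8, which avoids the string charset name and its checked UnsupportedEncodingException. The class name Utf8RoundTrip and the temporary file are hypothetical, chosen only to keep the example self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf8RoundTrip {

    // Decode the file's bytes as UTF-8 explicitly, never with the platform default.
    static String readTextFile(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Encode as UTF-8 explicitly; with no open options given, Files.write
    // creates the file or truncates an existing one.
    static void writeToTextFile(Path file, String content) throws IOException {
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // A shortened version of the row from the question; the Arabic word is
        // written with Unicode escapes so the source encoding does not matter.
        String row = "(4, 'Afghanistan', 'AF', "
                + "'\u0623\u0641\u063A\u0627\u0646\u0633\u062A\u0627\u0646');";

        Path tmp = Files.createTempFile("countryParsed", ".sql"); // hypothetical scratch file
        try {
            writeToTextFile(tmp, row);
            String readBack = readTextFile(tmp);
            System.out.println("round trip preserved the row: " + row.equals(readBack));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```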
