Question

I am trying to convert a file from UTF-8 to UTF-16 using a Java application.

But my output looks like this: 藙��Ꟙ돘ꨊ੕䥎潴楦楣阻止楯渮莹瑬攮佲摥牁摤敤乯莹晩捡莹潮偬畧楮㷘께뇛賘꼠慢敬⹏牤敲䅤摥摎潴楦楣阻止楯湐汵杩渽��藘귘뗙裙萠��藘꿛賘뇛賘ꨠ

In the end, the output should be equivalent: utf8 = سلام, utf16 = \u0633\u0644\u0627\u0645

import java.io.*;

class WriteUTF8Data<inbytes> {
    WriteUTF8Data() throws UnsupportedEncodingException {
    }

    public static void main(String[] args) throws IOException {
        System.setProperty("file.encoding","UTF-8");

        byte[] inbytes = new byte[1024];

        FileInputStream fis = new FileInputStream("/home/mehrad/Desktop/PerkStoreNotification(1).properties");
        fis.read(inbytes);
        FileOutputStream fos = new FileOutputStream("/home/mehrad/Desktop/PerkStoreNotification(2).properties");
        String in = new String(inbytes, "UTF16");
        fos.write(in.getBytes());
    }
}

Answer 1

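You're currently converting from UTF-16 into whatever your system default encoding is. If you want to convert from UTF-8, you need to specify that when you decode the binary data. There are other issues with your code, though: you're assuming that InputStream.read fills the whole buffer, and that the buffer holds everything in the file. You'd be better off using a Reader and a Writer, looping to read into a char array and then writing the relevant portion of that char array to the writer.

Here's some sample code that does that. It may not be the best way of doing it these days, but it should at least work: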

import java.io.*;
import java.nio.charset.*;
import java.nio.file.*;

public class ConvertUtf8ToUtf16 {

    public static void main(String[] args) throws IOException {
        Path inputPath = Paths.get(args[0]);
        Path outputPath = Paths.get(args[1]);

        char[] buffer = new char[4096];
        // UTF-8 is actually the default for Files.newBufferedReader,
        // but let's be explicit.
        try (Reader reader = Files.newBufferedReader(inputPath, StandardCharsets.UTF_8)) {
            try (Writer writer = Files.newBufferedWriter(outputPath, StandardCharsets.UTF_16)) {
                int charsRead;

                while ((charsRead = reader.read(buffer)) != -1) {
                    writer.write(buffer, 0, charsRead);
                }
            }
        }
    }
}
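For a file small enough to fit in memory, the same conversion can be sketched without the read loop. This is only an illustration under that assumption; the class name SmallFileUtf8ToUtf16 and the method convert are made up for the example and are not part of the answer above. Note that StandardCharsets.UTF_16 encodes big-endian and writes a byte order mark (FE FF) first.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, named for this example only.
class SmallFileUtf8ToUtf16 {

    // Decodes the whole file as UTF-8, then re-encodes it as UTF-16.
    // Assumes the file comfortably fits in memory.
    static void convert(Path input, Path output) throws IOException {
        String text = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
        // UTF_16 produces big-endian bytes preceded by the byte order mark FE FF.
        Files.write(output, text.getBytes(StandardCharsets.UTF_16));
    }

    public static void main(String[] args) throws IOException {
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

If the consumer of the file expects no byte order mark, StandardCharsets.UTF_16BE or StandardCharsets.UTF_16LE can be used instead.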

Answer 2

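First of all, Jon Skeet's answer is correct and will work. The problem with your code is that you decode the incoming bytes, which are UTF-8, as if they were UTF-16, and then convert the resulting String back to bytes using your platform's default encoding (UTF-8, I guess), which is why the output is garbled. Java holds Strings internally in its own encoding (UTF-16). So when you have a String, you can tell Java to produce bytes from it in whatever charset you want. For the same valid String, getBytes("UTF-8") and getBytes("UTF-16") produce different byte sequences. So if you read your original content and you know it is UTF-8, you need to create your String from it as UTF-8, String inString = new String(inbytes, "UTF-8"), and then when you write, produce the byte array from the String in UTF-16: fos.write(inString.getBytes("UTF-16"));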
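To make the point about getBytes concrete, here is a minimal sketch that prints the bytes of the asker's sample word سلام in both encodings. The class name EncodingBytesDemo and the hex helper are invented for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named for this example only.
class EncodingBytesDemo {

    // Formats a byte array as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u0633\u0644\u0627\u0645"; // the word from the question
        // Same String, two different byte sequences:
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_8)));
        // prints: D8 B3 D9 84 D8 A7 D9 85
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16)));
        // prints: FE FF 06 33 06 44 06 27 06 45 (byte order mark first)
    }
}
```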

I would also suggest a tool that can help you understand the internal workings of String: a utility that converts any String to a Unicode sequence and vice versa.


String result = "Hello World";
result = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(result);
System.out.println(result);
result = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString(result);
System.out.println(result);

The output of this code is:

\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064
Hello World

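The library that contains this utility is called MgntUtils and can be found on Maven Central or on GitHub. It comes as a Maven artifact with sources and Javadoc. Javadoc is available for the class StringUnicodeEncoderDecoder, and the library is described in the article "Open Source Java library with stack trace filtering, Silent String parsing Unicode converter and Version comparison".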
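If you only want to see the escape form and would rather not add a dependency, something similar can be sketched with the JDK alone. This is not how MgntUtils implements it; the class UnicodeEscapeSketch and the method toUnicodeEscapes are names made up for this example.

```java
// Hypothetical JDK-only sketch, not the MgntUtils implementation.
class UnicodeEscapeSketch {

    // Renders every UTF-16 code unit of the input as a backslash-u escape
    // with four lowercase hex digits.
    static String toUnicodeEscapes(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            sb.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnicodeEscapes("Hello World"));
    }
}
```

Because it walks UTF-16 code units, a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair).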

Convert a file from UTF-8 to UTF-16 in Java
