Read a UTF-8 file (created in Notepad) and convert it to a CP850 string

Time: 2014-12-22 19:14:46

Tags: java decode encode utf codepages

I am trying to read a UTF-8 file and convert it to CP850 (to send to a printer device). My test string is "ATIVAÇÃO":

A    T    I    V    A    Ç         Ã         O
0x41 0x54 0x49 0x56 0x41 0xC3 0x87 0xC3 0x83 0x4F

My Java code:

private static void printBytes(String s, String st) {
    byte[] b_str = s.getBytes();
    System.out.print(String.format("%-7s >>> ", st));
    for (int i=0; i<s.length();i++)
        System.out.print(String.format("%-7s ", s.charAt(i)));
    System.out.println();

    System.out.print(String.format("%-7s >>> ", st));
    for (int i=0; i<b_str.length;i++)
        System.out.print(String.format("0x%-5x ", (int)b_str[i] & 0xff));
    System.out.println();
}

public static void main(String[] args) throws Exception {

    String F="file.txt";

    InputStreamReader input = new InputStreamReader(new FileInputStream(F));
    BufferedReader in = new BufferedReader(input);

    String strFILE;
    String strCP850;

    while ((strFILE = in.readLine()) != null) {

        strFILE = strFILE.substring(3);
        printBytes(strFILE, "ORI");
        strCP850 = new String(strFILE.getBytes(), "CP850");
        printBytes(strCP850, "CP850");
        System.exit(0);
    }

    in.close();

}

Output:

ORI     >>> A       T       I       V       A       Ã       ‡       Ã       ƒ       O       
ORI     >>> 0x41    0x54    0x49    0x56    0x41    0xc3    0x87    0xc3    0x83    0x4f    
CP850   >>> A       T       I       V       A       ?       ç       ?       â       O      
CP850   >>> 0x41    0x54    0x49    0x56    0x41    0x3f    0xe7    0x3f    0xe2    0x4f   

I was expecting "Ç" to be 0xc7 and "Ã" to be 0xc3, but the conversion yields two-byte characters (as in UTF-8...).

What am I doing wrong?

Is there a way to do this (JDK 1.6)?

1 Answer:

Answer 0: (score: 1)

First of all: a String has no encoding. What you do need to get right is specifying the encoding when you read the file as text.
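As a rough illustration of that point, the sketch below decodes the same ten bytes from the question with two different charsets and then encodes each result as cp850. The class name `DecodeDemo` and the helper `hex` are invented for this example; only the byte values come from the question.

```java
import java.nio.charset.Charset;

public class DecodeDemo {
    // UTF-8 bytes of "ATIVAÇÃO", as listed in the question.
    static final byte[] UTF8_BYTES = {
        0x41, 0x54, 0x49, 0x56, 0x41,
        (byte) 0xC3, (byte) 0x87, (byte) 0xC3, (byte) 0x83, 0x4F
    };

    // Hypothetical helper: formats bytes as space-separated two-digit hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cp850 = Charset.forName("cp850");

        // Decoded with the charset the file was written in: 8 characters.
        String good = new String(UTF8_BYTES, Charset.forName("UTF-8"));
        // Decoded with a single-byte charset: 10 characters, two per accented letter.
        String bad = new String(UTF8_BYTES, Charset.forName("ISO-8859-1"));

        System.out.println(good.length() + " chars -> " + hex(good.getBytes(cp850)));
        System.out.println(bad.length() + " chars -> " + hex(bad.getBytes(cp850)));
    }
}
```

Only the correctly decoded string produces one cp850 byte per letter. In code page 850, "Ç" is 0x80 and "Ã" is 0xC7; the values 0xc7 and 0xc3 belong to ISO-8859-1.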

To read a file as UTF-8 and then dump it as cp850, you can do this:

final Path path = Paths.get("file.txt");

try (
    final BufferedReader reader = Files.newBufferedReader(path,
        StandardCharsets.UTF_8);
) {
    String line;
    byte[] bytes;
    while ((line = reader.readLine()) != null) {
        bytes = line.getBytes(Charset.forName("cp850"));
        // write this method
        dumpBytes(bytes);
    }
}
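The snippet above uses `java.nio.file` and try-with-resources, both of which arrived in Java 7, whereas the question mentions JDK 1.6. A minimal sketch of the same idea using only Java 6 APIs might look like the following. The class name `Utf8ToCp850`, the helpers `stripBom` and `toHex`, and this particular `dumpBytes` are assumptions made for illustration. It also assumes the file begins with a byte order mark, which Notepad typically wrote when saving as UTF-8 and which would explain the `substring(3)` in the question.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class Utf8ToCp850 {
    static final Charset CP850 = Charset.forName("cp850");

    // Drops a leading byte order mark (U+FEFF); Java's UTF-8 decoder keeps it.
    static String stripBom(String line) {
        return line.startsWith("\uFEFF") ? line.substring(1) : line;
    }

    // Formats bytes as space-separated two-digit hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // One possible dumpBytes: print the hex form; a real program would
    // write the bytes to the printer's output stream instead.
    static void dumpBytes(byte[] bytes) {
        System.out.println(toHex(bytes));
    }

    public static void main(String[] args) throws IOException {
        // Name the charset explicitly instead of relying on the platform default.
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream("file.txt"), "UTF-8"));
        try {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (first) {
                    line = stripBom(line);
                    first = false;
                }
                dumpBytes(line.getBytes(CP850));
            }
        } finally {
            reader.close();
        }
    }
}
```

Characters that have no cp850 equivalent are replaced by `?` (0x3f) when using `getBytes`; if that matters for the printer, a `CharsetEncoder` configured to report unmappable characters is one alternative.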