Reading Cyrillic UTF-8 text with RandomAccessFile in Java

Asked: 2013-10-27 23:27:50

Tags: java utf-8 randomaccessfile cyrillic

Hi folks!

I can't read Cyrillic text from a file using RandomAccessFile.

Here is a simple program that writes information (Cyrillic text) to a file in this format:

keyLength, valueLength, key, value

The program then tries to read this information back, but the output is incorrect:

writing success
keyLength = 10, valueLength = 4
read: килло, гр

UPD: expected output:

writing success
keyLength = 10, valueLength = 4
read: киллограмм, сала

What is the problem? (Apart from the problem in my head.)

import java.io.FileNotFoundException;
import java.io.RandomAccessFile;
import java.io.IOException;

public class Main {

    public static void main(String[] args) {
        String fileName = "file.db";
        RandomAccessFile outputFile = null;

        try {
            outputFile = new RandomAccessFile(fileName, "rw");
        } catch (FileNotFoundException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }

        String key = "киллограмм";
        String value = "сала";

        try {
            outputFile.writeInt(key.length());
            outputFile.writeInt(value.length());

            outputFile.write(key.getBytes("UTF-8"));
            outputFile.write(value.getBytes("UTF-8"));
        } catch (IOException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }

        System.out.println("writing success");

        RandomAccessFile inputFile = null;

        try {
            inputFile = new RandomAccessFile(fileName, "r");
        } catch (FileNotFoundException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }

        int keyLength = 0, valueLength = 0;

        try {
            keyLength = inputFile.readInt();
            valueLength = inputFile.readInt();
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }

        System.out.println("keyLength = " + keyLength + ", valueLength = " + valueLength);
        if (keyLength <= 0 || valueLength <= 0) {
            System.err.println("key or value length is negative");
            System.exit(1);
        }

        byte[] keyBytes = null, valueBytes = null;

        try {
            keyBytes = new byte[keyLength];
            valueBytes = new byte[valueLength];
        } catch (OutOfMemoryError e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }

        try {
            inputFile.read(keyBytes);
            inputFile.read(valueBytes);
        } catch (IOException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }

        try {
            System.out.println("read: " + new String(keyBytes, "UTF-8") + ", " + new String(valueBytes, "UTF-8"));
        } catch (IOException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }

    }
}

1 answer:

Answer 0: (score: 2)

The problem is this line:

outputFile.writeInt(key.length());

The documentation for String#length() says:

Returns the length of this string. The length is equal to the number of Unicode code units in the string.

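In this case it returns 10, which is not the number of bytes needed to represent this String. In UTF-8 each of these Cyrillic letters takes 2 bytes, so the key occupies 20 bytes and the value 8 bytes. The reader therefore takes only the first 10 bytes (the 5 letters "килло") as the key and the next 4 bytes (the 2 letters "гр") as the value, which is exactly the incorrect output shown in the question.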
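To make the gap between characters and bytes concrete, here is a minimal sketch. The class name LengthVsBytes and the helper utf8Length are made up for this illustration, and the source file is assumed to be compiled as UTF-8 so the Cyrillic literals survive.

```java
import java.nio.charset.StandardCharsets;

public class LengthVsBytes {

    // Hypothetical helper: number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String key = "киллограмм";
        String value = "сала";

        // length() counts UTF-16 code units, not encoded bytes.
        System.out.println("key:   " + key.length() + " chars, " + utf8Length(key) + " bytes");
        System.out.println("value: " + value.length() + " chars, " + utf8Length(value) + " bytes");
    }
}
```

For these two strings the character counts are 10 and 4, while the UTF-8 byte counts are 20 and 8.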

What you want is

key.getBytes("UTF-8").length

used like this:

byte[] keyBytes = key.getBytes("UTF-8");
outputFile.writeInt(keyBytes.length);

The same goes for value.
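Putting the fix together, a complete round trip might look like the sketch below. The class name RoundTrip and the helpers writeRecord and readRecord are hypothetical names chosen for this example, not part of the original program. It also swaps read(byte[]) for readFully(byte[]), since read is allowed to return fewer bytes than requested; that change is an extra precaution and not something the question's output depends on. The source file is again assumed to be compiled as UTF-8.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class RoundTrip {

    // Writes one record as: keyByteLength, valueByteLength, keyBytes, valueBytes.
    static void writeRecord(RandomAccessFile file, String key, String value) throws IOException {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        byte[] valueBytes = value.getBytes(StandardCharsets.UTF_8);
        file.writeInt(keyBytes.length);   // byte count, not key.length()
        file.writeInt(valueBytes.length); // byte count, not value.length()
        file.write(keyBytes);
        file.write(valueBytes);
    }

    // Reads one record written by writeRecord and returns {key, value}.
    static String[] readRecord(RandomAccessFile file) throws IOException {
        int keyLength = file.readInt();
        int valueLength = file.readInt();
        byte[] keyBytes = new byte[keyLength];
        byte[] valueBytes = new byte[valueLength];
        file.readFully(keyBytes);   // blocks until the whole array is filled or throws EOFException
        file.readFully(valueBytes);
        return new String[] {
            new String(keyBytes, StandardCharsets.UTF_8),
            new String(valueBytes, StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) throws IOException {
        String fileName = "file.db";

        try (RandomAccessFile out = new RandomAccessFile(fileName, "rw")) {
            out.setLength(0); // drop any data left over from a previous run
            writeRecord(out, "киллограмм", "сала");
        }
        System.out.println("writing success");

        try (RandomAccessFile in = new RandomAccessFile(fileName, "r")) {
            String[] record = readRecord(in);
            System.out.println("read: " + record[0] + ", " + record[1]);
        }
    }
}
```

Using StandardCharsets.UTF_8 instead of the string "UTF-8" avoids the checked UnsupportedEncodingException, and try-with-resources closes both files, which the original program never does.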