Requested array size exceeds VM limit when reading a file

Posted: 2019-07-08 16:49:05

Tags: java hadoop hdfs

I am trying to read the bytes stored under a specific key from a SequenceFile into a List. When I call the method getCurrentValue, I get an OutOfMemoryError: Requested array size exceeds VM limit. I need to read all of the bytes written for this key, whatever their size.

I tried increasing the capacity of the BytesWritable buf to 1 GB, but that did not help.
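A minimal sketch of what that attempt looked like (the read loop is the same as in the method below; only the buffer allocation changed, and 1 GB is simply the capacity I tried):

    BytesWritable buf = new BytesWritable();
    // Attempted workaround: pre-allocate the backing byte array so that
    // readFields() does not have to grow it while reading the value.
    buf.setCapacity(1024 * 1024 * 1024); // 1 GB
    in.getCurrentValue(buf); // still fails with the same OutOfMemoryError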

public List<byte[]> read(String path, String key) {
    Path hdfsPath = new Path(path);
    List<byte[]> fileBytes = new ArrayList<>(1);

    Text keyBuf = new Text();

    try (SequenceFile.Reader in = new SequenceFile.Reader(connectionConfig, SequenceFile.Reader.file(hdfsPath))) {
        // Scan every record and collect the values whose key matches.
        while (in.next(keyBuf)) {
            if (keyBuf.toString().equals(key)) {
                BytesWritable buf = new BytesWritable();
                in.getCurrentValue(buf); // the exception is thrown here
                fileBytes.add(buf.getBytes());
            }
        }
    } catch (IOException e) {
        throw new HdfsException("When reading file, path=" + path + " key=" + keyBuf, e);
    }

    return fileBytes;
}
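For reference, I call the method roughly like this (the path and key below are made-up placeholders, not the real values):

    // "reader" is the object that exposes the read(...) method above.
    List<byte[]> value = reader.read("/data/blobs.seq", "some-large-key");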

Exception:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:146)
at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:125)
at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:181)
at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2308)
at ...

0 Answers:

No answers yet