Exception when reading a Parquet file with a BINARY channel

Time: 2016-09-20 07:40:49

Tags: arrays parquet

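When I read a Parquet file that has a BINARY channel (column), I get a ParquetDecodingException caused by an ArrayIndexOutOfBoundsException. If the file does not contain a BINARY field, I can read it without problems. The code involved in the out-of-bounds exception is: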

public Binary readBytes() {
    try {
        int length = BytesUtils.readIntLittleEndian(in, offset);
        int start = offset + 4;
        offset = start + length;
        return Binary.fromConstantByteArray(in, start, length);
    } catch (IOException e) {
        throw new ParquetDecodingException("could not read bytes at offset " + offset, e);
    } catch (RuntimeException e) {
        throw new ParquetDecodingException("could not read bytes at offset " + offset, e);
    }
}

The function readIntLittleEndian(in, offset) is as follows:

public static int readIntLittleEndian(byte[] in, int offset) throws IOException {
    int ch4 = in[offset] & 0xff;
    int ch3 = in[offset + 1] & 0xff;
    int ch2 = in[offset + 2] & 0xff;
    int ch1 = in[offset + 3] & 0xff;
    return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
}

These methods are part of Apache's parquet-hadoop jar.
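To make the failure mode concrete: the two methods above walk a byte buffer in which each value is stored as a 4-byte little-endian length followed by that many payload bytes, and neither method checks the offset against the buffer size. The following is a minimal sketch of the same walk with explicit bounds checks, so that a short or misaligned buffer fails with a descriptive message instead of a bare ArrayIndexOutOfBoundsException. The class PlainBinarySketch and its readAll method are hypothetical names for illustration and are not part of Parquet; the sketch does not determine why the buffer in my file is too short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Parquet): decodes length-prefixed byte strings with bounds checks. */
final class PlainBinarySketch {

    /** Same little-endian arithmetic as the method quoted above, plus an explicit bounds check. */
    static int readIntLittleEndian(byte[] in, int offset) {
        if (offset < 0 || offset > in.length - 4) {
            throw new IllegalStateException(
                "need 4 length bytes at offset " + offset + ", buffer has " + in.length);
        }
        return (in[offset] & 0xff)
            | ((in[offset + 1] & 0xff) << 8)
            | ((in[offset + 2] & 0xff) << 16)
            | ((in[offset + 3] & 0xff) << 24);
    }

    /** Reads valueCount values laid out as [4-byte little-endian length][payload bytes]. */
    static List<byte[]> readAll(byte[] page, int valueCount) {
        List<byte[]> values = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < valueCount; i++) {
            int length = readIntLittleEndian(page, offset);
            int start = offset + 4;
            // Reject a length prefix that points past the end of the buffer.
            if (length < 0 || length > page.length - start) {
                throw new IllegalStateException(
                    "value " + i + " claims " + length + " bytes at offset " + start
                        + ", buffer has " + page.length);
            }
            values.add(Arrays.copyOfRange(page, start, start + length));
            offset = start + length;
        }
        return values;
    }

    public static void main(String[] args) {
        // Two values, "ab" and an empty one: 4 + 2 + 4 + 0 = 10 bytes in total.
        byte[] page = {2, 0, 0, 0, 'a', 'b', 0, 0, 0, 0};
        System.out.println(readAll(page, 2).size());
        try {
            // Asking for a third value runs off the end of the buffer.
            readAll(page, 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Asking for more values than the buffer holds reproduces the situation in my trace, where the reader expects 7724 values but runs out of bytes at value 7032.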

Here is the detailed exception message:

Exception in thread "main" org.apache.parquet.io.ParquetDecodingException: Can not read value at 7032 in block 0 in file file:/C:/Users/EKE9FE/Documents/MATLAB/part-0
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
    at parquet.compat.test.ConvertUtils$1.next(ConvertUtils.java:439)
    at parquet.compat.test.ConvertUtils$1.next(ConvertUtils.java:1)
    at parquet.compat.test.ConvertUtils.getFullParquetFile(ConvertUtils.java:63)
    at parquet.compat.test.Test.main(Test.java:35)

Caused by: org.apache.parquet.io.ParquetDecodingException: Can't read value in column [B_avfra] BINARY at value 7032 out of 7724, 7032 out of 7724 in currentPage. repetition level: 0, definition level: 0
    at org.apache.parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:483)
    at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:370)
    at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:405)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:218)
    ... 5 more
Caused by: org.apache.parquet.io.ParquetDecodingException: could not read bytes at offset 28124
    at org.apache.parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:46)
    at org.apache.parquet.column.impl.ColumnReaderImpl$2$6.read(ColumnReaderImpl.java:312)
    at org.apache.parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:464)
    ... 8 more

Caused by: java.lang.ArrayIndexOutOfBoundsException: 28124
    at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:57)
    at org.apache.parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:39)
    ... 10 more
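The trace wraps the ArrayIndexOutOfBoundsException in three ParquetDecodingException layers. When catching the exception in calling code, the innermost cause can be surfaced with a small helper like the sketch below. It uses only Throwable.getCause(); the class name RootCauseSketch is hypothetical, and the nested exceptions in main are stand-ins that mimic the shape of the trace rather than real Parquet exceptions.

```java
/** Hypothetical helper: walks a cause chain down to the innermost exception. */
final class RootCauseSketch {

    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referential cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in chain shaped like the trace above (not real Parquet exceptions).
        Throwable inner = new ArrayIndexOutOfBoundsException("28124");
        Throwable middle = new RuntimeException("could not read bytes at offset 28124", inner);
        Throwable outer = new RuntimeException("Can not read value at 7032 in block 0", middle);
        System.out.println(rootCause(outer));
    }
}
```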

0 answers:

No answers