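I get a ParquetDecodingException caused by an ArrayIndexOutOfBoundsException when reading a Parquet file that contains a BINARY column. Parquet files without BINARY fields read fine. The code that throws the ArrayIndexOutOfBoundsException is: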
public Binary readBytes() {
    try {
        int length = BytesUtils.readIntLittleEndian(in, offset);
        int start = offset + 4;
        offset = start + length;
        return Binary.fromConstantByteArray(in, start, length);
    } catch (IOException e) {
        throw new ParquetDecodingException("could not read bytes at offset " + offset, e);
    } catch (RuntimeException e) {
        throw new ParquetDecodingException("could not read bytes at offset " + offset, e);
    }
}
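For context, readBytes decodes the PLAIN encoding of a BINARY value: a 4-byte little-endian length followed by that many bytes. The sketch below is a hypothetical illustration, not parquet-mr code (the class PlainBinarySketch and its encode/decode methods are invented for this example). It writes and reads that layout and checks bounds before each read, so asking for more values than the buffer holds fails with a descriptive message instead of an ArrayIndexOutOfBoundsException.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration (not parquet-mr code) of the PLAIN layout for
// BINARY values: a 4-byte little-endian length, then that many bytes.
class PlainBinarySketch {

    // Concatenate values as [length (LE int32)][bytes] pairs.
    static byte[] encode(List<byte[]> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] v : values) {
            byte[] prefix = ByteBuffer.allocate(4)
                    .order(ByteOrder.LITTLE_ENDIAN)
                    .putInt(v.length)
                    .array();
            out.write(prefix, 0, prefix.length);
            out.write(v, 0, v.length);
        }
        return out.toByteArray();
    }

    // Read `count` values, checking bounds before every access so that a
    // buffer that is too short fails with a descriptive message.
    static List<byte[]> decode(byte[] in, int count) {
        ByteBuffer buf = ByteBuffer.wrap(in).order(ByteOrder.LITTLE_ENDIAN);
        List<byte[]> result = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < count; i++) {
            if (in.length - offset < 4) {
                throw new IllegalStateException("value " + i + ": no length prefix at offset "
                        + offset + " (buffer length " + in.length + ")");
            }
            int length = buf.getInt(offset); // absolute little-endian read
            int start = offset + 4;
            if (length < 0 || length > in.length - start) {
                throw new IllegalStateException("value " + i + ": length " + length
                        + " at offset " + offset + " exceeds buffer length " + in.length);
            }
            result.add(Arrays.copyOfRange(in, start, start + length));
            offset = start + length;
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> values = Arrays.asList(
                "a".getBytes(StandardCharsets.UTF_8),
                new byte[0],
                "bc".getBytes(StandardCharsets.UTF_8));
        byte[] encoded = encode(values);
        System.out.println(encoded.length);            // 15 = (4 + 1) + (4 + 0) + (4 + 2)
        System.out.println(decode(encoded, 3).size()); // 3
        try {
            decode(encoded, 4); // asks for one value more than the buffer holds
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the trace, the out-of-bounds index (28124) is the same as the offset at which readBytes tried to read the length prefix, which suggests the reader was reading past the end of the page buffer; `decode(encoded, 4)` in the sketch fails in the analogous way.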
The function readIntLittleEndian(in, offset) is:
public static int readIntLittleEndian(byte[] in, int offset) throws IOException {
    int ch4 = in[offset] & 0xff;
    int ch3 = in[offset + 1] & 0xff;
    int ch2 = in[offset + 2] & 0xff;
    int ch1 = in[offset + 3] & 0xff;
    return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
}
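readIntLittleEndian assembles an int from four bytes, least significant byte first, and does no bounds check, so an offset within 3 bytes of the end of `in` (or past it) throws ArrayIndexOutOfBoundsException. The hypothetical class below (LittleEndianSketch and both method names are made up for illustration) repeats the same shift arithmetic with an explicit bounds check added as an assumption of the sketch, and compares it with java.nio.ByteBuffer in little-endian mode.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class: compares the shift-based decoding used by
// BytesUtils.readIntLittleEndian with java.nio.ByteBuffer in little-endian mode.
class LittleEndianSketch {

    // Same arithmetic as the quoted readIntLittleEndian, plus an explicit
    // bounds check (added by this sketch, not present in the library code).
    static int readIntLittleEndian(byte[] in, int offset) {
        if (offset < 0 || offset > in.length - 4) {
            throw new IllegalArgumentException("need 4 bytes at offset " + offset
                    + ", buffer length is " + in.length);
        }
        int ch4 = in[offset] & 0xff;     // least significant byte comes first
        int ch3 = in[offset + 1] & 0xff;
        int ch2 = in[offset + 2] & 0xff;
        int ch1 = in[offset + 3] & 0xff; // most significant byte comes last
        return (ch1 << 24) | (ch2 << 16) | (ch3 << 8) | ch4;
    }

    // Equivalent read via ByteBuffer; throws IndexOutOfBoundsException
    // if fewer than 4 bytes remain at the given index.
    static int readWithByteBuffer(byte[] in, int offset) {
        return ByteBuffer.wrap(in).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    public static void main(String[] args) {
        byte[] data = {0x05, 0x00, 0x00, 0x00, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(readIntLittleEndian(data, 0)); // 5
        System.out.println(readWithByteBuffer(data, 4));  // -1
    }
}
```

Using `|` instead of `+` gives the same result here because the four shifted bytes never overlap.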
Both methods are part of Apache's parquet-hadoop jar.
Here is the full exception message:
Exception in thread "main" org.apache.parquet.io.ParquetDecodingException: Can not read value at 7032 in block 0 in file file:/C:/Users/EKE9FE/Documents/MATLAB/part-0
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:243)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
at parquet.compat.test.ConvertUtils$1.next(ConvertUtils.java:439)
at parquet.compat.test.ConvertUtils$1.next(ConvertUtils.java:1)
at parquet.compat.test.ConvertUtils.getFullParquetFile(ConvertUtils.java:63)
at parquet.compat.test.Test.main(Test.java:35)
Caused by: org.apache.parquet.io.ParquetDecodingException: Can't read value in column [B_avfra] BINARY at value 7032 out of 7724, 7032 out of 7724 in currentPage. repetition level: 0, definition level: 0
at org.apache.parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:483)
at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:370)
at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:405)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:218)
... 5 more
Caused by: org.apache.parquet.io.ParquetDecodingException: could not read bytes at offset 28124
at org.apache.parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:46)
at org.apache.parquet.column.impl.ColumnReaderImpl$2$6.read(ColumnReaderImpl.java:312)
at org.apache.parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:464)
... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 28124
at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:57)
at org.apache.parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:39)
... 10 more