Reading a decimal field from a Parquet file in Java

Date: 2017-06-07 16:16:05

Tags: java parquet

I am reading a Parquet file as follows:

Builder<GenericRecord> builder = AvroParquetReader.builder(path);
ParquetReader<GenericRecord> reader = builder.build();

GenericRecord record = null;
while((record = reader.read()) != null) {
  System.out.println(record.toString());
}

Output:

{"var1": "ABCD", "var2": "1234567", "var3": [0, 0, 0, 0, 0, 0, 0, 0, 113, 15, 120, -111, -92, -114, -112, 50]}

Any cast I try on the byte array value, for example
(byte[]) record.get("var3")

throws

java.lang.ClassCastException: org.apache.avro.generic.GenericData$Fixed cannot be cast to [B

How can I convert this GenericData value back into a decimal?

Parquet file schema:

-bash-4.1$ parquet-tools schema my-parquet-file.gz.parquet
message spark_schema {
  optional binary var1 (UTF8);
  optional int64 var2;
  optional fixed_len_byte_array(16) var3 (DECIMAL(38,8));
}

2 Answers:

Answer 0 (score: 1)

I was able to do this using a newer version of Avro (documentation):
public BigDecimal fromFixed(GenericFixed value,
               Schema schema,
               LogicalType type)

is exactly what I needed.
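
For context, a minimal sketch of how that conversion could be wired into the read loop from the question. The variable names and the union handling are my own additions, and it assumes the Avro schema derived by parquet-avro carries the decimal logical type (newer versions do):

import java.math.BigDecimal;
import org.apache.avro.Conversions;
import org.apache.avro.LogicalType;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericFixed;

// The field comes back as GenericData$Fixed, which is why the (byte[]) cast failed
GenericFixed fixed = (GenericFixed) record.get("var3");

// An "optional" Parquet column maps to an Avro union [null, fixed];
// pick the non-null branch, which carries the decimal logical type
Schema fieldSchema = record.getSchema().getField("var3").schema();
if (fieldSchema.getType() == Schema.Type.UNION) {
    for (Schema branch : fieldSchema.getTypes()) {
        if (branch.getType() != Schema.Type.NULL) {
            fieldSchema = branch;
        }
    }
}

LogicalType logicalType = fieldSchema.getLogicalType();  // decimal(38, 8) for var3
BigDecimal value = new Conversions.DecimalConversion().fromFixed(fixed, fieldSchema, logicalType);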

Answer 1 (score: 0)

Using the Avro GenericData API:

// Read the raw fixed_len_byte_array(16) bytes of the decimal column
Binary binary = simpleGroup.getBinary(i, 0);
Conversions.DecimalConversion decimalConversion = new Conversions.DecimalConversion();
// Precision and scale must match the column's declared type, DECIMAL(38, 8) here
BigDecimal bigDecimal = decimalConversion.fromFixed(
   new GenericData.Fixed(Schema.create(Schema.Type.DOUBLE), binary.getBytes()),
   Schema.create(Schema.Type.DOUBLE),
   LogicalTypes.decimal(38, 8));
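
A note on that snippet: as far as I can tell, DecimalConversion.fromFixed only looks at the raw bytes of the GenericFixed and at the scale of the decimal LogicalType, so the placeholder DOUBLE schema is never really consulted; the part that has to match the column is the precision and scale passed to LogicalTypes.decimal.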

I prefer the "Spark way":

Binary binary = simpleGroup.getBinary(i, 0);
int scale = 8;  // scale of the column's DECIMAL(38, 8) type
BigDecimal bigDecimal = new BigDecimal(new BigInteger(binary.getBytes()), scale);
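
This works because Parquet stores a DECIMAL backed by fixed_len_byte_array as its unscaled value in big-endian two's-complement form, which is exactly the representation the BigInteger(byte[]) constructor expects; BigDecimal then applies the scale (8 for this column).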