I read a Parquet file like this:
// try-with-resources closes the reader when done
try (ParquetReader<GenericRecord> reader =
         AvroParquetReader.<GenericRecord>builder(path).build()) {
    GenericRecord record;
    while ((record = reader.read()) != null) {
        System.out.println(record);
    }
}
Output:
{"var1": "ABCD", "var2": "1234567", "var3": [0, 0, 0, 0, 0, 0, 0, 0, 113, 15, 120, -111, -92, -114, -112, 50]}
Any attempt to cast the byte array value, e.g. (byte[]) record.get("var3"), throws:
java.lang.ClassCastException: org.apache.avro.generic.GenericData$Fixed cannot be cast to [B
How can I convert this GenericData.Fixed value back to a decimal?
Parquet file schema:
-bash-4.1$ parquet-tools schema my-parquet-file.gz.parquet
message spark_schema {
optional binary var1 (UTF8);
optional int64 var2;
optional fixed_len_byte_array(16) var3 (DECIMAL(38,8));
}
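The 16 bytes printed for var3 are the decimal's unscaled value stored as a big-endian two's-complement integer; the scale (8) comes from DECIMAL(38,8) in the schema. A minimal JDK-only sketch of that decoding, using the bytes from the output above (DecodeFixedDecimal and decode are hypothetical names chosen for illustration):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper class, for illustration only.
public class DecodeFixedDecimal {
    // Interprets the bytes as a big-endian two's-complement unscaled value
    // and applies the given scale (the layout Parquet uses for DECIMAL in
    // fixed_len_byte_array).
    static BigDecimal decode(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        // The var3 bytes printed in the question's output
        byte[] var3 = {0, 0, 0, 0, 0, 0, 0, 0, 113, 15, 120, -111, -92, -114, -112, 50};
        BigDecimal value = decode(var3, 8); // scale 8 from DECIMAL(38,8)
        System.out.println(value.toPlainString()); // prints 81468628178.62914098
    }
}
```

Because BigInteger(byte[]) treats the array as signed, negative decimals decode correctly without extra handling.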
Answer 0 (score: 1)
I was able to do this with a newer version of Avro (Documentation). The Conversions.DecimalConversion method
public BigDecimal fromFixed(GenericFixed value,
                            Schema schema,
                            LogicalType type)
is exactly what I needed.
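A minimal sketch of the fromFixed approach, assuming org.apache.avro:avro 1.8 or later is on the classpath. So that it runs without a Parquet file, the GenericData.Fixed is built by hand from the bytes in the question's output; in the reader loop you would cast record.get("var3") to GenericFixed instead. Because var3 is optional, its field schema in record.getSchema() is probably a union of null and the fixed type, so take the decimal logical type from the fixed branch. AvroFixedDecimal and toBigDecimal are hypothetical names:

```java
import java.math.BigDecimal;
import org.apache.avro.Conversions;
import org.apache.avro.LogicalType;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericFixed;

// Hypothetical helper class, for illustration only.
public class AvroFixedDecimal {
    private static final Conversions.DecimalConversion DECIMAL_CONVERSION =
            new Conversions.DecimalConversion();

    // Decodes a GenericFixed holding a decimal, using the scale carried by decimalType.
    static BigDecimal toBigDecimal(GenericFixed fixed, LogicalType decimalType) {
        return DECIMAL_CONVERSION.fromFixed(fixed, fixed.getSchema(), decimalType);
    }

    public static void main(String[] args) {
        // fixed_len_byte_array(16) annotated DECIMAL(38,8), as in the file schema
        Schema fixedSchema = Schema.createFixed("var3", null, null, 16);
        LogicalTypes.decimal(38, 8).addToSchema(fixedSchema);

        // Stand-in for (GenericFixed) record.get("var3"), using the question's bytes
        byte[] bytes = {0, 0, 0, 0, 0, 0, 0, 0, 113, 15, 120, -111, -92, -114, -112, 50};
        GenericFixed fixed = new GenericData.Fixed(fixedSchema, bytes);

        BigDecimal value = toBigDecimal(fixed, fixedSchema.getLogicalType());
        System.out.println(value.toPlainString()); // prints 81468628178.62914098
    }
}
```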
Answer 1 (score: 0)
Using the Avro GenericData API:
Binary binary = simpleGroup.getBinary(i, 0);
// Match the column: DECIMAL(38,8) stored as fixed_len_byte_array(16), so scale 8
Schema fixedSchema = Schema.createFixed("var3", null, null, 16);
LogicalType decimalType = LogicalTypes.decimal(38, 8);
Conversions.DecimalConversion decimalConversion = new Conversions.DecimalConversion();
BigDecimal bigDecimal = decimalConversion.fromFixed(
        new GenericData.Fixed(fixedSchema, binary.getBytes()),
        fixedSchema,
        decimalType);
I prefer the "Spark way":
Binary binary = simpleGroup.getBinary(i, 0);
int scale = 8; // from DECIMAL(38,8) in the file schema
BigDecimal bigDecimal = new BigDecimal(new BigInteger(binary.getBytes()), scale);
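A sketch of the "Spark way" that reads the scale from the column's DECIMAL annotation instead of hard-coding it, assuming parquet-column 1.11 or later (for LogicalTypeAnnotation) is on the classpath. The row is built by hand from the question's schema and bytes rather than read from a file; GroupDecimal and readDecimal are hypothetical names:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import org.apache.parquet.example.data.simple.SimpleGroup;
import org.apache.parquet.io.api.Binary;
import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;
import org.apache.parquet.schema.Type;

// Hypothetical helper class, for illustration only.
public class GroupDecimal {
    // Reads a DECIMAL field, taking the scale from the column's logical type annotation.
    static BigDecimal readDecimal(SimpleGroup group, int fieldIndex) {
        Type fieldType = group.getType().getType(fieldIndex);
        LogicalTypeAnnotation annotation = fieldType.getLogicalTypeAnnotation();
        if (!(annotation instanceof LogicalTypeAnnotation.DecimalLogicalTypeAnnotation)) {
            throw new IllegalArgumentException(fieldType.getName() + " is not a DECIMAL column");
        }
        int scale = ((LogicalTypeAnnotation.DecimalLogicalTypeAnnotation) annotation).getScale();
        Binary binary = group.getBinary(fieldIndex, 0);
        return new BigDecimal(new BigInteger(binary.getBytes()), scale);
    }

    public static void main(String[] args) {
        MessageType schema = MessageTypeParser.parseMessageType(
                "message spark_schema {"
                + " optional binary var1 (UTF8);"
                + " optional int64 var2;"
                + " optional fixed_len_byte_array(16) var3 (DECIMAL(38,8));"
                + " }");
        // Hand-built stand-in for a row read through the example Group API
        SimpleGroup group = new SimpleGroup(schema);
        byte[] bytes = {0, 0, 0, 0, 0, 0, 0, 0, 113, 15, 120, -111, -92, -114, -112, 50};
        group.add("var3", Binary.fromConstantByteArray(bytes));

        int var3Index = schema.getFieldIndex("var3");
        System.out.println(readDecimal(group, var3Index).toPlainString()); // prints 81468628178.62914098
    }
}
```

Taking the scale from the schema guards against a hard-coded scale that doesn't match the column (say 10 instead of 8), which would silently shift the decimal point.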