Parquet ClassCastException: parquet.io.MessageColumnIO cannot be cast to parquet.io.PrimitiveColumnIO

Asked: 2017-07-18 08:53:59

Tags: scala avro parquet

I am trying to write a simple Scala program that dumps data into Parquet files on HDFS.

I created an Avro schema, initialized a ParquetWriter with it, mapped my records to GenericRecords according to that schema, and then tried to write them with the Parquet writer.

Unfortunately, I get the following exception when I run the program:

java.lang.ClassCastException: parquet.io.MessageColumnIO cannot be cast to parquet.io.PrimitiveColumnIO
    at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.getColumnWriter(MessageColumnIO.java:339)
    at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:376)
    at parquet.io.ValidatingRecordConsumer.addBinary(ValidatingRecordConsumer.java:211)
    at parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:260)
    at parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
    at parquet.hadoop.ParquetWriter.write(ParquetWriter.java:324)

Schema definition:

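import org.apache.avro.{Schema, SchemaBuilder}
import parquet.avro.AvroSchemaConverter

// Avro record schema: four required string fields and one required int field
val avroSchema: Schema = SchemaBuilder.record("event_snapshots").fields()
  .requiredString("userid")
  .requiredString("event")
  .requiredString("firstevent")
  .requiredString("lastevent")
  .requiredInt("count")
  .endRecord()

// Convert the Avro schema into the corresponding Parquet schema
val parquetSchema = new AvroSchemaConverter().convert(avroSchema)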

Writer:

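import org.apache.avro.generic.GenericRecord
import parquet.avro.AvroWriteSupport
import parquet.hadoop.ParquetWriter
import parquet.hadoop.metadata.CompressionCodecName

val writeSupport = new AvroWriteSupport[GenericRecord](parquetSchema, avroSchema, null)

val blockSize = 256 * 1024 * 1024 // 256 MiB
val pageSize = 64 * 1024          // 64 KiB

// outputDir and configuration are defined elsewhere in the program
val writer = new ParquetWriter[GenericRecord](outputDir, writeSupport,
  CompressionCodecName.SNAPPY, blockSize,
  pageSize, pageSize, false, true, configuration)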

Building and writing the record:

import org.apache.avro.generic.GenericRecordBuilder

val recordBuilder = new GenericRecordBuilder(avroSchema)

// Populate every field declared in the schema
recordBuilder.set(avroSchema.getField("userid"), userKey)
recordBuilder.set(avroSchema.getField("event"), eventKey)
recordBuilder.set(avroSchema.getField("firstevent"),
  dateTimeDateFormat.format(firstEvent))
recordBuilder.set(avroSchema.getField("lastevent"),
  dateTimeDateFormat.format(lastEvent))
recordBuilder.set(avroSchema.getField("count"), event.count)

val record = recordBuilder.build()
writer.write(record) // the ClassCastException is thrown here

Any ideas?

0 answers:

No answers yet