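I am trying to read Avro files from HDFS. I have checked that they exist on the data nodes, and I can read them with the hdfs dfs -cat command.
However, when I try to read the data in Scala, I get this exception: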
Exception in thread "main" java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
at spark_test.TestSparkJob$.main(TestSparkJob.scala:55)
at spark_test.TestSparkJob.main(TestSparkJob.scala)
Caused by: java.io.EOFException
at org.apache.avro.io.BinaryDecoder$InputStreamByteSource.readRaw(BinaryDecoder.java:827)
at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:349)
at org.apache.avro.io.BinaryDecoder.readFixed(BinaryDecoder.java:302)
at org.apache.avro.io.Decoder.readFixed(Decoder.java:150)
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:100)
... 3 more
What could be causing this?
Here is the code I use to read the Avro file:
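import java.io.BufferedInputStream
import org.apache.avro.file.DataFileStream
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.hadoop.fs.Path
// fs is an org.apache.hadoop.fs.FileSystem obtained elsewhere in the job
val fsInputStream = fs.open(new Path("/data/avro_static.avro"))
val datumReader = new GenericDatumReader[GenericRecord]()
val inStream = new BufferedInputStream(fsInputStream)
val fileReader = new DataFileStream(inStream, datumReader) // the constructor reads the magic bytes and header
println("Schema " + fileReader.getSchema.toString())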
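The stack trace shows the EOFException being raised inside DataFileStream.initialize while it calls readFixed, i.e. while it reads the fixed-size header at the very start of the stream. An Avro object container file begins with the 4 magic bytes 'O', 'b', 'j', 0x01. The minimal diagnostic sketch below checks only those 4 bytes, so it can distinguish "the stream ended before 4 bytes" from "the bytes are not Avro". The object name AvroMagicCheck and the method checkMagic are hypothetical helpers written for illustration, not part of the Avro or Hadoop APIs.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, EOFException, InputStream}

// Hypothetical diagnostic helper (not part of Avro): inspects the first 4 bytes of a stream.
object AvroMagicCheck {
  // Avro object container files start with the bytes 'O', 'b', 'j', 0x01
  val AvroMagic: Array[Byte] = Array('O'.toByte, 'b'.toByte, 'j'.toByte, 1.toByte)

  /** Returns "ok", "bad-magic", or "too-short" (fewer than 4 bytes were available). */
  def checkMagic(in: InputStream): String = {
    val header = new Array[Byte](4)
    val data = new DataInputStream(in)
    try {
      data.readFully(header) // throws EOFException if the stream ends early
      if (header.sameElements(AvroMagic)) "ok" else "bad-magic"
    } catch {
      case _: EOFException => "too-short"
    }
  }

  def main(args: Array[String]): Unit = {
    val avroLike = Array[Byte]('O'.toByte, 'b'.toByte, 'j'.toByte, 1.toByte, 0.toByte)
    println(checkMagic(new ByteArrayInputStream(avroLike)))
    println(checkMagic(new ByteArrayInputStream(Array[Byte]())))
    println(checkMagic(new ByteArrayInputStream("PAR1".getBytes("US-ASCII"))))
  }
}
```

In the job itself, one could call AvroMagicCheck.checkMagic(fs.open(new Path("/data/avro_static.avro"))) with the same fs (and close the stream afterwards). A "too-short" result would suggest that the path the job actually opens is empty or truncated, even though hdfs dfs -cat shows data.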
Output of the hdfs dfs -cat command:
Objavro.schema�{"type":"record","name":"TestData","namespace":"sample","fields":[{"name":"random_pk","type":["null",{"type":"bytes","logicalType":"decimal","precision":38,"scale":0}]},{"name":"random_string","type":["string","null"]},{"name":"code","type":["string","null"]},{"name":"random_bool","type":["boolean","null"]},{"name":"random_int","type":["int","null"]},{"name":"random_float","type":["double","null"]},{"name":"random_double","type":["double","null"]},{"name":"random_enum","type":["null",{"type":"enum","name":"enumType","symbols":["VAL_1","VAL_2","VAL_3"]}]},{"name":"random_date","type":["null",{"type":"int","logicalType":"date"}]},{"name":"random_decimal","type":["null",{"type":"bytes","logicalType":"decimal","precision":4,"scale":2}]},{"name":"update_database_time","type":["null",{"type":"long","logicalType":"timestamp-millis"}]},{"name":"update_database_time_tz","type":["null",{"type":"long","logicalType":"timestamp-millis"}]},{"name":"random_money","type":["null",{"type":"bytes","logicalType":"decimal","precision":19,"scale":4}]}]}avro.codec
g�9���E>����this word7,5,1,4,6@`f�D@= snappy���
g�9���E># ����Z���ײZ���that word2,5,4,8���؆@��Q���@���Л�Z��翲ZV��������