I'd like to share a problem I ran into while using the code from the avro_inputformat.py example:

schema = open('test_schema_without_map.avsc').read()
# .read() already returns the whole schema as a single string,
# so it can go into the job conf directly
conf = {"avro.schema.input.key": schema}
avro_image_rdd = sc.newAPIHadoopFile(
    input_file,
    "org.apache.avro.mapreduce.AvroKeyInputFormat",
    "org.apache.avro.mapred.AvroKey",
    "org.apache.hadoop.io.NullWritable",
    keyConverter="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter",
    conf=conf
)
output = avro_image_rdd.map(lambda x: x[0]).collect()
for k in output:
    print "Image filename : %s" % k
When I run this with

spark-submit --driver-class-path /opt/spark-1.6.1/lib/spark-examples-1.6.1-hadoop2.6.0.jar read_test_avro_file_with_map.py

I get the following error:
Job aborted due to stage failure: Exception while getting task result: java.io.InvalidClassException: scala.collection.convert.Wrappers$MutableMapWrapper; no valid constructor
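From what I can tell, the driver deserializes task results with plain Java serialization, which requires the first non-serializable superclass to have a no-arg constructor, and the Scala map wrapper produced for the Avro map field seems to fail that check. One way to sidestep the converter entirely might be the spark-avro package (an untested sketch; it would need com.databricks:spark-avro_2.10:2.0.1 added via --packages):

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
# spark-avro turns the Avro map field into a MapType column, no converter involved
df = sqlContext.read.format("com.databricks.spark.avro").load(input_file)
print df.select("filename", "metadata").collect()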
The failure occurs when reading an avro file with the following schema:
{
    "namespace": "test.avro",
    "type": "record",
    "name": "TestImage",
    "fields": [
        {"name": "filename", "type": "string"},
        {"name": "data", "type": "bytes"},
        {"name": "metadata", "type": {
            "type": "map", "values": "string"
        }}
    ]
}
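As a quick sanity check, the .avsc file can be parsed locally before submitting the job; a minimal sketch, assuming the Python 2 avro package is installed:

import avro.schema

# avro.schema.parse raises SchemaParseException on malformed JSON or an invalid schema
parsed = avro.schema.parse(open('test_schema_without_map.avsc').read())
print parsed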
However, the same code works fine when the schema does not contain the 'map' Avro complex type:
{
    "namespace": "test.avro",
    "type": "record",
    "name": "TestImage",
    "fields": [
        {"name": "filename", "type": "string"},
        {"name": "data", "type": "bytes"}
    ]
}
If anyone knows where the problem lies, please share your experience...
Versions: Spark 1.6.1, Hadoop 2.6.0 (per the spark-examples jar in the command above).
The contents of the avro file are:
records = [
    {
        "filename": "input_filename_1",
        "metadata": {"a": "1", "b": "23"},
        "data": "1,2,3,4,5,6,7,8,9,0"
    },
    {
        "filename": "input_filename_2",
        "metadata": {"c": "11", "d": "213"},
        "data": "10,11,12,13,14,15"
    }
]
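For completeness, this is roughly how such a file would be written with the Python avro package (a sketch, assuming Python 2; the file names test_schema_with_map.avsc and test_with_map.avro are placeholders):

import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

# parse the record schema, then append each dict as one Avro record
schema = avro.schema.parse(open('test_schema_with_map.avsc').read())
writer = DataFileWriter(open('test_with_map.avro', 'wb'), DatumWriter(), schema)
for record in records:
    writer.append(record)
writer.close()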