Spark rdd.first throws KryoException - IndexOutOfBoundsException

Date: 2017-04-06 15:52:52

Tags: hadoop apache-spark serialization hbase kryo

I am trying to read a Hadoop file as follows:

SparkConf sparkConf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("test")
        .set("spark.ui.enabled", "false")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .registerKryoClasses(new Class<?>[]{
                scala.Tuple2.class,
                org.apache.hadoop.hbase.client.Put.class,
                org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
                org.apache.hadoop.hbase.client.Mutation.class,
                java.util.Map.class,
                java.util.NavigableMap.class,
                java.util.List.class,
                java.util.TreeMap.class,
        })
        .set("spark.app.id", appID());
SparkContext sc = new SparkContext(sparkConf);
JavaPairRDD<ImmutableBytesWritable, Put> putRdd = sharedContext.jsc().newAPIHadoopRDD(hadoopConf, SequenceFileInputFormat.class, ImmutableBytesWritable.class, Put.class);
Tuple2<ImmutableBytesWritable, Put> tuple1 = putRdd.first();

Even though I explicitly registered the classes with Kryo, I get the following exception:

2017-04-06 17:31:35,287 ERROR [task-result-getter-3] scheduler.TaskResultGetter: Exception while getting task result
com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 48, Size: 8
Serialization trace:
familyMap (org.apache.hadoop.hbase.client.Put)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
    at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
    at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
    at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:275)
    at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:97)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:60)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IndexOutOfBoundsException: Index: 48, Size: 8
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
    at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:773)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:727)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
... 18 more

Any ideas on how to register the classes correctly and avoid this serialization problem?
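
A note on reading the trace: the `Caused by` section shows the failure coming from `MapReferenceResolver.getReadObject` calling `ArrayList.get`, so `Index: 48, Size: 8` appears to mean that the stream asked for back-reference 48 while only 8 objects had been recorded on the reading side. Below is a minimal, stdlib-only sketch of that kind of positional lookup. It is not Kryo's actual code; the class `ReferenceTableSketch` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ReferenceTableSketch {
    // Hypothetical stand-in for a reader-side reference table: objects are
    // appended in the order they are deserialized, and a back-reference is
    // resolved by its position in this list.
    private final List<Object> readObjects = new ArrayList<>();

    // Records a newly read object and returns the id it was stored under.
    public int register(Object o) {
        readObjects.add(o);
        return readObjects.size() - 1;
    }

    // Resolves a back-reference id; throws IndexOutOfBoundsException when the
    // id is not smaller than the number of objects recorded so far.
    public Object resolve(int id) {
        return readObjects.get(id);
    }

    // Returns true if resolving referenceId fails after objectsReadSoFar
    // objects have been registered.
    public static boolean lookupFails(int objectsReadSoFar, int referenceId) {
        ReferenceTableSketch table = new ReferenceTableSketch();
        for (int i = 0; i < objectsReadSoFar; i++) {
            table.register("obj-" + i);
        }
        try {
            table.resolve(referenceId);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupFails(8, 48)); // true: same shape as "Index: 48, Size: 8"
        System.out.println(lookupFails(8, 3));  // false: id 3 is within the 8 recorded objects
    }
}
```

If that reading is right, it would suggest that the bytes being deserialized on the driver do not line up with how they were written, which is a different symptom from a class simply missing from the registration list.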

0 answers:

No answers yet