Spark Kryo serializer and Broadcast<Map<Object, Iterable<GowallaDataLocation>>>: java.io.IOException: java.lang.UnsupportedOperationException

Time: 2017-03-26 01:29:54

Tags: java apache-spark kryo

When I try to access a Broadcast variable, I get this exception:

  

17/03/26 03:04:23 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 10, 192.168.56.5, executor 1): java.io.IOException: java.lang.UnsupportedOperationException
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
    at GowallaTask$2.call(GowallaTask.java:214)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:351)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:351)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException
    at java.util.AbstractMap.put(AbstractMap.java:209)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:162)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
    at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:244)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$10.apply(TorrentBroadcast.scala:286)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1303)
    at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:287)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:221)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1269)
    ... 19 more

The exception occurs when I use KryoSerializer:

    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    conf.set("spark.kryoserializer.buffer.mb", "24");

Here is my code:

JavaPairRDD<Object, Iterable<GowallaDataLocation>> line_RDD_2 = sc
        .textFile("/home/piero/gowalla_location.txt", 2)
        .map(new GowallaMapperDataLocation())
        .groupBy(new Function<GowallaDataLocation, Object>() {

            private static final long serialVersionUID = -6773509902594100325L;

            @Override
            public Object call(GowallaDataLocation v1) throws Exception {
                DateFormat dateFormat = new SimpleDateFormat("yyyyMMdd");
                return dateFormat.format(v1.getDATE());
            }
        }).persist(StorageLevel.MEMORY_AND_DISK_SER());

Broadcast<Map<Object, Iterable<GowallaDataLocation>>> broadcastVar_2 = sc.broadcast(line_RDD_2.collectAsMap());
//System.out.println(broadcastVar_2.getValue().size());

JavaRDD<Object> keys = line_RDD_2.keys().persist(StorageLevel.MEMORY_ONLY_SER());
line_RDD_2.unpersist();

keys.foreach(new VoidFunction<Object>() {

    private static final long serialVersionUID = -8148877518271969523L;

    @Override
    public void call(Object t) throws Exception {
        //System.out.println("KEY:" + t + " ");
        Iterable<GowallaDataLocation> dr = broadcastVar_2.getValue().get(t);
    }
});

1 answer:

Answer 0 (score: 3)

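I suspect this happens because you broadcast line_RDD_2.collectAsMap() directly. The broadcast value is then only known as a Map, Kryo does not know which concrete implementation to rebuild, and it ends up working on an AbstractMap. That matches your stack trace: during deserialization MapSerializer.read calls put, the call lands in AbstractMap.put, and AbstractMap.put throws UnsupportedOperationException unless a subclass overrides it.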

It is similar to what happens if I do this:

Map<String, String> a = new HashMap<String, String>();
a.put("a", "b");
Set<String> c = a.keySet();
c.add("e");

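I get an UnsupportedOperationException from AbstractCollection, because the key set view does not support add. It is easy to fix by copying into a real collection: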

Map<String, String> a = new HashMap<String, String>();
a.put("a", "b");
Set<String> c = new TreeSet<String>();
c.addAll(a.keySet());
c.add("e");
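The same behaviour can be reproduced for a Map without Spark or Kryo. Below is a minimal sketch, assuming the failing map is some AbstractMap subclass that does not override put. The ReadOnlyView class and the sample key and value are made up for illustration and are not the actual class that Spark or Kryo uses.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ReadOnlyMapDemo {

    /**
     * Hypothetical read-only map: it extends AbstractMap and only
     * implements entrySet(), so put() falls through to AbstractMap.put,
     * which throws UnsupportedOperationException.
     */
    static final class ReadOnlyView<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> underlying;

        ReadOnlyView(Map<K, V> underlying) {
            this.underlying = underlying;
        }

        @Override
        public Set<Map.Entry<K, V>> entrySet() {
            return Collections.unmodifiableMap(underlying).entrySet();
        }
    }

    /**
     * Tries to put a probe entry into the map. Returns true if the map
     * rejects it with UnsupportedOperationException. Note that a map
     * that accepts the call keeps the probe entry.
     */
    static boolean putIsUnsupported(Map<String, String> map) {
        try {
            map.put("probe", "value");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("20101019", "some locations"); // illustrative data only

        // A map that can be read but not written through put().
        Map<String, String> view = new ReadOnlyView<>(source);
        System.out.println("view rejects put: " + putIsUnsupported(view));

        // Copying into a HashMap gives a concrete, writable implementation.
        Map<String, String> copy = new HashMap<>(view);
        System.out.println("copy rejects put: " + putIsUnsupported(copy));
    }
}
```

Reading from the view works, and only writes fail. That would explain why the error shows up on the executors when the map is rebuilt entry by entry, and not on the driver.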

If my guess is right, you can fix it like this:

Map<Object, Iterable<GowallaDataLocation>> a = new HashMap<>();
a.putAll(line_RDD_2.collectAsMap());
Broadcast<Map<Object, Iterable<GowallaDataLocation>>> broadcastVar_2 = sc.broadcast(a);

Let me know if this works.