When I try to access a broadcast variable, I get this exception:
17/03/26 03:04:23 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 10, 192.168.56.5, executor 1): java.io.IOException: java.lang.UnsupportedOperationException
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
    at GowallaTask$2.call(GowallaTask.java:214)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:351)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreach$1.apply(JavaRDDLike.scala:351)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException
    at java.util.AbstractMap.put(AbstractMap.java:209)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:162)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
    at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:244)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$10.apply(TorrentBroadcast.scala:286)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1303)
    at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:287)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:221)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1269)
    ... 19 more
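The "Caused by" section points at java.util.AbstractMap.put, which throws UnsupportedOperationException unless a subclass overrides it. A minimal, Spark-free sketch (my own illustration, not part of the job) that reproduces exactly that frame:

import java.util.AbstractMap;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

public class AbstractMapPutDemo {
    public static void main(String[] args) {
        // AbstractMap only requires entrySet(); put() is optional and
        // throws UnsupportedOperationException by default.
        Map<String, String> m = new AbstractMap<String, String>() {
            @Override
            public Set<Map.Entry<String, String>> entrySet() {
                return Collections.emptySet();
            }
        };
        m.put("k", "v"); // throws java.lang.UnsupportedOperationException at AbstractMap.put
    }
}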
I only get the exception when I use the KryoSerializer:
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
conf.set("spark.kryoserializer.buffer.mb", "24");
Here is my code:
JavaPairRDD<Object, Iterable<GowallaDataLocation>> line_RDD_2 = sc
        .textFile("/home/piero/gowalla_location.txt", 2)
        .map(new GowallaMapperDataLocation())
        .groupBy(new Function<GowallaDataLocation, Object>() {
            private static final long serialVersionUID = -6773509902594100325L;

            @Override
            public Object call(GowallaDataLocation v1) throws Exception {
                DateFormat dateFormat = new SimpleDateFormat("yyyyMMdd");
                return dateFormat.format(v1.getDATE());
            }
        }).persist(StorageLevel.MEMORY_AND_DISK_SER());
Broadcast<Map<Object, Iterable<GowallaDataLocation>>> broadcastVar_2 = sc.broadcast(line_RDD_2.collectAsMap());
//System.out.println(broadcastVar_2.getValue().size());
JavaRDD<Object> keys = line_RDD_2.keys().persist(StorageLevel.MEMORY_ONLY_SER());
line_RDD_2.unpersist();
keys.foreach(new VoidFunction<Object>() {
    private static final long serialVersionUID = -8148877518271969523L;

    @Override
    public void call(Object t) throws Exception {
        //System.out.println("KEY:" + t + " ");
        Iterable<GowallaDataLocation> dr = broadcastVar_2.getValue().get(t);
    }
});
Answer (score: 3)
I suspect this is because you broadcast line_RDD_2.collectAsMap() directly: that means the type of the broadcast value is just Map, so Kryo does not know the correct concrete implementation and internally ends up working with an AbstractMap.
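Conceptually (a simplified sketch of the idea, not the actual Kryo source), MapSerializer.read instantiates the map class recorded in the stream and then put()s every entry into it, which is exactly where a put-less AbstractMap subclass fails:

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import java.util.Map;

// Simplified idea of what Kryo's MapSerializer.read does:
@SuppressWarnings("unchecked")
static Map<Object, Object> readMap(Kryo kryo, Input input, Class<?> mapClass, int size) {
    Map<Object, Object> map = (Map<Object, Object>) kryo.newInstance(mapClass);
    for (int i = 0; i < size; i++) {
        Object key = kryo.readClassAndObject(input);
        Object value = kryo.readClassAndObject(input);
        map.put(key, value); // the AbstractMap.put frame in the stack trace above
    }
    return map;
}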
It's just like if I did this:
Map<String, String> a = new HashMap<String, String>();
a.put("a", "b");
Set<String> c = a.keySet();
c.add("e");
I would get an UnsupportedOperationException from an AbstractCollection, because keySet() returns a view of the map that does not support add. That one is easy to fix:
Map<String, String> a = new HashMap<String, String>();
a.put("a", "b");
Set<String> c = new TreeSet<String>();
c.addAll(a.keySet());
c.add("e");
If my guess is right, you should be able to fix it the same way:
Map<Object, Iterable<GowallaDataLocation>> a = new HashMap<>();
a.putAll(line_RDD_2.collectAsMap());
Broadcast<Map<Object, Iterable<GowallaDataLocation>>> broadcastVar_2 = sc.broadcast(a);
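The same copy can also be done in one step with the HashMap copy constructor; this is just a stylistic variant of the fix above:

Broadcast<Map<Object, Iterable<GowallaDataLocation>>> broadcastVar_2 = sc
        .broadcast(new HashMap<Object, Iterable<GowallaDataLocation>>(line_RDD_2.collectAsMap()));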
Let me know if this works.