How do I use `ssc.sparkContext()` inside foreachRDD in Spark Streaming?

If I use `ssc.sparkContext()` inside foreachRDD (Java), e.g. `ssc.sparkContext().broadcast(map)`, I get a "Task not serializable" error. If I use `(new JavaSparkContext(rdd.context())).broadcast(map)` instead, there is no problem.

So, is `ssc.sparkContext()` basically equivalent to `(new JavaSparkContext(rdd.context()))`?

When I use `(new JavaSparkContext(rdd.context())).broadcast(map)`, the broadcast variable, i.e. the associated `map`, is distributed to all executors.

The code is below. Here, `bcv.broadcastVar = (new JavaSparkContext(rdd.context())).broadcast(map);` works, but `bcv.broadcastVar = ssc.sparkContext().broadcast(map);` does not:
words.foreachRDD(new Function<JavaRDD<String>, Void>() {
    @Override
    public Void call(JavaRDD<String> rdd) throws Exception {
        if (rdd != null) {
            System.out.println("Hello World - words - SSC !!!"); // Gets printed on Driver
            if (stat.data_changed == 1) {
                stat.data_changed = 0;
                bcv.broadcastVar.unpersist(); // Unpersist BC variable
                bcv.broadcastVar = (new JavaSparkContext(rdd.context())).broadcast(map); // Re-broadcast same BC variable with NEW data
            }
        }
        rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
            @Override
            public void call(Iterator<String> items) throws Exception {
                System.out.println("words.foreachRDD.foreachPartition: CALLED ..."); // Gets called on Worker/Executor
                Integer index = 1;
                String lastKey = "";
                Integer lastValue = 0;
                while (true) {
                    String key = "A" + Long.toString(index);
                    Integer value = bcv.broadcastVar.value().get(key); // Executor consumes map
                    if (value == null) break;
                    lastKey = key;
                    lastValue = value;
                    index++;
                }
                System.out.println("Executor BC: key/value: " + lastKey + " = " + lastValue);
                return;
            }
        });
        return null;
    }
});
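A likely explanation for the difference: an anonymous Function (or lambda) that references `ssc` captures the streaming context into its closure, and JavaStreamingContext is not serializable, so shipping the closure to executors fails with "Task not serializable". `rdd.context()` only touches the method parameter `rdd`, so nothing non-serializable is captured. This minimal, Spark-free sketch illustrates the capture behavior with plain Java serialization; `StreamingCtx` is a hypothetical stand-in for JavaStreamingContext:

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CaptureDemo {
    // Hypothetical stand-in for JavaStreamingContext: NOT serializable.
    static class StreamingCtx { }

    // Serializable functional interface, like Spark's Function/VoidFunction.
    interface Task extends Serializable { void run(); }

    // Try to serialize a closure, the way Spark serializes tasks for executors.
    static boolean serializes(Task t) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(t);
            return true;
        } catch (NotSerializableException e) {
            return false; // a captured object was not serializable
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        StreamingCtx ssc = new StreamingCtx();

        // Captures 'ssc' -> fails, analogous to ssc.sparkContext() inside foreachRDD.
        Task bad = () -> System.out.println(ssc);

        // Captures nothing non-serializable -> fine, analogous to
        // (new JavaSparkContext(rdd.context())) built from the rdd parameter.
        Task good = () -> System.out.println("ok");

        System.out.println("bad serializes:  " + serializes(bad));  // false
        System.out.println("good serializes: " + serializes(good)); // true
    }
}
```

So the two calls return a handle to the same underlying SparkContext; what differs is what the surrounding closure drags along with it. As a side note, `JavaSparkContext.fromSparkContext(rdd.context())` is the factory-method equivalent of `new JavaSparkContext(rdd.context())`.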