I have already looked at Spark streaming not remembering previous state, but it didn't help. I also looked at http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing, but I cannot find JavaStreamingContextFactory, even though I am using spark-streaming_2.11 v2.0.1.
My code works fine, but when I restart it, it does not remember the last checkpoint...
Function0<JavaStreamingContext> scFunction = new Function0<JavaStreamingContext>() {
    @Override
    public JavaStreamingContext call() throws Exception {
        // Spark Streaming needs to checkpoint enough information to a fault-tolerant storage system such as HDFS
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.milliseconds(SPARK_DURATION));
        // checkpointDir = "hdfs://user:pw@192.168.1.50:54310/spark/checkpoint";
        ssc.sparkContext().setCheckpointDir(checkpointDir);
        StorageLevel.MEMORY_AND_DISK();
        return ssc;
    }
};

JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkpointDir, scFunction);
Currently the data comes from Kafka, and I am performing some transformations and actions.
JavaPairDStream<Integer, Long> responseCodeCountDStream =
        logObject.transformToPair(MainApplication::responseCodeCount);
JavaPairDStream<Integer, Long> cumulativeResponseCodeCountDStream =
        responseCodeCountDStream.updateStateByKey(COMPUTE_RUNNING_SUM);

cumulativeResponseCodeCountDStream.foreachRDD(rdd -> {
    rdd.checkpoint();
    LOG.warn("Response code counts: " + rdd.take(100));
});
Could someone point me in the right direction if I have missed something?
Also, I can see that the checkpoints are being saved in HDFS. So why aren't they being read back?