Spark checkpoint does not remember state (Java, HDFS)

Date: 2016-11-10 19:48:31

Tags: java apache-spark hdfs spark-streaming hadoop2

I have already seen "Spark streaming not remembering previous state", but it didn't help. I also read http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing, but I cannot find JavaStreamingContextFactory, even though I am using spark-streaming_2.11 v2.0.1.

My code runs fine, but when I restart it, it does not remember the last checkpoint...

Function0<JavaStreamingContext> scFunction = new Function0<JavaStreamingContext>() {
    @Override
    public JavaStreamingContext call() throws Exception {
        // Spark Streaming needs to checkpoint enough information to a fault-tolerant storage system
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.milliseconds(SPARK_DURATION));
        // checkpointDir = "hdfs://user:pw@192.168.1.50:54310/spark/checkpoint";
        ssc.sparkContext().setCheckpointDir(checkpointDir);
        StorageLevel.MEMORY_AND_DISK();
        return ssc;
    }
};

JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkpointDir, scFunction);
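
For comparison, the recovery pattern from the linked checkpointing guide looks roughly like the sketch below (in Spark 2.x, getOrCreate takes a Function0<JavaStreamingContext>; the older JavaStreamingContextFactory no longer exists, which is why it cannot be found). Two things differ from the code above: checkpointing is enabled with ssc.checkpoint(...) on the streaming context itself, not only with sparkContext().setCheckpointDir(...), and the entire DStream graph is built inside the factory so it can be reconstructed on restart. The class name, app name, and setupStreams helper are placeholders:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public final class CheckpointedApp {
    // Placeholder values; adjust to your environment.
    private static final String CHECKPOINT_DIR =
            "hdfs://192.168.1.50:54310/spark/checkpoint";
    private static final long SPARK_DURATION = 5000L;

    public static void main(String[] args) throws InterruptedException {
        // On a clean start getOrCreate calls the factory; on restart it
        // rebuilds the context, the DStream graph, and the state from
        // the data found in CHECKPOINT_DIR.
        JavaStreamingContext ssc =
                JavaStreamingContext.getOrCreate(CHECKPOINT_DIR, CheckpointedApp::createContext);
        ssc.start();
        ssc.awaitTermination();
    }

    private static JavaStreamingContext createContext() {
        SparkConf conf = new SparkConf().setAppName("checkpoint-demo");
        JavaStreamingContext ssc =
                new JavaStreamingContext(conf, Durations.milliseconds(SPARK_DURATION));
        // Enable checkpointing on the StreamingContext itself; calling only
        // sparkContext().setCheckpointDir() does not let getOrCreate recover.
        ssc.checkpoint(CHECKPOINT_DIR);
        // The entire DStream graph (Kafka input, transformations,
        // updateStateByKey, output actions) must be set up here, inside
        // the factory, so it can be reconstructed from the checkpoint.
        // setupStreams(ssc);  // hypothetical helper
        return ssc;
    }
}

Note also that StorageLevel.MEMORY_AND_DISK() on its own line is a no-op: it merely constructs a StorageLevel object and discards it; storage levels only take effect when passed to persist().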

Currently the data comes from Kafka, and I am doing some transformations and actions on it.

JavaPairDStream<Integer, Long> responseCodeCountDStream =
        logObject.transformToPair(MainApplication::responseCodeCount);
JavaPairDStream<Integer, Long> cumulativeResponseCodeCountDStream =
        responseCodeCountDStream.updateStateByKey(COMPUTE_RUNNING_SUM);
cumulativeResponseCodeCountDStream.foreachRDD(rdd -> {
    rdd.checkpoint();
    LOG.warn("Response code counts: " + rdd.take(100));
});
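
COMPUTE_RUNNING_SUM is not shown in the question. For context, a running-sum update function compatible with the updateStateByKey call above would typically look like the sketch below; the field name and the Long types are assumptions taken from the DStream signature above:

import java.util.List;

import org.apache.spark.api.java.Optional;
import org.apache.spark.api.java.function.Function2;

// Hypothetical sketch of a running-sum state update function:
// for each key, add the counts from the new batch to the stored total.
private static final Function2<List<Long>, Optional<Long>, Optional<Long>>
        COMPUTE_RUNNING_SUM = (newCounts, currentSum) -> {
    long sum = currentSum.isPresent() ? currentSum.get() : 0L;
    for (Long count : newCounts) {
        sum += count;
    }
    return Optional.of(sum);
};

This state survives a restart only when the context is recovered from the checkpoint via getOrCreate; the rdd.checkpoint() call inside foreachRDD saves RDD data and truncates lineage, but does not by itself restore state.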

If I am missing something, could someone point me in the right direction?

Also, I can see the checkpoints being saved in HDFS. But why are they not being read back?

0 answers