What could cause my Spark Streaming checkpoint to be incomplete?

Asked: 2016-12-29 17:15:16

Tags: java apache-spark spark-streaming

I am using the Spark Streaming API, specifically to test the checkpointing functionality. However, I have found that in certain cases the state restored from the checkpoint is incomplete.

The following code runs in local[2] mode (although I have noticed similar behaviour when running it elsewhere) against Spark 2.1.0 (compiled for Scala 2.11):

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.time.ZonedDateTime;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.Optional;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public static void main(String[] args) throws Exception {
    SparkConf spark = new SparkConf();

    createAppendablePrintStream().println(ZonedDateTime.now() + " Starting stream");
    String checkpoint = "/export/spark/checkpoint"; // NFS-mounted directory
    JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(checkpoint, () -> {
        JavaStreamingContext x = new JavaStreamingContext(spark, Durations.seconds(5));
        x.checkpoint(checkpoint);
        JavaDStream<String> lines = x.socketTextStream("192.168.8.130", 7777); // IP address of my local VM
        // Key each line by its first word (e.g. WARN, ERROR) and count lines seen per key.
        JavaPairDStream<String, Integer> stateByType = lines
                .mapToPair(line -> new Tuple2<>(line.split(" ")[0], line))
                .updateStateByKey((Function2<List<String>, Optional<Integer>, Optional<Integer>>)
                        (values, state) -> Optional.of(state.orElse(0) + values.size()));
        stateByType.foreachRDD(rdd -> createAppendablePrintStream()
                .println(ZonedDateTime.now() + " Current state: " + rdd.collectAsMap()));
        return x;
    });

    jssc.start();
    jssc.awaitTermination();
    createAppendablePrintStream().println(ZonedDateTime.now() + " Closing stream");
}

private static PrintStream createAppendablePrintStream() {
    try {
        // Append so that output accumulates across driver restarts.
        return new PrintStream(new FileOutputStream("/tmp/result.txt", true));
    } catch (FileNotFoundException e) {
        throw new RuntimeException(e);
    }
}

When I add a new key to this stream and shut down the driver immediately afterwards, the key does not appear to be restored as part of the checkpoint, as shown by this log excerpt:

2016-12-29T16:53:33.185Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:53:35.086Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:53:40.288Z[Europe/London] Current state: {WARN:=2, ERROR:=1, INFO:=1}
2016-12-29T16:53:43.695Z[Europe/London] Closing stream
2016-12-29T16:53:53.100Z[Europe/London] Starting stream
2016-12-29T16:54:08.154Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:54:13.226Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:54:15.026Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:54:15.768Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:54:17.136Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:54:17.521Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:54:18.795Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:54:19.360Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:54:20.634Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:54:25.052Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:54:30.066Z[Europe/London] Current state: {WARN:=2, ERROR:=1, ALERT:=1}

(Note that the ALERT entry was added after the restart, to show that the INFO entry never comes back.)

However, when I let the new key remain part of the state for a second batch, it is restored from the checkpoint without any problem, as this log excerpt shows:

2016-12-29T16:54:25.052Z[Europe/London] Current state: {WARN:=2, ERROR:=1}
2016-12-29T16:54:30.066Z[Europe/London] Current state: {WARN:=2, ERROR:=1, ALERT:=1}
2016-12-29T16:54:35.051Z[Europe/London] Current state: {WARN:=2, ERROR:=1, ALERT:=1}
2016-12-29T16:54:38.545Z[Europe/London] Closing stream
2016-12-29T16:54:47.306Z[Europe/London] Starting stream
2016-12-29T16:55:01.982Z[Europe/London] Current state: {WARN:=2, ERROR:=1, ALERT:=1}

Is there an explanation for this incomplete state? Can it be fixed with a configuration change, or do I need to file a bug report with the Spark folks?

2 Answers:

Answer 0 (score: 2):

I don't know how you stopped the StreamingContext, but for receiver-based streams you need to set spark.streaming.receiver.writeAheadLog.enable to true to enable write-ahead logs. Otherwise, as you have already seen, the last batch may be lost, because Spark Streaming cannot replay it.
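
Shutting down gracefully also helps, since it lets the last received batch be processed before the context stops. A minimal sketch, assuming you control the shutdown path (jssc.stop(boolean, boolean) and the spark.streaming.stopGracefullyOnShutdown setting are standard Spark Streaming APIs; where you invoke them is up to you):

import org.apache.spark.streaming.api.java.JavaStreamingContext;

static void stopGracefully(JavaStreamingContext jssc) {
    // First flag: also stop the underlying SparkContext.
    // Second flag: wait until all received data has been processed.
    jssc.stop(true, true);
}

Alternatively, set spark.streaming.stopGracefullyOnShutdown to "true" in the SparkConf so that Spark performs the graceful stop itself when the JVM shuts down (e.g. on SIGTERM).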

See http://spark.apache.org/docs/latest/streaming-programming-guide.html#fault-tolerance-semantics for details.

Answer 1 (score: 0):

To avoid losing data it has received in the past, Spark 1.2 introduced write-ahead logs, which save received data to fault-tolerant storage.

SparkConf sparkConf = new SparkConf();
sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true");

Otherwise, when recovering from a checkpoint after a failure, some batches may be lost.
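
Putting the two answers together: the fault-tolerance section of the programming guide also recommends a serialized, non-replicated storage level for receivers once the write-ahead log is enabled, because the log already provides durability. A minimal sketch against the socket source from the question (host, port, and checkpoint path are taken from the question; the storage-level choice follows the guide):

import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

SparkConf sparkConf = new SparkConf();
// Persist received blocks to a write-ahead log in the checkpoint directory.
sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true");

JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(5));
jssc.checkpoint("/export/spark/checkpoint"); // the WAL is written under this path

// With the WAL enabled, in-memory replication is redundant, so receive with
// a serialized storage level that does not replicate.
JavaDStream<String> lines = jssc.socketTextStream(
        "192.168.8.130", 7777, StorageLevel.MEMORY_AND_DISK_SER());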