Spark saveAsTextFile overwrites the file after every batch

Time: 2018-02-21 09:27:33

Tags: apache-spark apache-kafka spark-streaming

I am currently trying to use Spark Streaming to read input from a Kafka topic and save that input to a JSON file. So far I can save my InputDStream as a textFile, but the problem is that after every batch the file gets overwritten, and there seems to be nothing I can do about it.

Is there a way or a configuration option to change this? I tried setting spark.files.overwrite to false, but it did not work.

My code is:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public static void main(String[] args) {

    SparkConf conf = new SparkConf().setAppName("local-test").setMaster("local[*]")
            .set("spark.shuffle.service.enabled", "false")
            .set("spark.dynamicAllocation.enabled", "false")
            .set("spark.io.compression.codec", "snappy")
            .set("spark.rdd.compress", "true")
            .set("spark.executor.instances", "4")
            .set("spark.executor.memory", "6G")
            .set("spark.executor.cores", "6")
            .set("spark.cores.max", "8")
            .set("spark.driver.memory", "2g")
            .set("spark.files.overwrite", "false");

    // One micro-batch every 4 seconds
    JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(4));

    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "xxxxx");
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "ID2");

    List<String> topics = Arrays.asList("LEGO_MAX");

    JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(ssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

    // Extract the record values and write every batch to the same fixed path --
    // this is the output that ends up overwritten on each batch.
    JavaDStream<String> first = stream.map(record -> record.value());
    first.foreachRDD(rdd -> rdd.saveAsTextFile("C:\\Users\\A675866\\Hallo.txt"));

    ssc.start();

    try {
        ssc.awaitTermination();
    } catch (InterruptedException e) {
        System.out.println("Failed to cut connection -> Throwing Error");
        e.printStackTrace();
    }
}
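
For reference, a minimal sketch of one common workaround (not from the original post): instead of reusing a single path, write each micro-batch into its own time-stamped directory via the two-argument foreachRDD overload, which also passes the batch Time. It assumes first is the JavaDStream<String> built above, that org.apache.spark.streaming.Time is imported, and the output root C:\Users\A675866\kafka-output is only a placeholder.

    // Sketch (assumption, not part of the original question): save each
    // micro-batch under a distinct directory derived from the batch time,
    // so earlier output is never overwritten. The path is a placeholder.
    first.foreachRDD((rdd, time) -> {
        if (!rdd.isEmpty()) {
            rdd.saveAsTextFile("C:\\Users\\A675866\\kafka-output\\batch-" + time.milliseconds());
        }
    });

A related option, if acceptable, is the Scala-level DStream.saveAsTextFiles(prefix, suffix), reachable from Java through first.dstream(), which also generates a new time-suffixed directory per batch.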

0 Answers:

No answers