I am currently trying to use Spark Streaming to read input from a Kafka topic and save that input to a JSON file. So far I can save my InputDStream as a textFile, but the problem is that after each batch the output is overwritten, and nothing I do seems to prevent it.

Is there a method or configuration option to change this behavior?

I tried setting spark.files.overwrite to false, but it does not work. (See the sketch below my code for the kind of per-batch output I am after.)

My code is:
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("local-test").setMaster("local[*]")
            .set("spark.shuffle.service.enabled", "false")
            .set("spark.dynamicAllocation.enabled", "false")
            .set("spark.io.compression.codec", "snappy")
            .set("spark.rdd.compress", "true")
            .set("spark.executor.instances", "4")
            .set("spark.executor.memory", "6G")
            .set("spark.executor.cores", "6")
            .set("spark.cores.max", "8")
            .set("spark.driver.memory", "2g")
            .set("spark.files.overwrite", "false");
    JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(4));

    // Kafka consumer configuration
    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "xxxxx");
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "ID2");
    List<String> topics = Arrays.asList("LEGO_MAX");

    // Direct stream from the Kafka topic
    JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(ssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

    // Take the record values and write each batch to disk. Because the
    // path is fixed, every batch replaces the previous output.
    JavaDStream<String> first = stream.map(record -> record.value());
    first.foreachRDD(rdd -> rdd.saveAsTextFile("C:\\Users\\A675866\\Hallo.txt"));

    ssc.start();
    try {
        ssc.awaitTermination();
    } catch (InterruptedException e) {
        System.out.println("Failed to cut connection -> Throwing Error");
        e.printStackTrace();
    }
}
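
To make the goal concrete, here is a minimal sketch of what I mean by not overwriting: the foreachRDD overload that also receives the batch Time could write every batch into its own directory. It continues from the DStream first defined above, and the base path C:\output is just illustrative, not my real setup. I would rather have a proper method or configuration option than build paths by hand like this:

    // Sketch only, not my current code: write every batch into its own
    // time-stamped directory so earlier output is never replaced.
    // The base path "C:\\output" is illustrative.
    first.foreachRDD((rdd, time) ->
            rdd.saveAsTextFile("C:\\output\\batch-" + time.milliseconds()));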