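I'm using a custom Spark receiver with Spark Streaming. In the receiver, I read text from a file and create an RDD from it.
The problem is that the data stays in the file, so the custom receiver reads it again. I want to remove the data from the file once Spark has read it, so that the same lines are not processed twice.
The receiver function looks like this: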
private void receive() {
    try {
        List<String> blocks = new ArrayList<>();
        while (!isStopped()) {
            JavaSparkContext sc = spark.getSparkContext();
            // Reads the whole file on every loop iteration, so lines that were
            // already stored are read and stored again.
            JavaRDD<String> lines = sc.textFile("src/dummy2.csv");
            blocks = lines.collect();
            // Hand the lines to Spark Streaming.
            store(blocks.iterator());
            blocks.clear();
        }
        // Restart in an attempt to connect again when server is active again
        restart("Trying to connect again");
    } catch (Throwable t) {
        // restart if there is any other error
        restart("Error receiving data", t);
    }
}
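A minimal sketch of one way to avoid re-reading the same lines, assuming the producer writes a fresh `src/dummy2.csv` for each batch rather than keeping the file open and appending to it: the receiver first claims the file by atomically renaming it, reads the claimed copy, then deletes it. `FileDrainer`, `drain` and the `.processing` suffix are hypothetical names. Because receivers run on executors, where a `SparkContext` is generally not usable, the sketch reads the file with plain `java.nio` instead of `sc.textFile`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.List;

public class FileDrainer {

    /**
     * Claims the input file by renaming it, reads all of its lines, then
     * deletes the claimed copy so the same lines are never returned twice.
     * Returns an empty list if the input file does not exist yet.
     * Assumption: the producer creates a new file per batch instead of
     * appending to an already-open file handle.
     */
    public static List<String> drain(Path input) throws IOException {
        // Hypothetical naming scheme for the claimed copy.
        Path claimed = input.resolveSibling(input.getFileName() + ".processing");
        try {
            // ATOMIC_MOVE: the rename either happens completely or not at all.
            Files.move(input, claimed, StandardCopyOption.ATOMIC_MOVE);
        } catch (NoSuchFileException e) {
            // Nothing new has been written since the last call.
            return Collections.emptyList();
        }
        List<String> lines = Files.readAllLines(claimed, StandardCharsets.UTF_8);
        // Remove the claimed copy so its lines cannot be read again.
        Files.delete(claimed);
        return lines;
    }
}
```

Inside `receive()`, the `textFile(...)`/`collect()` pair could then be replaced by `FileDrainer.drain(Paths.get("src/dummy2.csv"))`, calling `store(lines.iterator())` only when the result is non-empty and sleeping briefly between polls. Renaming before reading means each line ends up either in the claimed copy or in a file the producer creates afterwards, never in both; truncating the file in place would instead race with a concurrent writer. Note that `ATOMIC_MOVE` can throw `AtomicMoveNotSupportedException` on file systems that do not support atomic renames.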