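I have a collection that represents a data stream, and I am testing StreamingFileSink to write that stream to S3. The program runs successfully, but no data appears at the given S3 path.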
public class S3Sink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
        see.enableCheckpointing(100);
        List<String> input = new ArrayList<>();
        input.add("test");
        DataStream<String> inputStream = see.fromCollection(input);
        RollingPolicy<Object, String> rollingPolicy = new CustomRollingPolicy();
        StreamingFileSink s3Sink = StreamingFileSink
                .forRowFormat(new Path("<S3 Path>"),
                        new SimpleStringEncoder<>("UTF-8"))
                .withRollingPolicy(rollingPolicy)
                .build();
        inputStream.addSink(s3Sink);
        see.execute();
    }
}
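Checkpointing is enabled as well. Any ideas why the sink is not working as expected?
Update: Based on David's answer, I created a custom source that continuously generates random strings. I expect checkpointing to trigger after the configured interval and write the data to S3.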
public class S3SinkCustom {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
        see.enableCheckpointing(1000);
        DataStream<String> inputStream = see.addSource(new CustomSource());
        // Note: this policy is created but never passed to the sink below.
        RollingPolicy<Object, String> rollingPolicy = new CustomRollingPolicy();
        StreamingFileSink s3Sink = StreamingFileSink
                .forRowFormat(new Path("s3://mybucket/data/"),
                        new SimpleStringEncoder<>("UTF-8"))
                .build();
        //inputStream.print();
        inputStream.addSink(s3Sink);
        see.execute();
    }

    // Emits one random string per second until the job is cancelled.
    static class CustomSource extends RichSourceFunction<String> {
        private volatile boolean running = false;
        final String[] strings = {"ABC", "XYZ", "DEF"};
        private final Random random = new Random();

        @Override
        public void open(Configuration parameters) {
            running = true;
        }

        @Override
        public void run(SourceContext<String> sourceContext) throws Exception {
            while (running) {
                int index = random.nextInt(strings.length);
                sourceContext.collect(strings[index]);
                Thread.sleep(1000);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }
}
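The reasoning behind switching from a bounded collection to a continuous source can be illustrated with a toy model. As far as I understand the StreamingFileSink documentation, part files move from in-progress to pending to finished, and only finished files are visible; the last step happens when a checkpoint completes. The class below is not Flink code: `PartFileLifecycleSketch` and its methods are hypothetical names, and it assumes the rolling policy rolls on every checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these are NOT Flink classes. It mimics the part-file
// lifecycle (in-progress -> pending -> finished) to show when data becomes visible.
class PartFileLifecycleSketch {
    private final List<String> inProgress = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private final List<String> finished = new ArrayList<>();

    // A record arrives: it is buffered in the in-progress part.
    void write(String record) {
        inProgress.add(record);
    }

    // A checkpoint is taken at the sink: in-progress data becomes pending.
    void snapshotState() {
        pending.addAll(inProgress);
        inProgress.clear();
    }

    // The checkpoint is confirmed: pending data is committed and becomes visible.
    void notifyCheckpointComplete() {
        finished.addAll(pending);
        pending.clear();
    }

    List<String> visibleRecords() {
        return new ArrayList<>(finished);
    }

    public static void main(String[] args) {
        // Bounded input: the job ends before any checkpoint completes.
        PartFileLifecycleSketch bounded = new PartFileLifecycleSketch();
        bounded.write("test");
        System.out.println("bounded job, visible: " + bounded.visibleRecords());

        // Continuous input: a checkpoint completes while the job is running.
        PartFileLifecycleSketch continuous = new PartFileLifecycleSketch();
        continuous.write("ABC");
        continuous.write("XYZ");
        continuous.snapshotState();
        continuous.notifyCheckpointComplete();
        continuous.write("DEF"); // written after the checkpoint, not yet visible
        System.out.println("continuous job, visible: " + continuous.visibleRecords());
    }
}
```

Running `main` prints an empty list for the bounded case and `[ABC, XYZ]` for the continuous case. In this model a job that finishes within milliseconds never reaches the commit step, whereas a long-running source should, provided the sink can actually reach the file system.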
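Still, there is no data in S3. The Flink process does not even validate whether the S3 bucket exists: the job runs without any errors either way.
Update:
Here are the details of the custom rolling policy: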
public class CustomRollingPolicy implements RollingPolicy<Object, String> {
    @Override
    public boolean shouldRollOnCheckpoint(PartFileInfo partFileInfo) throws IOException {
        return partFileInfo.getSize() > 1;
    }

    @Override
    public boolean shouldRollOnEvent(PartFileInfo partFileInfo, Object element) throws IOException {
        return true;
    }

    @Override
    public boolean shouldRollOnProcessingTime(PartFileInfo partFileInfo, long currentTime) throws IOException {
        return true;
    }
}
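A side effect of the policy above is worth noting: because shouldRollOnEvent always returns true, every record is likely to end up in its own part file. The sketch below is a toy model, not Flink's bucket logic; `RollOnEventSketch`, `countPartFiles`, and the 8-byte threshold are hypothetical and exist only to compare an always-roll decision with a size-based one.

```java
import java.util.function.LongPredicate;

// Toy model only, not Flink's Bucket class: counts how many part files are
// opened when the "roll on event" decision is evaluated before each write.
class RollOnEventSketch {

    // rollOnEvent receives the current part size in bytes and returns true to roll.
    static int countPartFiles(String[] records, LongPredicate rollOnEvent) {
        int partFiles = 0;
        long currentSize = 0;
        boolean partOpen = false;
        for (String record : records) {
            if (!partOpen || rollOnEvent.test(currentSize)) {
                partFiles++;       // open a new part file
                currentSize = 0;
                partOpen = true;
            }
            currentSize += record.length() + 1; // record plus newline (ASCII assumed)
        }
        return partFiles;
    }

    public static void main(String[] args) {
        String[] records = {"ABC", "XYZ", "DEF", "ABC", "XYZ"};
        // Mirrors shouldRollOnEvent returning true unconditionally.
        System.out.println(countPartFiles(records, size -> true));
        // Hypothetical alternative: roll only once the part reaches 8 bytes.
        System.out.println(countPartFiles(records, size -> size >= 8));
    }
}
```

For the five sample records this prints 5 for the always-roll decision and 3 for the size-based one. Many tiny objects are usually undesirable on S3, so a size or time threshold may be a better fit, though that is independent of the missing-data problem described here.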
Answer 0 (score: 0)
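The problem above was resolved after setting up flink-conf.yaml with the required s3a properties (for example fs.s3a.access.key and fs.s3a.secret.key).
We also need to tell Flink where the configuration is located:
FileSystem.initialize(GlobalConfiguration.loadConfiguration(""));
With these changes, I was able to run the S3 sink locally, and the messages were persisted to S3.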
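For a local run, the configuration file described above can be generated in a scratch directory. This is a minimal sketch using only the JDK: `FlinkConfSketch`, `writeFlinkConf`, the temporary directory, and the credential placeholders are all assumptions for illustration, and only the two property names come from the answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Sketch: writes a flink-conf.yaml containing the two keys named in the answer.
// The directory and the credential values are placeholders (assumptions).
class FlinkConfSketch {

    static Path writeFlinkConf(Path confDir, String accessKey, String secretKey) throws IOException {
        Files.createDirectories(confDir);
        List<String> lines = Arrays.asList(
                "fs.s3a.access.key: " + accessKey,
                "fs.s3a.secret.key: " + secretKey);
        // Files.write returns the path of the file it wrote.
        return Files.write(confDir.resolve("flink-conf.yaml"), lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path confDir = Files.createTempDirectory("flink-conf-sketch");
        Path confFile = writeFlinkConf(confDir, "<access key>", "<secret key>");
        System.out.println("wrote " + confFile);
        // confDir.toString() is the kind of value that the answer's
        // GlobalConfiguration.loadConfiguration(...) call takes as its argument.
    }
}
```

To my knowledge, the String overload of GlobalConfiguration.loadConfiguration expects the directory that contains flink-conf.yaml rather than the file itself. Real credentials should not be hard-coded or committed; the placeholders here only mark where they would go.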