Flink S3 StreamingFileSink does not write files to S3

Asked: 2021-04-30 10:29:26

Tags: java amazon-s3 apache-flink

I am working on a POC that uses Flink to write data to S3. The program does not throw any errors, but I also do not see any files being written to S3.

Here is the code:


import java.util.Properties;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.Collector;

public class StreamingJob {

    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final String outputPath = "s3a://testbucket-s3-flink/data/";
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        
        //Enable checkpointing
        env.enableCheckpointing();
        //S3 Sink
        final StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
                .build();


        //Source is a local kafka
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "kafka:9094");
        properties.setProperty("group.id", "test");
        

        DataStream<String> input = env.addSource(new FlinkKafkaConsumer<String>("queueing.transactions", new SimpleStringSchema(), properties));
        
        
        input.flatMap(new Tokenizer()) // Tokenizer for generating words
                .keyBy(0) // Logically partition the stream for each word
                .timeWindow(Time.minutes(1)) // Tumbling window definition
                .sum(1) // Sum the number of words per partition
                .map(value -> value.f0 + " count: " + value.f1.toString() + "\n")
                .addSink(sink);

        // execute program
        env.execute("Flink Streaming Java API Skeleton");
    }

    public static final class Tokenizer
            implements FlatMapFunction<String, Tuple2<String, Integer>> {

        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            String[] tokens = value.toLowerCase().split("\\W+");
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }
}

Please note that I have set the s3.access-key and s3.secret-key values in the configuration, and I verified them by changing them to incorrect values (with the incorrect values I do get an error).
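For context, those keys are set in flink-conf.yaml so the S3 filesystem plugin can pick them up; the values below are placeholders, not my real credentials:

s3.access-key: <your-access-key>
s3.secret-key: <your-secret-key>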

Any pointers on what might be going wrong?

1 Answer:

Answer 0 (score: 0):

Could you be running into this issue?

"Given that Flink sinks and UDFs in general do not differentiate between normal job termination (e.g. finite input stream) and termination due to failure, upon normal termination of a job, the last in-progress files will not be transitioned to the 'finished' state."
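If that is the cause, it is worth keeping the job running against the unbounded Kafka source and enabling checkpointing with an explicit interval, since with row formats the in-progress part files are only committed on successful checkpoints (or when the rolling policy fires). Below is a minimal sketch of that configuration; the checkpoint interval, rolling-policy values, and class name are my own assumptions, not taken from your setup:

import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

public class CheckpointedS3SinkSketch {

    public static StreamingFileSink<String> buildSink(StreamExecutionEnvironment env) {
        // Enable checkpointing with an explicit interval; in-progress part files
        // are committed on each successful checkpoint.
        env.enableCheckpointing(60_000); // 60 s, an assumed value

        // Row-format sink with an explicit rolling policy so files also roll over
        // based on time, inactivity, or size while the job keeps running.
        return StreamingFileSink
                .forRowFormat(new Path("s3a://testbucket-s3-flink/data/"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .withRollingPolicy(
                        DefaultRollingPolicy.builder()
                                .withRolloverInterval(TimeUnit.MINUTES.toMillis(5))
                                .withInactivityInterval(TimeUnit.MINUTES.toMillis(1))
                                .withMaxPartSize(128 * 1024 * 1024)
                                .build())
                .build();
    }
}

With an unbounded source like Kafka the job never terminates on its own, so as long as checkpoints complete you should start to see finished part files appear under s3a://testbucket-s3-flink/data/.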