StreamingFileSink sometimes does not work when trying to write to S3

Time: 2021-02-25 22:44:46

Tags: amazon-s3 apache-flink flink-streaming

I am trying to write to an S3 sink.

private static StreamingFileSink<String> createS3SinkFromStaticConfig(
        final Map<String, Properties> applicationProperties
) {
    Properties sinkProperties = applicationProperties.get(SINK_PROPERTIES);
    String s3SinkPath = sinkProperties.getProperty(SINK_S3_PATH_KEY);
    return StreamingFileSink
            .forRowFormat(
                    new Path(s3SinkPath),
                    new SimpleStringEncoder<String>(StandardCharsets.UTF_8.toString())
            )
            .build();
}
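For reference, the property lookup this factory relies on can be sketched in plain Java. The constant names `SINK_PROPERTIES` and `SINK_S3_PATH_KEY` and their values here are assumptions standing in for my runtime configuration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class SinkConfigDemo {
    // Assumed constant values, mirroring the names used in the factory above
    static final String SINK_PROPERTIES = "SinkProperties";
    static final String SINK_S3_PATH_KEY = "s3.sink.path";

    // Resolves the S3 path from the nested property map, as the factory does
    static String resolveS3Path(Map<String, Properties> applicationProperties) {
        Properties sinkProperties = applicationProperties.get(SINK_PROPERTIES);
        return sinkProperties.getProperty(SINK_S3_PATH_KEY);
    }

    public static void main(String[] args) {
        Map<String, Properties> props = new HashMap<>();
        Properties sink = new Properties();
        sink.setProperty(SINK_S3_PATH_KEY, "s3a://my-bucket/output");
        props.put(SINK_PROPERTIES, sink);
        System.out.println(resolveS3Path(props)); // prints s3a://my-bucket/output
    }
}
```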

The following code works, and I can see the results in S3:

input.map(value -> { // Parse the JSON
    JsonNode jsonNode = jsonParser.readValue(value, JsonNode.class);
    return new Tuple2<>(jsonNode.get("ticker").asText(), jsonNode.get("price").asDouble());
}).returns(Types.TUPLE(Types.STRING, Types.DOUBLE))
        .keyBy(0) // Logically partition the stream per stock symbol
        .timeWindow(Time.seconds(10), Time.seconds(5)) // Sliding window definition
        .min(1) // Calculate minimum price per stock over the window
        .setParallelism(3) // Set parallelism for the min operator
        .map(value -> value.f0 + ": ----- " + value.f1.toString() + "\n")
        .addSink(createS3SinkFromStaticConfig(applicationProperties));

But the following writes nothing to S3.

KeyedStream<EnrichedMetric, EnrichedMetricKey> input = env.addSource(new EnrichedMetricSource())
        .assignTimestampsAndWatermarks(
                WatermarkStrategy.<EnrichedMetric>forMonotonousTimestamps()
                        .withTimestampAssigner(((event, l) -> event.getEventTime()))
        ).keyBy(new EnrichedMetricKeySelector());

DataStream<String> statsStream = input
        .window(TumblingEventTimeWindows.of(Time.seconds(5)))
        .process(new PValueStatisticsWindowFunction());

statsStream.addSink(createS3SinkFromStaticConfig(applicationProperties));

PValueStatisticsWindowFunction is a ProcessWindowFunction, shown below.

@Override
public void process(EnrichedMetricKey enrichedMetricKey,
                    Context context,
                    Iterable<EnrichedMetric> in,
                    Collector<String> out) throws Exception {

    int count = 0;
    for (EnrichedMetric m : in) {
        count++;
    }

    out.collect("Count: " + count);
}
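The counting loop in `process` is easy to check in isolation. A plain-Java sketch (using strings in place of `EnrichedMetric`, which is my own type) behaves the same way over any `Iterable`:

```java
import java.util.Arrays;

public class WindowCountDemo {
    // Same counting loop as in process(): iterate the window contents and count
    static <T> int countElements(Iterable<T> in) {
        int count = 0;
        for (T ignored : in) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Iterable<String> window = Arrays.asList("m1", "m2", "m3");
        System.out.println("Count: " + countElements(window)); // prints Count: 3
    }
}
```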

When I run the Flink application locally, statsStream.print() prints the results to log/flink-*-taskexecutor-*.out.

On the cluster, I can see in the Flink dashboard that checkpointing is enabled, along with the checkpoint history. I have also made sure the S3 path is in the form s3a://&lt;bucket&gt;.

Not sure what I am missing here.

0 Answers:

There are no answers yet.