Stream windowing with Flink and a Kinesis stream is not working

Time: 2017-08-28 09:49:52

Tags: stream apache-flink amazon-kinesis flink-streaming

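I am using Flink to read a Kinesis stream. The job aggregates certain events by time window and key. After the reduce, the code does nothing: no data is written to the output CSV. I waited several minutes, even though the time window is only two minutes.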

public static void main(String[] args) throws Exception {

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    env.enableCheckpointing(CommonTimeConstants.TWO_MINUTES.toMilliseconds());
    env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.of(1, TimeUnit.MINUTES)));

    Properties consumerConfig = new Properties();
    consumerConfig.put(ConsumerConfigConstants.AWS_REGION, PropertyFileUtils.get("aws.region", ""));
    consumerConfig.put(ConsumerConfigConstants.AWS_ACCESS_KEY_ID, PropertyFileUtils.get("aws.accessKeyId", ""));
    consumerConfig.put(ConsumerConfigConstants.AWS_SECRET_ACCESS_KEY, PropertyFileUtils.get("aws.secretAccessKey", ""));
    consumerConfig.put(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "TRIM_HORIZON");

    DataStream<APIActionLog> apiLogRecords = env.addSource(new FlinkKinesisConsumer<>(
            ProjectProperties.SOURCE_ENV_PREFIX, // stream name
            new StreamedApiLogRecordDeserializationSchema(),
            consumerConfig));

    apiLogRecords.assignTimestampsAndWatermarks(API_LOG_RECORD_BOUNDED_OUT_OF_ORDERNESS_TIMESTAMP_EXTRACTOR);

    DataStream<Tuple7<String, String, String, String, Timestamp, String, Integer>> skuPlatformTsCount =
            apiLogRecords.flatMap(collecting events...)
                    .keyBy(Key based on some parameters of the event...)
                    .timeWindow(TWO_MINUTES)
                    .reduce(adding up event parameter..., window function...)
                    .map(Map to get a different tuple format...);

    skuPlatformTsCount.writeAsCsv("/Users/uday/Desktop/out.csv", FileSystem.WriteMode.OVERWRITE);

    env.execute("Processing ATC Log Stream");
}

private static final BoundedOutOfOrdernessTimestampExtractor<APIActionLog> API_LOG_RECORD_BOUNDED_OUT_OF_ORDERNESS_TIMESTAMP_EXTRACTOR =
        new BoundedOutOfOrdernessTimestampExtractor<APIActionLog>(TEN_SECONDS) {
            private static final long serialVersionUID = 1L;

            @Override
            public long extractTimestamp(APIActionLog apiActionLog) {
                return apiActionLog.getTs().getTime();
            }
        };

1 Answer:

Answer 0 (score: 0)

It was a silly mistake.

apiLogRecords.assignTimestampsAndWatermarks(API_LOG_RECORD_BOUNDED_OUT_OF_ORDERNESS_TIMESTAMP_EXTRACTOR);

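This call returns a new stream with the timestamps and watermarks assigned; it does not modify apiLogRecords. That return value must be used in the subsequent operations. In my code it was discarded, so the window was built on the original stream, which carries no watermarks, and an event-time window never fires without them.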
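
The pitfall can be reproduced without Flink. The following is a minimal plain-Java sketch, not Flink code: `ToyStream`, `assignWatermarks` and `windowOutput` are hypothetical names invented for illustration, and the "window" is modelled only as "emits nothing unless watermarks were assigned". It shows why calling a method that returns a new object and ignoring the result leaves the original unchanged.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DiscardedReturnValueSketch {

    /** Hypothetical, immutable stand-in for a stream; this is NOT a Flink class. */
    public static final class ToyStream {
        private final List<String> events;
        private final boolean hasWatermarks;

        public ToyStream(List<String> events, boolean hasWatermarks) {
            this.events = events;
            this.hasWatermarks = hasWatermarks;
        }

        /** Returns a NEW stream with watermarks; the receiver is left unchanged. */
        public ToyStream assignWatermarks() {
            return new ToyStream(events, true);
        }

        /** Models an event-time window: without watermarks it never fires, so nothing is emitted. */
        public List<String> windowOutput() {
            return hasWatermarks ? events : Collections.<String>emptyList();
        }
    }

    public static void main(String[] args) {
        ToyStream source = new ToyStream(Arrays.asList("a", "b"), false);

        // Mistake from the question: the returned stream is thrown away,
        // so 'source' still has no watermarks.
        source.assignWatermarks();
        System.out.println("result discarded: " + source.windowOutput().size() + " records");

        // Fix: keep the returned stream and build the rest of the pipeline on it.
        ToyStream withWatermarks = source.assignWatermarks();
        System.out.println("result used: " + withWatermarks.windowOutput().size() + " records");
    }
}
```

Applied to the code in the question, the equivalent change would be to assign the result to a variable (for example DataStream<APIActionLog> withTimestamps = apiLogRecords.assignTimestampsAndWatermarks(...), where the variable name is only an illustration) and to call flatMap on that variable instead of on apiLogRecords.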