How to add day and time_stamp to Kafka Streams output

Time: 2018-02-08 00:02:34

Tags: apache-kafka apache-kafka-streams apache-kafka-connect

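I am currently aggregating a Kafka stream and sending the counts to a topic. I also need to add the date and timestamp of each count. Is this the best way to do it?

This is the configuration I use: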

    streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 60 * 1000);

Code:

final KTable<Windowed<String>, Long> aggregated = feeds
        .map((key, value) -> new KeyValue<>(value.getUserId().toString(), value))
        .groupByKey()
        .count(TimeWindows.of(TimeUnit.MINUTES.toMillis(1)), STATE_STORE);

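// We want to compute the total count of users, so we must re-key all records to the same key.
// selectKey() is a KStream operation, so the windowed KTable is converted to a stream first.
aggregated.toStream()
            .selectKey((k, v) -> "user_count as of  ")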
            .transform(() -> new Transformer<String, Long, KeyValue<String, Long>>() {
                private ProcessorContext context;

                @Override
                public void init(ProcessorContext context) {
                    this.context = context;
                }

                @Override
                public KeyValue<String, Long> transform(String key, Long value) {
                    long timestamp = context.timestamp();
                    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm.ss");
                    key = key + sdf.format(timestamp);
                    // transform value using timestamp
                    return new KeyValue<>(key, value);
                }

                @Override
                public KeyValue<String, Long> punctuate(long timestamp) {
                    return null;
                }

                @Override
                public void close() {
                }
            })
            .groupByKey(stringSerde, longSerde)
            .count("test_store1").toStream()
            .print();

Output:

[KTABLE-TOSTREAM-0000000015]:user_count as of 2018-02-02 12:38 , 32541
[KTABLE-TOSTREAM-0000000015]:user_count as of 2018-02-02 12:38 , 32542
[KTABLE-TOSTREAM-0000000015]:user_count as of 2018-02-02 12:38 , 32543
[KTABLE-TOSTREAM-0000000015]:user_count as of 2018-02-02 12:38 , 32544
[KTABLE-TOSTREAM-0000000015]:user_count as of 2018-02-02 12:38 , 32545
[KTABLE-TOSTREAM-0000000015]:user_count as of 2018-02-02 12:38 , 32546
[KTABLE-TOSTREAM-0000000015]:user_count as of 2018-02-02 12:38 , 32547
[KTABLE-TOSTREAM-0000000015]:user_count as of 2018-02-02 12:38 , 32548
[KTABLE-TOSTREAM-0000000015]:user_count as of 2018-02-02 12:38 , 32549
[KTABLE-TOSTREAM-0000000015]:user_count as of 2018-02-02 12:38 , 32550   

1 answer:

Answer 0 (score: 1):

You can use KStream#transform() to get the timestamp:

table
  .toStream()
  .transform(() -> new Transformer<Object, Object, KeyValue<Object,Object>>() {
      private ProcessorContext context;

      @Override
      public void init(ProcessorContext context) {
          this.context = context;
      }

      @Override
      public KeyValue<Object, Object> transform(Object key, Object value) {
          long timestamp = context.timestamp();
          // transform value using timestamp
          return new KeyValue<>(key, value);
      }

      @Override
      public KeyValue<Object, Object> punctuate(long timestamp) {
          return null;
      }

      @Override
      public void close() {
      }
  })
  .to("output");

Note that this timestamp is the timestamp of the input record that triggered the computation. I think this is what you are looking for.
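To make the "transform value using timestamp" step concrete, here is a minimal sketch in plain Java of turning the record timestamp (epoch milliseconds, as returned by context.timestamp()) into a day and a time attached to the count. The class name CountStamper, the comma-separated output layout, and the choice of UTC in the demo are assumptions made for illustration, not part of the original answer. It uses java.time.format.DateTimeFormatter, which is thread-safe, instead of SimpleDateFormat.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of Kafka Streams): formats a count together with
// the day and time derived from a record timestamp.
final class CountStamper {
    // DateTimeFormatter instances are immutable and safe to share across threads.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss");

    private CountStamper() {
    }

    // Builds "day,time,count" from a count and a timestamp in epoch milliseconds.
    static String stamp(long count, long epochMillis, ZoneId zone) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(zone);
        return DAY.format(t) + "," + TIME.format(t) + "," + count;
    }

    public static void main(String[] args) {
        // 1517575080000L corresponds to 2018-02-02T12:38:00Z.
        System.out.println(stamp(32541L, 1517575080000L, ZoneOffset.UTC));
        // prints "2018-02-02,12:38:00,32541"
    }
}
```

Inside transform(), the call would look something like `new KeyValue<>(key, CountStamper.stamp(value, context.timestamp(), ZoneOffset.UTC))`, with the Transformer's output value type changed to String accordingly.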

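Otherwise, if you only need to transform the count using something like System.currentTimeMillis() and you don't need the timestamp from the record, then, as Matthias pointed out in the comments, you can use KTable#mapValues().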
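A rough sketch of that wall-clock variant follows, again in plain Java so it runs without a Kafka dependency. The class name WallClockStamper and the "millis,count" output layout are assumptions for illustration. Passing in a java.time.Clock rather than calling System.currentTimeMillis() directly is a design choice that keeps the function testable. Keep in mind that this stamps processing time, so reprocessing the same input produces different timestamps.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical helper (not part of Kafka Streams): pairs a count with wall-clock time.
final class WallClockStamper {
    private WallClockStamper() {
    }

    // Returns "epochMillis,count", reading the current time from the given clock.
    static String stamp(long count, Clock clock) {
        return clock.millis() + "," + count;
    }

    public static void main(String[] args) {
        // A fixed clock makes the demo reproducible; real code would pass Clock.systemUTC().
        Clock fixed = Clock.fixed(Instant.ofEpochMilli(1517575080000L), ZoneOffset.UTC);
        System.out.println(stamp(32541L, fixed));
        // prints "1517575080000,32541"

        // Assumed wiring into a topology (not compiled here):
        // table.mapValues(count -> WallClockStamper.stamp(count, Clock.systemUTC()));
    }
}
```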