Is there a way to pass only the filtered events from an Apache Flink stream to an AsyncDataStream / AsyncIO stream?

Asked: 2019-04-29 19:27:35

Tags: apache-flink

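I have a bunch of logs in JSON and a stream that validates them and filters out the JSON I need, and that works great!

Now I want to use AsyncIO to do a database lookup for the filtered JSON, but asyncInvoke seems to be executed for every input to the stream rather than for the filtered results only.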

DataStream<String> stringInputStream = env.addSource(flinkKafkaConsumer);

stringInputStream
    .flatMap(stringToJsonObject()) // Make sure only JSON logs go through.
    .returns(JsonObject.class)
    .filter(filterLogs("my-app")) // Filter logs for my-app
    .flatMap(jsonStringToJsonObject("someJsonEncodedStringField"))
    .returns(JsonObject.class)
    .filter(filterSpecificEvent()); // This stream works as expected, putting print() here only prints filtered events.

DataStream<JsonObject> lookupCarrierCodeStream = 
    AsyncDataStream.orderedWait(stringInputStream, lookupCodesInDB(), 3000, TimeUnit.MILLISECONDS, 100);

private static RichAsyncFunction<String, JsonObject> lookupCodesInDB() {
  return new RichAsyncFunction<String, JsonObject>() {
      @Override
      public void asyncInvoke(String input, ResultFuture<JsonObject> resultFuture) throws Exception {
          // This seems to receive all events, rather than the filtered ones.
          System.out.println("Input:" + input);

          resultFuture.complete(Collections.singleton(new JsonObject(input)));
      }
  };
}

Update

If I split the stream up like this, it seems to work...

DataStream<String> kafkaStringInput = env.addSource(flinkKafkaConsumer);

DataStream<JsonObject> jsonLogsInput = ...;
DataStream<JsonObject> appLogsInput = ...;
DataStream<JsonObject> evenInput = ...;

DataStream<JsonObject> lookupStream = AsyncDataStream.orderedWait(evenInput, ...);

I'm not sure why it doesn't work in the fluent style, but this does.

1 answer:

Answer 0 (score: 1)

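Applying a function to a stream, as in

eventStream
  .flatMap(...)

does not modify eventStream; it returns a new stream. In your code the result of the chained calls is never assigned to anything, so stringInputStream still refers to the unfiltered Kafka source.

So you want to do something like this: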

DataStream<JsonObject> filteredStream = stringInputStream
  .flatMap(stringToJsonObject())
  .returns(JsonObject.class)
  .filter(filterLogs("my-app"))
  .flatMap(jsonStringToJsonObject("someJsonEncodedStringField"))
  .returns(JsonObject.class)
  .filter(filterSpecificEvent());

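DataStream<JsonObject> lookupCarrierCodeStream = 
  AsyncDataStream.orderedWait(filteredStream, lookupCodesInDB(), 3000, TimeUnit.MILLISECONDS, 100);

Note that filteredStream is a DataStream<JsonObject>, so for this to compile lookupCodesInDB() has to return a RichAsyncFunction<JsonObject, JsonObject> (with asyncInvoke taking a JsonObject) instead of the RichAsyncFunction<String, JsonObject> shown in the question.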
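
The point that an operator returns a new stream instead of changing the one it is called on can be tried out without a Flink cluster. The following is only a plain-Java analogy, not Flink code: Pipeline is a hypothetical stand-in for DataStream, and the class name, method names and sample log lines are all made up for illustration. It contrasts discarding the result of filter (as in the question) with capturing it (as in the answer).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

public class StreamChainingSketch {

    // Hypothetical stand-in for DataStream (NOT a Flink class): every
    // operator returns a new Pipeline and leaves the receiver unchanged.
    static final class Pipeline<T> {
        private final List<T> elements;

        Pipeline(List<T> elements) {
            this.elements = Collections.unmodifiableList(new ArrayList<>(elements));
        }

        // Builds and returns a NEW pipeline holding only the matching elements.
        Pipeline<T> filter(Predicate<? super T> keep) {
            List<T> kept = new ArrayList<>();
            for (T element : elements) {
                if (keep.test(element)) {
                    kept.add(element);
                }
            }
            return new Pipeline<>(kept);
        }

        List<T> collect() {
            return elements;
        }
    }

    // Mirrors the question: the filtered pipeline is built but never assigned,
    // so the downstream step still sees every input.
    static List<String> discardedResult(List<String> input) {
        Pipeline<String> source = new Pipeline<>(input);
        source.filter(line -> line.startsWith("{")); // return value thrown away
        return source.collect();
    }

    // Mirrors the answer: the filtered pipeline is captured in a variable
    // and that variable is what gets passed downstream.
    static List<String> capturedResult(List<String> input) {
        Pipeline<String> source = new Pipeline<>(input);
        Pipeline<String> filtered = source.filter(line -> line.startsWith("{"));
        return filtered.collect();
    }

    public static void main(String[] args) {
        // Made-up sample input: two JSON-looking lines and one plain line.
        List<String> logs = Arrays.asList(
                "{\"app\":\"my-app\"}",
                "plain text line",
                "{\"app\":\"other\"}");

        System.out.println("discarded: " + discardedResult(logs).size());
        System.out.println("captured:  " + capturedResult(logs).size());
    }
}
```

With the sample input, discardedResult still returns all three lines while capturedResult returns only the two that start with "{". That is the same difference as between passing stringInputStream and passing filteredStream to AsyncDataStream.orderedWait.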