I have two streams with the following data types.
DataStream<Tuple3<String, Integer, Double>> splittedActivationTuple;
DataStream<Tuple2<String, Double>> unionReloadsStream;
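Both streams read from Kafka, but at different rates: "unionReloadsStream" receives far more data than "splittedActivationTuple". I need to keep the "splittedActivationTuple" elements in a 24-hour window and update their "Double" field whenever a matching element arrives from unionReloadsStream (the String field is the common key).
So I wrote the following method to do this.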
public static DataStream<Tuple3<String, Integer, Double>> joinActivationsBasedOnReload(
        DataStream<Tuple3<String, Integer, Double>> activationsStream,
        DataStream<Tuple2<String, Double>> unifiedReloadStream) {
    return activationsStream.join(unifiedReloadStream)
            .where(new ActivationStreamSelector())
            .equalTo(new ReloadStreamSelector())
            .window(GlobalWindows.create())
            .evictor(TimeEvictor.of(Time.of(24, TimeUnit.HOURS)))
            .apply(new JoinFunction<Tuple3<String, Integer, Double>, Tuple2<String, Double>, Tuple3<String, Integer, Double>>() {
                private static final long serialVersionUID = 1L;

                @Override
                public Tuple3<String, Integer, Double> join(Tuple3<String, Integer, Double> first,
                        Tuple2<String, Double> second) {
                    return new Tuple3<String, Integer, Double>(first.f0, first.f1, first.f2 + second.f1);
                }
            });
}
I call it like this:
DataStream<Tuple3<String, Integer, Double>> activationWindowStream = joinActivationsBasedOnReload(splittedActivationTuple, unionReloadsStream);
activationWindowStream.print();
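But nothing is printed.
I expect "activationWindowStream" to contain the "splittedActivationTuple" data (the smaller set), with the Double value accumulated whenever an incoming unionReloadsStream element has a matching "String" field. That is not happening. What am I missing?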
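To make the expected behavior concrete, here is a minimal plain-Java sketch without Flink. The class and method names (ExpectedAccumulation, onActivation, onReload) are hypothetical and only for illustration, and the 24-hour expiry is left out. It shows only the per-key accumulation I am after, not how it should be done in Flink.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not Flink code, and no 24-hour expiry.
public class ExpectedAccumulation {

    // Stored activations: String key -> running Double value
    // (stand-in for the f0 and f2 fields of the Tuple3).
    private final Map<String, Double> activations = new LinkedHashMap<>();

    // An element from the smaller stream (activations) is stored by key.
    public void onActivation(String key, double amount) {
        activations.put(key, amount);
    }

    // An element from the larger stream (reloads) arrives.
    // If an activation with the same key is stored, its value is increased
    // and the new total is returned; otherwise null is returned.
    public Double onReload(String key, double reloadAmount) {
        return activations.computeIfPresent(key, (k, v) -> v + reloadAmount);
    }

    public static void main(String[] args) {
        ExpectedAccumulation state = new ExpectedAccumulation();
        state.onActivation("A", 10.0);
        System.out.println(state.onReload("A", 5.0)); // matching key: value grows
        System.out.println(state.onReload("A", 2.5)); // grows again
        System.out.println(state.onReload("B", 1.0)); // no stored activation
    }
}
```

In other words, reloads with no stored activation should produce nothing, and each matching reload should add to the stored Double.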
Thanks, Rakesh