Time-windowed join in Flink gives inconsistent results?

Asked: 2019-02-14 15:11:49

Tags: java apache-flink

I convert two data streams into tables and then run a time-windowed join on the time attribute, as described here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/joins.html#time-windowed-joins

The code looks similar to this:

class POJO{
   String col1;
   String col2;
   long timestamp;
   /// other POJO stuff
}

StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
List<POJO> lst1 = ...; // ArrayList of 6000 elements, timestamps in ascending order. All timestamps in lst1 are the same as in lst2
List<POJO> lst2 = ...; // ArrayList of 6000 elements, timestamps in ascending order. All timestamps in lst1 are the same as in lst2
DataStream<POJO> ds1 = env.fromCollection(lst1).assignTimestampsAndWatermarks(...);
DataStream<POJO> ds2 = env.fromCollection(lst2).assignTimestampsAndWatermarks(...);

// Expose the POJO's timestamp field as the event-time (rowtime) attribute of each table
tableEnv.registerDataStream("ds1", ds1, "col1, col2, timestamp.rowtime");
tableEnv.registerDataStream("ds2", ds2, "col1, col2, timestamp.rowtime");

// Join on the key and on equal event time; the selected columns are qualified to avoid ambiguity
Table joinedTable = tableEnv.sqlQuery("SELECT d1.col1, d1.col2 FROM ds1 d1, ds2 d2 WHERE d1.col1 = d2.col1 AND d1.timestamp = d2.timestamp");

DataStream<POJO> joinedDS = tableEnv.toAppendStream(joinedTable, POJO.class);

env.execute("Start pipeline");

Assume the POJO objects in lst1 and lst2 map one-to-one on timestamp and col1, so the join should produce 6000 elements.
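To make that expectation concrete, here is a minimal plain-Java sketch (no Flink involved) of the same equality join on two in-memory lists. The class name `ExpectedJoinCount`, the helpers `makeList` and `countMatches`, and the generated values are assumptions for illustration only; they are not the real data or part of any Flink API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExpectedJoinCount {
    // Hypothetical stand-in for the POJO in the question.
    static class Pojo {
        final String col1;
        final String col2;
        final long timestamp;

        Pojo(String col1, String col2, long timestamp) {
            this.col1 = col1;
            this.col2 = col2;
            this.timestamp = timestamp;
        }
    }

    // Builds n elements with ascending timestamps; the values are made up.
    static List<Pojo> makeList(int n, String col2) {
        List<Pojo> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(new Pojo("key" + i, col2, 1000L * i));
        }
        return out;
    }

    // Counts the rows an inner join on (col1, timestamp) would produce,
    // mirroring the WHERE clause of the SQL query.
    // Assumes col1 never contains the '|' separator used in the composite key.
    static int countMatches(List<Pojo> left, List<Pojo> right) {
        Map<String, Integer> leftCounts = new HashMap<>();
        for (Pojo p : left) {
            leftCounts.merge(p.col1 + "|" + p.timestamp, 1, Integer::sum);
        }
        int matches = 0;
        for (Pojo p : right) {
            matches += leftCounts.getOrDefault(p.col1 + "|" + p.timestamp, 0);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Pojo> lst1 = makeList(6000, "a");
        List<Pojo> lst2 = makeList(6000, "b");
        System.out.println(countMatches(lst1, lst2)); // prints 6000
    }
}
```

With a one-to-one mapping on (col1, timestamp), this batch-style count is 6000 on every run, which is the number I expect the streaming join to emit as well.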

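The problem is that if I run this program twice, the number of items in the final joinedDS data stream is not the same! Sometimes there are 5951 elements in total, sometimes 5987, and so on. The number of events should always be 6000, given that d1.col1 = d2.col1 and d1.timestamp = d2.timestamp always hold. Any idea why this happens?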

0 answers:

No answers yet