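I am converting two data streams into tables and then performing a time-windowed join on the time attribute, as described here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/joins.html#time-windowed-joins
The code looks something like this: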
class POJO {
    String col1;
    String col2;
    long timestamp;
    // other POJO stuff
}
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
List<POJO> lst1 = ...; // ArrayList of 6000 elements, timestamps in ascending order; all timestamps in lst1 are the same as in lst2
List<POJO> lst2 = ...; // ArrayList of 6000 elements, timestamps in ascending order; all timestamps in lst2 are the same as in lst1
DataStream<POJO> ds1 = env.fromCollection(lst1).assignTimestampsAndWatermarks(...);
DataStream<POJO> ds2 = env.fromCollection(lst2).assignTimestampsAndWatermarks(...);
tableEnv.registerDataStream("ds1", ds1, "col1, col2, timestamp.rowtime");
tableEnv.registerDataStream("ds2", ds2, "col1, col2, timestamp.rowtime");
Table joinedTable = tableEnv.sqlQuery("SELECT d1.col1, d1.col2 FROM ds1 d1, ds2 d2 WHERE d1.col1 = d2.col1 AND d1.timestamp = d2.timestamp");
DataStream<POJO> joinedDS = tableEnv.toAppendStream(joinedTable, POJO.class);
env.execute("Start pipeline");
Assume that the POJO objects in lst1 and lst2 have a one-to-one mapping on timestamp and col1, so there should be 6000 elements after the join.
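To make that expectation concrete, here is a minimal plain-Java sketch (no Flink involved) of the same equality join on col1 and timestamp. The class name ExpectedJoinCount, the Event stand-in for POJO, the helper names buildEvents and countJoinMatches, and the generated data are all assumptions made up for illustration; they are not my real data or pipeline.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExpectedJoinCount {

    // Minimal stand-in for the POJO in the question (hypothetical name).
    static final class Event {
        final String col1;
        final String col2;
        final long timestamp;

        Event(String col1, String col2, long timestamp) {
            this.col1 = col1;
            this.col2 = col2;
            this.timestamp = timestamp;
        }
    }

    // Builds n events with ascending timestamps and a unique col1 per event (made-up data).
    static List<Event> buildEvents(int n, String col2Prefix) {
        List<Event> events = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            events.add(new Event("key-" + i, col2Prefix + i, 1_000L * i));
        }
        return events;
    }

    // Counts pairs (l, r) where l.col1 equals r.col1 and l.timestamp equals r.timestamp.
    static long countJoinMatches(List<Event> left, List<Event> right) {
        Map<String, Integer> rightCounts = new HashMap<>();
        for (Event r : right) {
            rightCounts.merge(r.col1 + "|" + r.timestamp, 1, Integer::sum);
        }
        long matches = 0;
        for (Event l : left) {
            matches += rightCounts.getOrDefault(l.col1 + "|" + l.timestamp, 0);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Event> lst1 = buildEvents(6000, "a");
        List<Event> lst2 = buildEvents(6000, "b");
        System.out.println(countJoinMatches(lst1, lst2)); // prints 6000
    }
}
```

With a one-to-one mapping like this, a batch-style join always yields exactly 6000 matches, which is the count I expect from the streaming join as well.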
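The problem is that if I run this program twice, the number of items in the final joinedDS data stream is not the same! Sometimes there are 5951 elements in total, sometimes 5987, and so on. The number of events should always be 6000, given that d1.col1 = d2.col1 and d1.timestamp = d2.timestamp always hold. Any idea why this happens?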