I want to implement an operator that has two input streams and takes one item from each stream to process both together, e.g. a join. In addition, if either input has no data available, the operator should block and wait for it.
Which classes are involved in doing this? A tutorial would be even better. Any suggestions would be appreciated!
Answer 0 (score: 4)
You need to connect the two DataStreams and apply a TwoInputStreamOperator. There are already a number of predefined operators. In your case, a CoFlatMapFunction would be a good choice:
DataStream input1 = ...
DataStream input2 = ...
input1.connect(input2).flatMap(new MyOwnCoFlatMapFunction());
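More details are available here: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#co-operators
However, there is no way to block inside this operator the way you want. Instead, you need to apply the following pattern: each time you receive an input from the left or the right side, buffer it if no input from the other side is available yet: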
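import java.util.LinkedList;
import java.util.Queue;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class MyOwnCoFlatMapFunction<IN1, IN2, OUT> implements CoFlatMapFunction<IN1, IN2, OUT> {
    // Elements that are still waiting for a partner from the other input
    private final Queue<IN1> leftInput = new LinkedList<>();
    private final Queue<IN2> rightInput = new LinkedList<>();

    @Override
    public void flatMap1(IN1 value, Collector<OUT> out) throws Exception {
        if (!rightInput.isEmpty()) {
            IN2 right = rightInput.poll();
            // process left input (value) and right input (right) together
        } else {
            leftInput.add(value);
        }
    }

    // reverse pattern for flatMap2 here
}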
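The buffering pattern can be tried out without a Flink job. Below is a minimal sketch in plain Java, assuming a hypothetical `PairingBuffer` class (not part of Flink) whose `onLeft` and `onRight` methods play the roles of `flatMap1` and `flatMap2`. "Processing both together" is represented by simple string concatenation, purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-alone helper (not a Flink class): pairs elements from two
// inputs in arrival order and buffers whichever side is currently ahead.
class PairingBuffer<L, R> {
    private final Queue<L> left = new ArrayDeque<>();
    private final Queue<R> right = new ArrayDeque<>();
    private final List<String> output = new ArrayList<>();

    // Plays the role of flatMap1: pair with a buffered right element, else buffer.
    void onLeft(L value) {
        R partner = right.poll(); // null if nothing is buffered on the right
        if (partner != null) {
            output.add(value + "|" + partner);
        } else {
            left.add(value);
        }
    }

    // Plays the role of flatMap2: the same logic with the sides swapped.
    void onRight(R value) {
        L partner = left.poll(); // null if nothing is buffered on the left
        if (partner != null) {
            output.add(partner + "|" + value);
        } else {
            right.add(value);
        }
    }

    List<String> output() { return output; }
    int bufferedLeft() { return left.size(); }
    int bufferedRight() { return right.size(); }
}

class PairingBufferDemo {
    public static void main(String[] args) {
        PairingBuffer<String, Integer> buffer = new PairingBuffer<>();
        buffer.onLeft("a");  // nothing on the right yet: "a" is buffered
        buffer.onLeft("b");  // still nothing on the right: "b" is buffered
        buffer.onRight(1);   // pairs with "a"
        buffer.onRight(2);   // pairs with "b"
        buffer.onRight(3);   // left buffer is empty: 3 is buffered
        System.out.println(buffer.output());        // prints [a|1, b|2]
        System.out.println(buffer.bufferedRight()); // prints 1
    }
}
```

Note that nothing in this sketch ever blocks: an element without a partner simply stays in its buffer until the other side delivers one, so the buffer of the faster side can keep growing.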
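However, be aware that blocking is dangerous in stream processing. If your input streams have different data rates, this approach will not work (!), because the slower stream throttles the faster one, which results in back pressure on the faster stream. I don't know your use case, but it seems "wrong" to me. Why don't you join on time instead?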
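To make the idea of joining on time concrete, here is a minimal plain-Java sketch of one possible reading of it: each element carries a timestamp, is assigned to a fixed-size time window, and is paired with the elements of the other input that fall into the same window. The class `WindowJoinSketch`, its methods, and the `"left-right"` output format are assumptions made for illustration; this is not Flink API, and it works on finite arrays with non-negative timestamps rather than on live streams.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration (not Flink API): joins two timestamped inputs by
// pairing elements whose timestamps fall into the same fixed-size window.
class WindowJoinSketch {
    // Index of the window a non-negative timestamp belongs to (both in ms).
    static long windowOf(long timestampMs, long windowSizeMs) {
        return timestampMs / windowSizeMs;
    }

    static List<String> join(long[] leftTs, long[] rightTs, long windowSizeMs) {
        // Group the right input by window index.
        Map<Long, List<Long>> rightByWindow = new HashMap<>();
        for (long ts : rightTs) {
            rightByWindow
                .computeIfAbsent(windowOf(ts, windowSizeMs), w -> new ArrayList<>())
                .add(ts);
        }
        // Pair every left element with all right elements of the same window.
        List<String> result = new ArrayList<>();
        for (long l : leftTs) {
            List<Long> partners = rightByWindow.getOrDefault(
                windowOf(l, windowSizeMs), Collections.<Long>emptyList());
            for (long r : partners) {
                result.add(l + "-" + r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] left = {100, 1500, 2100};
        long[] right = {900, 1200, 5000};
        // With 1000 ms windows: 100 and 900 share window 0, 1500 and 1200 share window 1.
        System.out.println(join(left, right, 1000)); // prints [100-900, 1500-1200]
    }
}
```

The design point is that neither input has to wait for the other: elements without a partner in their window (2100 and 5000 above) are simply left unmatched, so a slow stream does not hold back a fast one.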