I'm fairly sure this must be a Flink problem, because the code under test is very simple:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// I don't need this for this particular example, but I use it in other place in my code.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
SingleOutputStreamOperator<String> linesSource = env.readTextFile(inputFile).setParallelism(1);
SingleOutputStreamOperator<PositionEvent> mappedlines = linesSource.map(new Tokenizer());
SpeedRadar.run(mappedlines)
.writeAsCsv(String.format("%s/%s", outputFolder, SPEED_RADAR_FILE));
The SpeedRadar class is:
public final class SpeedRadar {
private static final int MAXIMUM_SPEED = 90;
public static SingleOutputStreamOperator<SpeedEvent> run(SingleOutputStreamOperator<PositionEvent> stream) {
return stream
// f2 is the third CSV field (the speed)
.filter((PositionEvent e) -> e.f2 > MAXIMUM_SPEED)
.map(new ToSpeedEvent());
}
}
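I don't think it matters to show you the POJO and the other missing classes. The point is that I'm reading lines from a CSV file like this one: 130,1,65,0,3,0,49,100000
and I'm keeping only the lines whose third field is greater than 90.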
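To make the expected count in the test concrete, here is a plain-Java sketch (no Flink involved) of the same predicate applied to the three test lines. It assumes, hypothetically, that the Tokenizer simply splits on commas and that the speed is the third field (index 2, i.e. f2); the real Tokenizer isn't shown, and the names SpeedFilterSketch, speedOf and overSpeed are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SpeedFilterSketch {
    // Same threshold as SpeedRadar.MAXIMUM_SPEED
    static final int MAXIMUM_SPEED = 90;

    // Hypothetical stand-in for the Tokenizer: reads the third comma-separated field as the speed
    static int speedOf(String line) {
        return Integer.parseInt(line.split(",")[2].trim());
    }

    // Keeps only lines strictly faster than MAXIMUM_SPEED, mirroring e.f2 > MAXIMUM_SPEED
    static List<String> overSpeed(List<String> lines) {
        return lines.stream()
                .filter(line -> speedOf(line) > MAXIMUM_SPEED)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList(
                "30,1,91,1,3,0,10,100000",
                "60,2,90,2,2,1,20,200000",
                "90,3,99,3,1,0,30,300000");
        // 90 is not > 90, so only the lines with speed 91 and 99 remain
        System.out.println(overSpeed(data).size()); // prints 2
    }
}
```

Because the comparison is strict, the boundary value 90 is dropped, which is why the test expects exactly two SpeedEvents.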
Here is my simple test case:
public class SpeedRadarTests extends StreamingMultipleProgramsTestBase {
private StreamExecutionEnvironment env;
@Before
public void createEnv() {
env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
SpeedEventSink.values.clear();
}
@Test
public void shouldDetectTwoOverSpeedEvents() throws Exception {
String[] data = new String[]{
"30,1,91,1,3,0,10,100000",
"60,2,90,2,2,1,20,200000",
"90,3,99,3,1,0,30,300000"
};
SingleOutputStreamOperator<PositionEvent> source
= new PositionStreamBuilder(env).fromLines(data).build();
SpeedRadar.run(source).addSink(new SpeedEventSink());
env.execute();
Map<String, SpeedEvent> events = SpeedEventSink.values;
assertEquals(2, events.size());
}
private static class SpeedEventSink implements SinkFunction<SpeedEvent> {
static final Map<String, SpeedEvent> values = new HashMap<>();
@Override
public synchronized void invoke(SpeedEvent speedEvent) throws Exception {
// I'm sure f1 is unique
values.put(speedEvent.f1, speedEvent);
}
}
}
This is how I create the "test stream":
public class PositionStreamBuilder {
private StreamExecutionEnvironment env;
private SingleOutputStreamOperator<PositionEvent> stream;
public PositionStreamBuilder(StreamExecutionEnvironment env) {
this.env = env;
}
public PositionStreamBuilder fromLines(String[] lines) {
stream = env.fromElements(lines)
.setParallelism(1)
.map(new VehicleTelematics.Tokenizer()); // the same Tokenizer as before
return this;
}
// more methods here
public SingleOutputStreamOperator<PositionEvent> build() {
return stream;
}
}
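The problem is that sometimes, and I don't know why, the assertion fails because the Map has only one element. I followed the steps in the Flink documentation; the only difference is that I didn't set the parallelism to 1 (but that shouldn't make a difference in this test anyway).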
And it's not only this test: sometimes other tests that shouldn't fail fail as well. It's as if Flink occasionally misses an event.
When I run the code with flink run, I never miss any element.
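One thing worth checking, offered as an assumption rather than a confirmed diagnosis: if the sink runs with parallelism greater than 1 (a test base like StreamingMultipleProgramsTestBase may use a default parallelism above 1), each parallel subtask gets its own SpeedEventSink instance. A synchronized instance method only locks that one instance, so two instances can still call put on the shared static HashMap at the same time, and HashMap is not thread-safe. The plain-Java sketch below (no Flink; ThreadSafeSinkSketch and runParallel are hypothetical names) simulates several sink instances on separate threads writing into a static ConcurrentHashMap, which does not lose concurrent updates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a parallel sink: each instance plays the role of one sink subtask
class ThreadSafeSinkSketch {
    // Shared by all instances, like the static map in SpeedEventSink; ConcurrentHashMap
    // makes concurrent put calls from different instances safe without extra locking
    static final Map<String, Integer> values = new ConcurrentHashMap<>();

    // No 'synchronized' here: an instance-level lock would not exclude other instances anyway
    void invoke(String key, int speed) {
        values.put(key, speed);
    }

    // Runs `parallelism` sink instances on separate threads, each writing `perInstance` unique keys
    static int runParallel(int parallelism, int perInstance) throws InterruptedException {
        values.clear();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < parallelism; p++) {
            final int subtask = p;
            final ThreadSafeSinkSketch sink = new ThreadSafeSinkSketch(); // one instance per "subtask"
            Thread t = new Thread(() -> {
                for (int i = 0; i < perInstance; i++) {
                    sink.invoke(subtask + "-" + i, i);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for every writer before reading the map
        }
        return values.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // 4 instances x 10_000 unique keys each; all 40_000 entries survive
        System.out.println(runParallel(4, 10_000)); // prints 40000
    }
}
```

Wrapping the map with Collections.synchronizedMap(new HashMap<>()) would give the same guarantee; alternatively, calling env.setParallelism(1) in the test would mean only one sink instance exists.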