Question

Does it make sense to run stream and batch processing side by side in Flink?

// Calculate the median using the DataSet API (batch environment)
BatchFunctions batch = new BatchFunctions();
DataSet<Tuple2<Double, Integer>> dataSet1 = batch.loadDataSetOfOctober2016();
double median = batch.getMedianReactionTime(dataSet1);

// Now use the calculated median in the DataStream API (stream environment)
StreamFunctions stream = new StreamFunctions();
DataStream<Tuple7<String, String, Integer, String, Date, String, List<Double>>> dataStream1 = stream.getKafkaStream();
stream.printPredictionForNextReactionTimeByMedians(dataStream1, median, Time.seconds(10));
stream.execute();

Answer 1

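I would rather not do it this way. If your streaming job depends on a batch result, you can compute the batch result ahead of time and put it into a queue or a database table. The streaming job can then read the result from there, so you do not need to restart it whenever the batch result changes. A streaming job is more or less unbounded and keeps running, whereas the batch result can change, because you may rerun the batch job on different input.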
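To make the idea concrete, here is a minimal sketch in plain Java, with no Flink dependency. Everything in it is hypothetical and only for illustration: `MedianStore` stands in for the queue or database table, `median` for the batch job, and `classify` for the per-event logic of the streaming job. In a real setup the store would be an external system that both jobs can reach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class MedianHandoffSketch {

    // Hypothetical stand-in for the external store (queue or database table).
    public static final class MedianStore {
        private final AtomicReference<Double> latest = new AtomicReference<>(Double.NaN);

        public void publish(double median) { latest.set(median); }

        public double current() { return latest.get(); }
    }

    // Batch side: compute the median of a finite, non-empty data set.
    public static double median(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    // Stream side: called once per event; reads whichever median is currently published.
    public static String classify(double reactionTime, MedianStore store) {
        return reactionTime > store.current() ? "above median" : "at or below median";
    }

    public static void main(String[] args) {
        MedianStore store = new MedianStore();

        // First batch run publishes its result.
        store.publish(median(Arrays.asList(3.0, 1.0, 2.0)));
        System.out.println(classify(2.5, store));

        // A later batch run on different input replaces the value;
        // the "streaming" side keeps running and simply sees the new median.
        store.publish(median(Arrays.asList(4.0, 2.0, 6.0, 8.0)));
        System.out.println(classify(2.5, store));
    }
}
```

The point of the design is that the streaming side never holds the median as a fixed constructor argument, as `median` is passed in the question's code. It looks the value up when it needs it, so rerunning the batch job does not force a restart of the stream.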
