Does it make sense to run stream and batch processing side by side in Flink?
// Calculate the median with the DataSet API (batch environment)
BatchFunctions batch = new BatchFunctions();
DataSet<Tuple2<Double, Integer>> dataSet1 = batch.loadDataSetOfOctober2016();
double median = batch.getMedianReactionTime(dataSet1);
// Now use the calculated median in the DataStream (stream environment)
StreamFunctions stream = new StreamFunctions();
DataStream<Tuple7<String, String, Integer, String, Date, String, List<Double>>> dataStream1 = stream.getKafkaStream();
stream.printPredictionForNextReactionTimeByMedians(dataStream1, median, Time.seconds(10));
stream.execute();
Answer 0 (score: 2)
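I would rather not do that. If your streaming job depends on a batch result, compute the batch result ahead of time and write it to a queue or a database table that the streaming job reads from. That way you do not need to restart the streaming job whenever the batch result changes. A streaming job is essentially unbounded, whereas the batch result can change because you may rerun it on different input.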
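To illustrate the idea, here is a minimal sketch in plain Java with no Flink dependency. `MedianStore` is a hypothetical stand-in for the queue or database table: the batch job would publish a new median to it, and the streaming side would look the value up per event instead of receiving it once at startup. All names (`MedianLookupSketch`, `MedianStore`, `deviationFromMedian`) are assumptions for illustration and are not part of the question's code or of Flink.

```java
import java.util.concurrent.atomic.AtomicReference;

public class MedianLookupSketch {

    /** Hypothetical stand-in for the queue or database table holding the batch result. */
    public static final class MedianStore {
        private final AtomicReference<Double> latest;

        public MedianStore(double initialMedian) {
            this.latest = new AtomicReference<>(Double.valueOf(initialMedian));
        }

        /** Called by the batch side whenever it has recomputed the median. */
        public void publish(double median) {
            latest.set(median);
        }

        /** Called by the streaming side to read the most recent median. */
        public double current() {
            return latest.get();
        }
    }

    /** Per-event logic: reads the median at processing time, not at job start. */
    public static double deviationFromMedian(MedianStore store, double reactionTime) {
        return reactionTime - store.current();
    }

    public static void main(String[] args) {
        MedianStore store = new MedianStore(2.0);
        System.out.println(deviationFromMedian(store, 5.0)); // prints 3.0

        // The batch job reruns on different input and publishes a new median;
        // the streaming side picks it up without a restart.
        store.publish(4.0);
        System.out.println(deviationFromMedian(store, 5.0)); // prints 1.0
    }
}
```

In a real job the lookup would live inside a streaming operator and query the external store, probably with some caching so that not every event triggers a read.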