How can I change an operator at runtime in a stream dataflow program?

Time: 2018-12-20 13:16:12

Tags: apache-spark apache-kafka apache-flink apache-storm apache-edgent

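I would like to know whether it is possible to change an operator of a job that has already been submitted to Flink. Suppose I have a word count program with a filter on it that only counts words longer than 3 characters. I want to change the parameter of this filter at runtime. My first guess is that Flink (and other dataflow engines such as Spark, Storm, and Apache Edgent) cannot do this, because the job has already been submitted at env.execute(). Does anyone know of a way to do this?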

I think this question (Deploy stream processing topology on runtime?) is related to what I want, but its solution is still not as dynamic as I would like.

Thanks

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<Tuple2<String, Integer>> dataStream = env.socketTextStream("localhost", 9000)
        .flatMap(new SplitterFlatMap()).keyBy(0)
        .sum(1)
        .filter(word -> word.f1 >= 3);
dataStream.print();
env.execute("WordCountSocketFilterQEP");

3 answers:

Answer 0: (score: 1)

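With Flink, you can connect a broadcast stream to a keyed stream and broadcast either the parameters or the code you want to use. TaxiQuery is an example that uses Janino with Java expressions, but you could dynamically load classes instead. I have also seen this done with Rhino/JavaScript, JRuby, and so on.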
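To make the "broadcast the code" idea concrete, here is a minimal plain-Java sketch with no Flink or Janino dependency. It assumes a made-up rule format ("len>3", "len>=5") that arrives as a string and is turned into a predicate at runtime. The class name RuleParserSketch, the method parseRule, and the rule syntax are all hypothetical. A real job would more likely compile a proper expression, as the TaxiQuery example does with Janino.

```java
import java.util.function.Predicate;

class RuleParserSketch {

    // Hypothetical rule format "len<op><number>", e.g. "len>3" or "len>=5".
    // This stands in for compiling a broadcast expression; it is not Janino.
    static Predicate<String> parseRule(String rule) {
        String r = rule.trim();
        if (r.startsWith("len>=")) {
            int n = Integer.parseInt(r.substring(5).trim());
            return word -> word.length() >= n;
        }
        if (r.startsWith("len>")) {
            int n = Integer.parseInt(r.substring(4).trim());
            return word -> word.length() > n;
        }
        throw new IllegalArgumentException("Unsupported rule: " + rule);
    }

    public static void main(String[] args) {
        // Initial rule, as it might be set when the job starts.
        Predicate<String> current = parseRule("len>3");
        System.out.println(current.test("word"));   // true: 4 > 3

        // A new rule arrives on the control stream and replaces the old one.
        current = parseRule("len>=5");
        System.out.println(current.test("word"));   // false: 4 < 5
    }
}
```

In a Flink job, the operator would hold the current predicate and swap it whenever a rule element arrives on the broadcast input.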

Answer 1: (score: 1)

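For the values from parameterStream to reach all parallel instances of the operator, you have to use a BroadcastStream. Note that (as of Flink 1.6?) this also lets you keep broadcast state: the "rules" or configuration settings sent to all instances of DynamicFilterCoFlatMapper are automatically saved as state.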
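The reason broadcasting matters can be shown with a small plain-Java simulation, again without any Flink dependency. It models three parallel instances of a filter, each with its own local copy of the parameter, and compares one-instance-per-update delivery with broadcast delivery. The names FilterInstance, sendRoundRobin, and sendBroadcast are hypothetical, and round-robin is only a stand-in for whatever non-broadcast partitioning a job would otherwise use.

```java
import java.util.ArrayList;
import java.util.List;

class BroadcastDeliverySketch {

    // One parallel instance of a hypothetical filter operator with a local parameter.
    static class FilterInstance {
        int minLength = 3;
    }

    static List<FilterInstance> newInstances(int parallelism) {
        List<FilterInstance> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new FilterInstance());
        }
        return instances;
    }

    // Non-broadcast delivery: each update reaches exactly one instance.
    static void sendRoundRobin(List<FilterInstance> instances, int updateIndex, int value) {
        instances.get(updateIndex % instances.size()).minLength = value;
    }

    // Broadcast delivery: every instance receives every update.
    static void sendBroadcast(List<FilterInstance> instances, int value) {
        for (FilterInstance instance : instances) {
            instance.minLength = value;
        }
    }

    static List<Integer> thresholds(List<FilterInstance> instances) {
        List<Integer> result = new ArrayList<>();
        for (FilterInstance instance : instances) {
            result.add(instance.minLength);
        }
        return result;
    }

    public static void main(String[] args) {
        List<FilterInstance> partitioned = newInstances(3);
        sendRoundRobin(partitioned, 0, 5);
        System.out.println(thresholds(partitioned));  // prints [5, 3, 3]

        List<FilterInstance> broadcasted = newInstances(3);
        sendBroadcast(broadcasted, 5);
        System.out.println(thresholds(broadcasted));  // prints [5, 5, 5]
    }
}
```

Without broadcasting, two of the three instances keep filtering with the stale value, so the results depend on which instance a record happens to reach.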

Answer 2: (score: 0)

I think that in Flink I can use a CoFlatMapFunction, as described in "Flink: How to handle external app configuration changes in flink". In Apache Edgent, however, I am not sure whether there is a way to do this. Here is my implementation:

package org.sense.flink.examples.stream;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;
import org.sense.flink.mqtt.FlinkMqttConsumer;
import org.sense.flink.mqtt.MqttMessage;

public class SensorsDynamicFilterMqttEdgentQEP {

    public SensorsDynamicFilterMqttEdgentQEP() throws Exception {

        // Obtain the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Run this example in "ingestion time"
        env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);

        // Start streaming from the fake sensor data source
        DataStream<MqttMessage> temperatureStream = env.addSource(new FlinkMqttConsumer("topic-edgent"));
        DataStream<Tuple2<Double, Double>> parameterStream = env.addSource(new FlinkMqttConsumer("topic-parameter"))
                .map(new ParameterMapper());

        DataStream<MqttMessage> filteredStream = temperatureStream.connect(parameterStream.broadcast())
                .flatMap(new DynamicFilterCoFlatMapper());

        filteredStream.print();

        String executionPlan = env.getExecutionPlan();
        System.out.println("ExecutionPlan ........................ ");
        System.out.println(executionPlan);
        System.out.println("........................ ");

        env.execute("SensorsDynamicFilterMqttEdgentQEP");
    }

    public static class DynamicFilterCoFlatMapper
            implements CoFlatMapFunction<MqttMessage, Tuple2<Double, Double>, MqttMessage> {

        private static final long serialVersionUID = -8634404029870404558L;
        private Tuple2<Double, Double> range = new Tuple2<Double, Double>(-1000.0, 1000.0);

        @Override
        public void flatMap1(MqttMessage value, Collector<MqttMessage> out) throws Exception {

            double payload = Double.parseDouble(value.getPayload());

            if (payload >= this.range.f0 && payload <= this.range.f1) {
                out.collect(value);
            }
        }

        @Override
        public void flatMap2(Tuple2<Double, Double> value, Collector<MqttMessage> out) throws Exception {
            this.range = value;
        }
    }

    public static class ParameterMapper implements MapFunction<MqttMessage, Tuple2<Double, Double>> {

        private static final long serialVersionUID = 7322348505833012711L;

        @Override
        public Tuple2<Double, Double> map(MqttMessage value) throws Exception {
            String[] array = value.getPayload().split(",");
            double min = Double.parseDouble(array[0]);
            double max = Double.parseDouble(array[1]);
            return new Tuple2<Double, Double>(min, max);
        }
    }
}
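One thing to watch in the ParameterMapper above: a malformed payload on topic-parameter makes map() throw. A more defensive parse might look like the plain-Java sketch below. The class RangeParserSketch and the method parseRange are hypothetical names, and the choice to reject ranges where min > max is my own assumption, not something the original code does.

```java
import java.util.Optional;

class RangeParserSketch {

    // Parses a "min,max" payload. Returns empty instead of throwing when the
    // payload is missing, has the wrong number of fields, is not numeric,
    // or describes an inverted range (assumption: min must not exceed max).
    static Optional<double[]> parseRange(String payload) {
        if (payload == null) {
            return Optional.empty();
        }
        String[] parts = payload.split(",");
        if (parts.length != 2) {
            return Optional.empty();
        }
        try {
            double min = Double.parseDouble(parts[0].trim());
            double max = Double.parseDouble(parts[1].trim());
            if (Double.isNaN(min) || Double.isNaN(max) || min > max) {
                return Optional.empty();
            }
            return Optional.of(new double[] {min, max});
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseRange("20.5,30").isPresent());  // true
        System.out.println(parseRange("10,abc").isPresent());   // false: not numeric
        System.out.println(parseRange("30,20").isPresent());    // false: inverted range
    }
}
```

With something like this, the mapper could be a flatMap that emits a new range only when the payload parses, so the filter simply keeps its previous range on bad input.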