Flink: get a KeyedState value and use it in another stream

Time: 2020-06-10 10:40:12

Tags: apache-flink flink-streaming

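I know that keyed state belongs to its key: only the current key can access its state value, and other keys cannot access another key's state value.

I am trying to access the state with the same key, but from a different stream. Is that possible?

If it is not possible, will I end up with two copies of the same data?

Note: I need two streams because each one has a different time window and a different implementation.

Here is an example (I know that keyBy(something) is the same for both stream operations):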

public class Sample{
       streamA
                .keyBy(something)
                .timeWindow(Time.seconds(4))
                .process(new CustomMyProcessFunction())
                .name("CustomMyProcessFunction")
                .print();

       streamA
                .keyBy(something)
                .timeWindow(Time.seconds(1))
                .process(new CustomMyAnotherProcessFunction())
                .name("CustomMyAnotherProcessFunction")
                .print();
}

public class CustomMyProcessFunction extends ProcessWindowFunction<..>
{
    private Logger logger = LoggerFactory.getLogger(CustomMyProcessFunction.class);
    private transient ValueState<SimpleEntity> simpleEntityValueState;
    private SimpleEntity simpleEntity;

    @Override
    public void open(Configuration parameters) throws Exception
    {
        ValueStateDescriptor<SimpleEntity> simpleEntityValueStateDescriptor = new ValueStateDescriptor<SimpleEntity>(
                "sample",
                TypeInformation.of(SimpleEntity.class)
        );
        simpleEntityValueState = getRuntimeContext().getState(simpleEntityValueStateDescriptor);
    }

    @Override
    public void process(...) throws Exception
    {
        SimpleEntity value = simpleEntityValueState.value();
        if (value == null)
        {
            SimpleEntity newVal = new SimpleEntity("sample");
            logger.info("New Value put");
            simpleEntityValueState.update(newVal);
        }
        ...
    }
...
}

public class CustomMyAnotherProcessFunction extends ProcessWindowFunction<..>
{


    private Logger logger = LoggerFactory.getLogger(CustomMyAnotherProcessFunction.class);
    private transient ValueState<SimpleEntity> simpleEntityValueState;

    @Override
    public void open(Configuration parameters) throws Exception
    {

        ValueStateDescriptor<SimpleEntity> simpleEntityValueStateDescriptor = new ValueStateDescriptor<SimpleEntity>(
                "sample",
                TypeInformation.of(SimpleEntity.class)
        );
        simpleEntityValueState = getRuntimeContext().getState(simpleEntityValueStateDescriptor);
    }

    @Override
    public void process(...) throws Exception
    {
        SimpleEntity value = simpleEntityValueState.value();
        if (value != null)
            logger.info(value.toString()); // I expect that SimpleEntity("sample")
        out.collect(...);
    }
...
}

3 Answers:

Answer 0 (score: 1)

I tried your idea of sharing state between two operators by using the same key.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.io.IOException;

public class FlinkReuseState {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(3);

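        DataStream<Integer> stream1 = env.addSource(new SourceFunction<Integer>() {
            // Flipped by cancel() so the emit loop can terminate.
            private volatile boolean running = true;

            @Override
            public void run(SourceContext<Integer> sourceContext) throws Exception {
                // Emit the constant key 1 once per second.
                while (running) {
                    sourceContext.collect(1);
                    Thread.sleep(1000);
                }
            }

            @Override
            public void cancel() {
                running = false;
            }
        });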

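        DataStream<Integer> stream2 = env.addSource(new SourceFunction<Integer>() {
            // Flipped by cancel() so the emit loop can terminate.
            private volatile boolean running = true;

            @Override
            public void run(SourceContext<Integer> sourceContext) throws Exception {
                // Emit the same key as stream1 once per second.
                while (running) {
                    sourceContext.collect(1);
                    Thread.sleep(1000);
                }
            }

            @Override
            public void cancel() {
                running = false;
            }
        });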


        DataStream<Integer> windowedStream1 = stream1.keyBy(Integer::intValue)
                .timeWindow(Time.seconds(3))
                .process(new ProcessWindowFunction<Integer, Integer, Integer, TimeWindow>() {
                    private ValueState<Integer> value;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        super.open(parameters);
                        ValueStateDescriptor<Integer> desc = new ValueStateDescriptor<Integer>("value", Integer.class);
                        value = getRuntimeContext().getState(desc);
                    }

                    @Override
                    public void process(Integer integer, Context context, Iterable<Integer> iterable, Collector<Integer> collector) throws Exception {
                        iterable.forEach(x -> {
                            try {
                                if (value.value() == null) {
                                    value.update(1);
                                } else {
                                    value.update(value.value() + 1);
                                }
                            } catch (IOException e) {
                                e.printStackTrace();
                            }
                        });
                        collector.collect(value.value());
                    }
                });

        DataStream<String> windowedStream2 = stream2.keyBy(Integer::intValue)
                .timeWindow(Time.seconds(3))
                .process(new ProcessWindowFunction<Integer, String, Integer, TimeWindow>() {

                    private ValueState<Integer> value;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        super.open(parameters);
                        ValueStateDescriptor<Integer> desc = new ValueStateDescriptor<Integer>("value", Integer.class);
                        value = getRuntimeContext().getState(desc);
                    }

                    @Override
                    public void process(Integer s, Context context, Iterable<Integer> iterable, Collector<String> collector) throws Exception {
                        iterable.forEach(x -> {
                            try {
                                if (value.value() == null) {
                                    value.update(1);
                                } else {
                                    value.update(value.value() + 1);
                                }
                            } catch (IOException e) {
                                e.printStackTrace();
                            }
                        });
                        collector.collect(String.valueOf(value.value()));
                    }
                });

        windowedStream2.print();

        windowedStream1.print();

        env.execute();

    }
}

It does not work: each stream updates only its own value state. The output is listed below.

3> 3
3> 3
3> 6
3> 6
3> 9
3> 9
3> 12
3> 12
3> 15
3> 15
3> 18
3> 18
3> 21
3> 21
3> 24
3> 24

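According to the official documentation on keyed state, *each keyed state is logically bound to a unique composite of <parallel-operator-instance, key>, and since each key "belongs" to exactly one parallel instance of a keyed operator, we can think of this simply as <operator, key>*.

So I don't think it is possible to share state by giving the state the same name in different operators.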
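The <operator, key> scoping can be pictured with a small plain-Java model. This is only a sketch to illustrate the lookup rule; it is not how Flink stores state internally, and the class name `StateScopeModel` and the operator labels are made up for the illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of keyed-state scoping; NOT Flink's actual implementation. */
public class StateScopeModel {

    // Each entry is addressed by (operator, state name, key),
    // not by (state name, key) alone.
    private final Map<List<Object>, Object> store = new HashMap<>();

    public void update(String operator, String stateName, Object key, Object value) {
        store.put(Arrays.asList(operator, stateName, key), value);
    }

    /** Returns the stored value, or null if this operator never wrote state for the key. */
    public Object value(String operator, String stateName, Object key) {
        return store.get(Arrays.asList(operator, stateName, key));
    }

    public static void main(String[] args) {
        StateScopeModel state = new StateScopeModel();

        // The first window operator writes state "sample" for key "key-1".
        state.update("window-4s", "sample", "key-1", "SimpleEntity(sample)");

        // The same operator reads its own state back.
        System.out.println(state.value("window-4s", "sample", "key-1")); // SimpleEntity(sample)

        // A different operator with the same state name and key sees nothing.
        System.out.println(state.value("window-1s", "sample", "key-1")); // null
    }
}
```

The operator is part of the address, so reusing the descriptor name "sample" in a second operator creates a second, independent entry.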

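Have you tried a CoProcessFunction? With it you can still implement a separate processing method for each stream; the only remaining problem is the time window. Could you provide more details about your processing logic?

Answer 1 (score: 1)

As already pointed out, state is always local to a single operator instance. It cannot be shared.

What you can do, however, is stream the state updates from the operator that holds the state to the other operators that need it. With side outputs, you can build complex dataflows without having to share state.

Answer 2 (score: 0)

Why can't you return the state as part of a map operation, so that the resulting stream can be connected to the other stream?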
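A minimal sketch of that idea, assuming the Flink 1.x DataStream API (where `open` takes a `Configuration`) and using `KeyedCoProcessFunction`, the keyed variant of the co-process function. The class names, the state name "count", and the `fromElements` inputs are illustrative and not taken from the question. Both inputs are handled by one operator, so both callbacks read and write the same keyed state. Windowing is not shown; it would have to be rebuilt with timers.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class SharedStateViaConnect {

    /** One operator, two inputs: both callbacks access the same keyed state. */
    public static class SharedCounter
            extends KeyedCoProcessFunction<Integer, Integer, Integer, String> {

        private transient ValueState<Integer> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Integer.class));
        }

        // First input: updates the per-key counter.
        @Override
        public void processElement1(Integer value, Context ctx, Collector<String> out)
                throws Exception {
            Integer current = count.value();
            count.update(current == null ? 1 : current + 1);
        }

        // Second input: reads the counter written by the first input.
        @Override
        public void processElement2(Integer value, Context ctx, Collector<String> out)
                throws Exception {
            out.collect("key " + ctx.getCurrentKey() + " count=" + count.value());
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical inputs; both streams must be keyed by the same key type.
        DataStream<Integer> writer = env.fromElements(1, 1, 1);
        DataStream<Integer> reader = env.fromElements(1, 1);

        writer.connect(reader)
                .keyBy(v -> v, v -> v)
                .process(new SharedCounter())
                .print();

        env.execute("shared-state-via-connect");
    }
}
```

The arrival order of elements from the two inputs is not deterministic, so the printed counts can differ between runs.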
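A rough sketch of the side-output approach, assuming the Flink 1.x DataStream API. The tag name "state-updates", the class names, and the sample input are hypothetical. The operator that owns the state emits every new state value to a side output, and any downstream operator can consume that stream instead of reading the state directly.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class StateUpdatesViaSideOutput {

    // Created as an anonymous subclass so that Flink can capture the element type.
    static final OutputTag<Integer> STATE_UPDATES =
            new OutputTag<Integer>("state-updates") {};

    /** Owns the keyed state and publishes each update on the side output. */
    public static class CountingFunction
            extends KeyedProcessFunction<Integer, Integer, String> {

        private transient ValueState<Integer> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Integer.class));
        }

        @Override
        public void processElement(Integer value, Context ctx, Collector<String> out)
                throws Exception {
            Integer current = count.value();
            int updated = current == null ? 1 : current + 1;
            count.update(updated);

            out.collect("processed " + value);   // main output
            ctx.output(STATE_UPDATES, updated);  // state update for other operators
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        SingleOutputStreamOperator<String> mainOutput = env
                .fromElements(1, 1, 1)
                .keyBy(v -> v)
                .process(new CountingFunction());

        // Stream of state updates; connect or join it with whatever needs the state.
        DataStream<Integer> updates = mainOutput.getSideOutput(STATE_UPDATES);

        mainOutput.print("main");
        updates.print("state-updates");

        env.execute("state-updates-via-side-output");
    }
}
```

The consumer receives copies of the state values, so it should treat them as a stream of updates rather than as a live view of the other operator's state.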