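I combined two broadcast streams with the union operator and broadcast the result. In the process function class I receive the combined stream data. Now I want to split it and store the data from each stream in its own state, using a separate MapStateDescriptor for each.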
public class ApacheFlinkTest {
    MapStateDescriptor<String, Either<String, Integer>> COMBINED_STATE_DESCRIPTOR = new MapStateDescriptor<>(
            "combined_stream", BasicTypeInfo.STRING_TYPE_INFO, TypeInformation.of(new TypeHint<Either<String, Integer>>(){}));
    public static void main(String args[]) throws Exception{
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> dataStream = env.fromElements("john", "alice", "david","bob");
        //Broadcast Streams
        DataStream<String> strings = env.fromElements("one", "two", "three");
        DataStream<Integer> ints = env.fromElements(1, 2, 3);
        DataStream<Either<String, Integer>> stringsOnTheLeft = strings
                .map(new MapFunction<String, Either<String, Integer>>() {
                    @Override
                    public Either<String, Integer> map(String s) throws Exception {
                        return Either.Left(s);
                    }
                });
        DataStream<Either<String, Integer>> intsOnTheRight = ints
                .map(new MapFunction<Integer, Either<String, Integer>>() {
                    @Override
                    public Either<String, Integer> map(Integer i) throws Exception {
                        return Either.Right(i);
                    }
                });
        DataStream<Either<String, Integer>> combinedStream = stringsOnTheLeft.union(intsOnTheRight);
        BroadcastStream<Either<String, Integer>> broadCastStream = combinedStream.broadcast(COMBINED_STATE_DESCRIPTOR);
        dataStream.keyBy(
                .................
                ......................
        ).connect(broadCastStream).process(new ProcessDataFunction());
    }
}
The process function:
public class ProcessDataFunction extends KeyedBroadcastProcessFunction<String, String, Either<String, Integer>, Tuple2<String, String>> {
    private static final long serialVersionUID = 1L;
    int i=0;
    private transient MapStateDescriptor<String, String> STRING_STATE_DESCRIPTOR = new MapStateDescriptor<>(
            "String_Stream", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);
    private transient MapStateDescriptor<String, Integer> INTEGER_STATE_DESCRIPTOR = new MapStateDescriptor<>(
            "Integer_Stream", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO);
    MapStateDescriptor<String, Either<String, Integer>> COMBINED_STATE_DESCRIPTOR = new MapStateDescriptor<>(
            "combined_stream", BasicTypeInfo.STRING_TYPE_INFO, TypeInformation.of(new TypeHint<Either<String, Integer>>(){}));
    @Override
    public void open(Configuration parameters) throws Exception {
        // TODO
    }
    @Override
    public void processBroadcastElement(
            Either<String, Integer> broadCastStreamData,
            KeyedBroadcastProcessFunction<String, String, Either<String, Integer>, Tuple2<String, String>>.Context context,
            Collector<Tuple2<String, String>> arg2) throws Exception {
        if(broadCastStreamData.isLeft()){
            //This is working
            context.getBroadcastState(COMBINED_STATE_DESCRIPTOR).put("str_key1"+i,Either.Left(broadCastStreamData.left()));
            //This is not working
            context.getBroadcastState(STRING_STATE_DESCRIPTOR).put("str_key1"+i,broadCastStreamData.left());
        }
        else if(broadCastStreamData.isRight()){
            //This is working
            context.getBroadcastState(COMBINED_STATE_DESCRIPTOR).put("int_key1"+i,Either.Right(broadCastStreamData.right()));
            //This is not working
            context.getBroadcastState(INTEGER_STATE_DESCRIPTOR).put("int_key1"+i,broadCastStreamData.right());
        }
        i++;
    }
    @Override
    public void processElement(
            String arg0,
            KeyedBroadcastProcessFunction<String, String, Either<String, Integer>, Tuple2<String, String>>.ReadOnlyContext arg1,
            Collector<Tuple2<String, String>> arg2) throws Exception {
        // TODO Auto-generated method stub
    }
}
When I run this, the following exception occurs:
Caused by: java.lang.IllegalArgumentException: The requested state does not exist. Check for typos in your state descriptor, or specify the state descriptor in the datastream.broadcast(...) call if you forgot to register it.
at org.apache.flink.streaming.api.operators.co.CoBroadcastWithKeyedOperator$ReadWriteContextImpl.getBroadcastState(CoBroadcastWithKeyedOperator.java:189)
at com.flinkkafka.test.ProcessEitherType.processBroadcastElement(ProcessDataFunction.java:57)
at com.flinkkafka.test.ProcessEitherType.processBroadcastElement(ProcessDataFunction.java:1)
at org.apache.flink.streaming.api.operators.co.CoBroadcastWithKeyedOperator.processElement2(CoBroadcastWithKeyedOperator.java:121)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processRecord2(StreamTwoInputProcessor.java:145)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.lambda$new$1(StreamTwoInputProcessor.java:107)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor$$Lambda$776/1415080802.accept(Unknown Source)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor$StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessor.java:362)
at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:151)
at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:128)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:185)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:311)
at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$710/1653823325.runDefaultAction(Unknown Source)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:487)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:470)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
at java.lang.Thread.run(Thread.java:745)
I tried initializing STRING_STATE_DESCRIPTOR and INTEGER_STATE_DESCRIPTOR in the open() method via getRuntimeContext(), but that did not work. How should STRING_STATE_DESCRIPTOR and INTEGER_STATE_DESCRIPTOR be initialized?
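Going by the hint in the exception message, one direction worth trying is to register the two descriptors in the broadcast(...) call itself, since DataStream.broadcast accepts several MapStateDescriptor arguments. The following is only a minimal sketch under assumptions: the holder class BroadcastDescriptors and the helper broadcastWithBothStates are hypothetical names that do not appear in the code above, and the descriptors are declared as static final constants so that the job and the process function share the same instances.

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.types.Either;

// Hypothetical holder class (not part of the original code): one shared
// definition of each descriptor for both the job and the process function.
final class BroadcastDescriptors {

    static final MapStateDescriptor<String, String> STRING_STATE_DESCRIPTOR =
            new MapStateDescriptor<>(
                    "String_Stream",
                    BasicTypeInfo.STRING_TYPE_INFO,
                    BasicTypeInfo.STRING_TYPE_INFO);

    static final MapStateDescriptor<String, Integer> INTEGER_STATE_DESCRIPTOR =
            new MapStateDescriptor<>(
                    "Integer_Stream",
                    BasicTypeInfo.STRING_TYPE_INFO,
                    BasicTypeInfo.INT_TYPE_INFO);

    private BroadcastDescriptors() {
    }

    // Hypothetical helper: registers both descriptors on the broadcast stream.
    // broadcast(...) is a varargs method, so further descriptors (for example
    // the combined one) could be passed alongside these two.
    static BroadcastStream<Either<String, Integer>> broadcastWithBothStates(
            DataStream<Either<String, Integer>> combinedStream) {
        return combinedStream.broadcast(STRING_STATE_DESCRIPTOR, INTEGER_STATE_DESCRIPTOR);
    }
}
```

With this in place, processBroadcastElement would call context.getBroadcastState(BroadcastDescriptors.STRING_STATE_DESCRIPTOR) and context.getBroadcastState(BroadcastDescriptors.INTEGER_STATE_DESCRIPTOR). Using static constants also avoids depending on transient instance fields, which Java serialization does not restore when the function object is shipped to the workers.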
What I expect is that the STRING_STATE_DESCRIPTOR map contains "one", "two", "three" and the INTEGER_STATE_DESCRIPTOR map contains 1, 2, 3.
How can I achieve this?
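To make the expected outcome concrete, here is a small plain-Java model that does not use Flink at all. It only imitates the behaviour described by the exception message: a state can be looked up only if its name was registered up front, and each element is routed to the string state or the integer state depending on its side. StringOrInt, BroadcastStateModel and SplitBroadcastDemo are hypothetical stand-ins invented for this illustration, not Flink classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Either<String, Integer>: exactly one field is set.
final class StringOrInt {
    final String left;
    final Integer right;

    private StringOrInt(String left, Integer right) {
        this.left = left;
        this.right = right;
    }

    static StringOrInt left(String s) { return new StringOrInt(s, null); }
    static StringOrInt right(int i) { return new StringOrInt(null, i); }
    boolean isLeft() { return left != null; }
}

// Hypothetical model of an operator that only knows states registered up front.
final class BroadcastStateModel {
    private final Map<String, Map<String, Object>> statesByName = new HashMap<>();

    BroadcastStateModel(String... registeredNames) {
        for (String name : registeredNames) {
            statesByName.put(name, new LinkedHashMap<>());
        }
    }

    // Fails for names that were never registered, like the error in the question.
    Map<String, Object> getBroadcastState(String name) {
        Map<String, Object> state = statesByName.get(name);
        if (state == null) {
            throw new IllegalArgumentException("The requested state does not exist: " + name);
        }
        return state;
    }

    // Routes left values to the string state and right values to the integer state.
    void processBroadcastElement(StringOrInt element, int index) {
        if (element.isLeft()) {
            getBroadcastState("String_Stream").put("str_key" + index, element.left);
        } else {
            getBroadcastState("Integer_Stream").put("int_key" + index, element.right);
        }
    }
}

final class SplitBroadcastDemo {
    // Feeds an interleaved "union" of both inputs through the model.
    static BroadcastStateModel run(String... registeredNames) {
        BroadcastStateModel model = new BroadcastStateModel(registeredNames);
        List<StringOrInt> combined = Arrays.asList(
                StringOrInt.left("one"), StringOrInt.right(1),
                StringOrInt.left("two"), StringOrInt.right(2),
                StringOrInt.left("three"), StringOrInt.right(3));
        int index = 0;
        for (StringOrInt element : combined) {
            model.processBroadcastElement(element, index++);
        }
        return model;
    }

    public static void main(String[] args) {
        BroadcastStateModel model = run("String_Stream", "Integer_Stream");
        System.out.println(model.getBroadcastState("String_Stream").values());
        System.out.println(model.getBroadcastState("Integer_Stream").values());
    }
}
```

Calling run("String_Stream", "Integer_Stream") fills the two maps separately, while run("combined_stream") throws IllegalArgumentException on the first element because neither target state was registered. This mirrors the situation in the question, where only COMBINED_STATE_DESCRIPTOR is passed to broadcast(...).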