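I'm new to Apache Flink and I'm trying to dynamically evaluate patterns on a stream using Flink CEP. I'm trying to find users who performed the actions login, addtocart and logout, and that pattern is detected. However, if I define multiple patterns (for example, also login followed by logout), the additional pattern is not detected.
Below is my code.
Action class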
public class Action {

    public int userID;
    public String action;

    public Action() {
    }

    public Action(int userID, String action) {
        this.userID = userID;
        this.action = action;
    }

    public int getUserID() {
        return userID;
    }

    public void setUserID(int userID) {
        this.userID = userID;
    }

    public String getAction() {
        return action;
    }

    public void setAction(String action) {
        this.action = action;
    }

    @Override
    public String toString() {
        return "Action [userID=" + userID + ", action=" + action + "]";
    }
}
Pattern class
public class Pattern {

    public String firstAction;
    public String secondAction;
    public String thirdAction;

    public Pattern() {
    }

    public Pattern(String firstAction, String secondAction) {
        this.firstAction = firstAction;
        this.secondAction = secondAction;
    }

    public Pattern(String firstAction, String secondAction, String thirdAction) {
        this.firstAction = firstAction;
        this.secondAction = secondAction;
        this.thirdAction = thirdAction;
    }

    public String getFirstAction() {
        return firstAction;
    }

    public void setFirstAction(String firstAction) {
        this.firstAction = firstAction;
    }

    public String getSecondAction() {
        return secondAction;
    }

    public void setSecondAction(String secondAction) {
        this.secondAction = secondAction;
    }

    public String getThirdAction() {
        return thirdAction;
    }

    public void setThirdAction(String thirdAction) {
        this.thirdAction = thirdAction;
    }

    @Override
    public String toString() {
        return "Pattern [firstAction=" + firstAction + ", secondAction=" + secondAction
                + ", thirdAction=" + thirdAction + "]";
    }
}
Main class
import java.io.IOException;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.state.BroadcastState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class CEPBroadcast {

    public static class PatternEvaluator
            extends KeyedBroadcastProcessFunction<Integer, Action, Pattern, Tuple2<Integer, Pattern>> {

        private static final long serialVersionUID = 1L;

        // Last action seen for the current key (user ID).
        ValueState<String> prevActionState;
        // Descriptor for the broadcast state that holds the pattern.
        MapStateDescriptor<Void, Pattern> patternDesc;

        @Override
        public void open(Configuration conf) throws IOException {
            prevActionState = getRuntimeContext().getState(new ValueStateDescriptor<>("lastAction", Types.STRING));
            patternDesc = new MapStateDescriptor<>("patterns", Types.VOID, Types.POJO(Pattern.class));
        }

        @Override
        public void processBroadcastElement(Pattern pattern, Context ctx, Collector<Tuple2<Integer, Pattern>> out)
                throws Exception {
            BroadcastState<Void, Pattern> bcState = ctx.getBroadcastState(patternDesc);
            // Every pattern is stored under the same (null) key.
            bcState.put(null, pattern);
        }

        @Override
        public void processElement(Action action, ReadOnlyContext ctx, Collector<Tuple2<Integer, Pattern>> out)
                throws Exception {
            Pattern pattern = ctx.getBroadcastState(this.patternDesc).get(null);
            String prevAction = prevActionState.value();
            if (pattern != null && prevAction != null) {
                if (pattern.firstAction.equals(prevAction) && pattern.secondAction.equals(prevAction)
                        && pattern.thirdAction.equals(action.action)) {
                    out.collect(new Tuple2<>(ctx.getCurrentKey(), pattern));
                } else if (pattern.firstAction.equals(prevAction) && pattern.secondAction.equals(action.action)) {
                    out.collect(new Tuple2<>(ctx.getCurrentKey(), pattern));
                }
            }
            prevActionState.update(action.action);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Action> actions = env.fromElements(new Action(1001, "login"), new Action(1002, "login"),
                new Action(1003, "login"), new Action(1003, "addtocart"), new Action(1001, "logout"),
                new Action(1003, "logout"));
        DataStream<Pattern> pattern = env.fromElements(new Pattern("login", "logout"));

        KeyedStream<Action, Integer> actionByUser = actions
                .keyBy((KeySelector<Action, Integer>) action -> action.userID);

        MapStateDescriptor<Void, Pattern> bcStateDescriptor = new MapStateDescriptor<>("patterns", Types.VOID,
                Types.POJO(Pattern.class));
        BroadcastStream<Pattern> bcedPattern = pattern.broadcast(bcStateDescriptor);

        DataStream<Tuple2<Integer, Pattern>> matches = actionByUser.connect(bcedPattern)
                .process(new PatternEvaluator());

        matches.flatMap(new FlatMapFunction<Tuple2<Integer, Pattern>, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void flatMap(Tuple2<Integer, Pattern> value, Collector<String> out) throws Exception {
                if (value.f1.thirdAction != null) {
                    out.collect("User ID: " + value.f0 + ",Pattern matched:" + value.f1.firstAction + ","
                            + value.f1.secondAction + "," + value.f1.thirdAction);
                } else {
                    out.collect("User ID: " + value.f0 + ",Pattern matched:" + value.f1.firstAction + ","
                            + value.f1.secondAction);
                }
            }
        }).print();

        env.execute("CEPBroadcast");
    }
}
If I supply a single pattern to evaluate, the output is as follows:
DataStream<Action> actions = env.fromElements(new Action(1001, "login"), new Action(1002, "login"),
new Action(1003, "login"), new Action(1003, "addtocart"), new Action(1001, "logout"),
new Action(1003, "logout"));
DataStream<Pattern> pattern = env.fromElements(new Pattern("login", "logout"));
Output: User ID: 1001,Pattern matched:login,logout
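If I supply multiple patterns to evaluate, as shown below, the second pattern is not evaluated. Please suggest how I can evaluate multiple patterns.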
DataStream<Pattern> pattern = env.fromElements(new Pattern ("login","addtocart","logout"),
new Pattern("login", "logout"));
Output: User ID: 1003,Pattern matched:login,addtocart,logout
Answer 0 (score: 1)
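There are two reasons why this doesn't work:
(1) Whenever you have a Flink operator with multiple input streams, such as the PatternEvaluator in your application, you have no control over how that operator reads from its inputs. In your case it might fully consume the events from the Action stream before reading the patterns, or vice versa, or it might interleave the two streams. In a sense, you are lucky that it matched anything at all.
Solving this isn't easy. If you know all of the patterns at compile time (in other words, if they aren't actually dynamic), then you could use Flink CEP, or MATCH_RECOGNIZE in Flink SQL.
If you truly need dynamic patterns, you will have to find a way to hold back the action stream until the patterns have been read. This topic ("side inputs") has been covered in other questions on SO. For example, see How to unit test BroadcastProcessFunction in flink when processElement depends on broadcasted data. (Or you could adjust your expectations, and accept that only actions processed after a pattern has been stored can be matched against that pattern.)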
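One way to picture "holding back the action stream" is to buffer actions until the first pattern has arrived and then replay them. The following is only a minimal plain-Java sketch of that idea, with no Flink dependency. The class `BufferUntilPatternSketch` and its methods are hypothetical names invented for this illustration, and in a real job the buffer would have to live in Flink managed state rather than in an ordinary field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: buffers actions until a pattern is known.
class BufferUntilPatternSketch {
    private final List<String> bufferedActions = new ArrayList<>();
    private final List<String> processedActions = new ArrayList<>();
    private boolean patternSeen = false;

    // Called for every action; actions are held back while no pattern is known.
    void onAction(String action) {
        if (patternSeen) {
            processedActions.add(action);
        } else {
            bufferedActions.add(action);
        }
    }

    // Called when a pattern arrives; replays the held-back actions in arrival order.
    // The pattern itself is not interpreted here, only its arrival matters.
    void onPattern(String pattern) {
        patternSeen = true;
        processedActions.addAll(bufferedActions);
        bufferedActions.clear();
    }

    List<String> processed() {
        return processedActions;
    }

    public static void main(String[] args) {
        BufferUntilPatternSketch sketch = new BufferUntilPatternSketch();
        sketch.onAction("login");             // arrives before any pattern: buffered
        System.out.println(sketch.processed());
        sketch.onPattern("login,logout");     // pattern arrives: buffer is replayed
        sketch.onAction("logout");            // processed immediately
        System.out.println(sketch.processed());
    }
}
```

With this arrangement the result no longer depends on whether the first action or the first pattern happens to be read first, at the cost of holding early actions in memory.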
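(2) By using null as the key when storing the patterns
bcState.put(null, pattern);
you overwrite the first pattern with the second one when the second pattern arrives. The two patterns can never both match.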
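The overwrite can be reproduced without Flink. The sketch below uses a plain `java.util.HashMap` as a stand-in for the broadcast state (an assumption made purely for illustration; `NullKeyOverwriteDemo` and its methods are hypothetical names) and counts how many patterns survive when they are stored under a single null key versus under distinct IDs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a HashMap stands in for BroadcastState.
class NullKeyOverwriteDemo {

    // Every pattern goes under the same null key, so each put replaces the previous entry.
    static int storeUnderNullKey(String... patterns) {
        Map<Void, String> state = new HashMap<>();
        for (String pattern : patterns) {
            state.put(null, pattern);
        }
        return state.size();
    }

    // Each pattern gets its own integer ID as key, so all of them are retained.
    static int storeUnderPatternId(String... patterns) {
        Map<Integer, String> state = new HashMap<>();
        int nextId = 0;
        for (String pattern : patterns) {
            state.put(nextId++, pattern);
        }
        return state.size();
    }

    public static void main(String[] args) {
        System.out.println(storeUnderNullKey("login,addtocart,logout", "login,logout"));   // prints 1
        System.out.println(storeUnderPatternId("login,addtocart,logout", "login,logout")); // prints 2
    }
}
```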
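To match the input against two different patterns, you will need to modify PatternEvaluator to handle matching both patterns simultaneously. This requires storing both patterns in broadcast state, considering both patterns in processElement, and having an instance of prevActionState for each pattern. You may want to give the patterns IDs, use those IDs as the keys in the broadcast state, and use MapState for prevActionState, again keyed by pattern ID.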
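The bookkeeping described above can be sketched in plain Java, without Flink, to show the shape of the state. This is a rough illustration under several assumptions, not a drop-in PatternEvaluator: the class `MultiPatternSketch` is hypothetical, a `LinkedHashMap` stands in for broadcast state keyed by pattern ID, a nested map stands in for per-user MapState keyed by pattern ID, patterns are plain lists of action names, and the per-pattern state is the number of steps matched so far rather than the previous action. Matching is strictly contiguous, and the demo registers both patterns before any action, which sidesteps the ordering issue from point (1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of per-pattern state; not a Flink operator.
class MultiPatternSketch {
    // Stand-in for broadcast state: pattern ID -> ordered list of actions.
    private final Map<String, List<String>> patterns = new LinkedHashMap<>();
    // Stand-in for keyed MapState: user ID -> (pattern ID -> steps matched so far).
    private final Map<Integer, Map<String, Integer>> progressByUser = new HashMap<>();

    void addPattern(String patternId, String... actions) {
        if (actions.length == 0) {
            throw new IllegalArgumentException("a pattern needs at least one action");
        }
        patterns.put(patternId, Arrays.asList(actions));
        // Drop stale progress in case an existing ID is reused for a new pattern.
        for (Map<String, Integer> progress : progressByUser.values()) {
            progress.remove(patternId);
        }
    }

    // Advances every stored pattern for this user and returns the matches completed by this action.
    List<String> onAction(int userId, String action) {
        List<String> matches = new ArrayList<>();
        Map<String, Integer> progress = progressByUser.computeIfAbsent(userId, k -> new HashMap<>());
        for (Map.Entry<String, List<String>> entry : patterns.entrySet()) {
            List<String> steps = entry.getValue();
            int matched = progress.getOrDefault(entry.getKey(), 0);
            if (steps.get(matched).equals(action)) {
                matched++;                                      // expected next step
            } else {
                matched = steps.get(0).equals(action) ? 1 : 0;  // restart or reset
            }
            if (matched == steps.size()) {
                matches.add("User ID: " + userId + ", pattern matched: " + entry.getKey());
                matched = 0;
            }
            progress.put(entry.getKey(), matched);
        }
        return matches;
    }

    // Replays the six actions from the question against both patterns.
    static List<String> runDemo() {
        MultiPatternSketch sketch = new MultiPatternSketch();
        sketch.addPattern("login-addtocart-logout", "login", "addtocart", "logout");
        sketch.addPattern("login-logout", "login", "logout");
        int[] users = {1001, 1002, 1003, 1003, 1001, 1003};
        String[] actions = {"login", "login", "login", "addtocart", "logout", "logout"};
        List<String> all = new ArrayList<>();
        for (int i = 0; i < users.length; i++) {
            all.addAll(sketch.onAction(users[i], actions[i]));
        }
        return all;
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

On the sample data from the question, this reports user 1001 for login-logout and user 1003 for login-addtocart-logout. Tracking a step count per pattern is a design choice of this sketch; it generalizes to patterns of any length, whereas a single previous action only distinguishes two-step patterns.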
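Update:
Keep in mind that when you write a streaming job with the DataStream API, you are not defining the order of execution, as you would in a typical procedural application. Instead, you are describing the topology of a dataflow graph and the behavior of the operators embedded in that graph, which will execute in parallel when the job runs.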