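I created the following sample code with the DAG API to learn about aggregation. It looks like the slidingWindow vertex is not emitting any records.
I'm not sure what is going wrong.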
public DAG buildDAG() {
    DAG dag = new DAG();
    SlidingWindowPolicy winPolicy = slidingWinPolicy(SLIDING_WINDOW_LENGTH_MILLIS, SLIDE_STEP_MILLIS);
    Vertex source = dag.newVertex("source", SourceProcessors.streamRemoteMapP(getRemoteSourceName(),
            getClientConfig(), START_FROM_OLDEST, WatermarkGenerationParams.noWatermarks()));
    Vertex slidingWindow = dag.newVertex("aggregate-to-sliding-win",
            aggregateToSlidingWindowP(
                    singletonList((v) -> getUserID((Entry<String, CacheEntry<Record>>)v)),
                    singletonList((v) -> getTimeStamp((Entry<String, CacheEntry<Record>>)v)),
                    TimestampKind.EVENT,
                    winPolicy,
                    counting(),
                    TimestampedEntry::new));
    Vertex peekOP = dag.newVertex("peekOP", DiagnosticProcessors.writeLoggerP());
    Vertex peekOP1 = dag.newVertex("peekOP1", DiagnosticProcessors.writeLoggerP());
    Vertex sink = dag.newVertex("sink", SinkProcessors.writeFileP("c:\\\\data\\\\op1.txt"));
    return dag
            .edge(between(source, peekOP))
            .edge(between(peekOP, slidingWindow))
            .edge(between(slidingWindow,peekOP1))
            .edge(between(peekOP1, sink));
}
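Similarly, I created the following sample code with the Pipeline API for the same aggregation.
This one works fine: it writes the records to a text file.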
private Pipeline buildPipeline() {
    Pipeline p = Pipeline.create();
    p.drawFrom(Sources.<String, CacheEntry<AuditLogRecord>>remoteMapJournal("cache_AuditLog", getClientConfig(), START_FROM_OLDEST))
     .addTimestamps((v) -> getTimeStamp(v), 3000)
     .peek()
     .groupingKey((v) -> Tuple2.tuple2(getUserID(v),getTranType(v)))
     .window(WindowDefinition.sliding(SLIDING_WINDOW_LENGTH_MILLIS, SLIDE_STEP_MILLIS))
     .aggregate(counting())
     .map((v)-> getMapKey(v))
     .drainTo(Sinks.files("c:\\data\\op.txt"));
    return p;
}
Could you please help me correct the DAG definition?
Answer 0 (score: 2)
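There are multiple issues:
WatermarkGenerationParams.noWatermarks(): to get any output from a window processor, you need watermarks. Use wmGenParams((v) -> getTimeStamp(v), limitingLag(3000), emitByFrame(winPolicy), -1) instead.
DiagnosticProcessors.writeLoggerP() is a sink: it receives items but emits nothing. To peek into a vertex, wrap its processor supplier in peekInputP( /* original supplier */ ) or peekOutputP.
The edge into slidingWindow must be distributed and partitioned. Without these you will still get results, but they will be incorrect.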
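To make the first point more concrete, here is a minimal plain-Java sketch of why a window processor stays silent without watermarks. It is not Jet code: the class WatermarkSketch, its methods, and the window length and slide step are all hypothetical, chosen only for illustration. The idea it shows is that events are merely accumulated into their windows, and a window's result is emitted only once a watermark at or past the window's end arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sliding-window counting processor; this is NOT Jet API.
class WatermarkSketch {
    static final long WINDOW_LENGTH = 10; // assumed values, chosen for the example
    static final long SLIDE_STEP = 5;

    // Running count per window, keyed by the window's end timestamp.
    private final TreeMap<Long, Long> countsByWindowEnd = new TreeMap<>();
    private final List<String> emitted = new ArrayList<>();

    // Adds the event to every window [end - WINDOW_LENGTH, end) that contains it.
    // Only accumulates, never emits. Assumes non-negative timestamps.
    void onEvent(long timestamp) {
        long firstEnd = (timestamp / SLIDE_STEP + 1) * SLIDE_STEP;
        for (long end = firstEnd; end <= timestamp + WINDOW_LENGTH; end += SLIDE_STEP) {
            countsByWindowEnd.merge(end, 1L, Long::sum);
        }
    }

    // A watermark promises that no earlier events will arrive, so every
    // window ending at or before it is complete and can be emitted.
    void onWatermark(long watermark) {
        Map<Long, Long> closed = countsByWindowEnd.headMap(watermark, true);
        closed.forEach((end, count) -> emitted.add("window ending " + end + ": " + count));
        closed.clear(); // also removes the emitted windows from the backing map
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        WatermarkSketch p = new WatermarkSketch();
        p.onEvent(3);
        p.onEvent(7);
        p.onEvent(12);
        System.out.println(p.emitted()); // prints []
        p.onWatermark(10);
        System.out.println(p.emitted()); // prints [window ending 5: 1, window ending 10: 2]
    }
}
```

With noWatermarks() the equivalent of onWatermark is never called, which matches the symptom in the question: the vertex receives items but never emits a window.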
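The DAG API is intended for advanced use cases that cannot be implemented with the Pipeline API. With each Jet release, the need to use the DAG API will diminish. As your example shows, the Pipeline API is easier to write and more concise.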