When I run a Beam SQL query, I get this error:
Inputs to Flatten had incompatible triggers: Repeatedly.forever(AfterWatermark.pastEndOfWindow()), DefaultTrigger
My code is:
PCollectionTuple query6 = PCollectionTuple.of(
new TupleTag<BeamRecord>("record8"), record8).and(new TupleTag<BeamRecord>("record3"), record3);
PCollection<BeamRecord> record9 = query6.apply(
BeamSql.queryMulti("SELECT a.Vehicle, a.Campaign, a.Copy, a.Event, a.CatLib, a.SubBrand, a.BrandName, a.Channel, a.Period1, a.PeriodStartDate, a.PeriodEndDate, SUM(b.CausalValue) as GRPs FROM record8 a LEFT JOIN record3 b ON a.Channel = b.Outlet AND a.SubBrand = b.SubBrand AND a.Event = b.SalesComponent AND b.Week >= a.PeriodStartDate AND b.Week <= a.PeriodEndDate GROUP BY a.Vehicle,a.Campaign,a.Copy,a.Event,a.CatLib,a.SubBrand,a.BrandName,a.Channel,a.Period1,a.PeriodStartDate,a.PeriodEndDate"));
Section 3.1 of the Apache Beam DSL documentation says:
Beam SQL supports aggregation functions with group_by in global_window, fixed_window, sliding_window and session_window. A field of type TIMESTAMP is required to specify fixed_window / sliding_window / session_window.
record3:
PCollectionTuple query3 = PCollectionTuple.of(
new TupleTag<BeamRecord>("record2"), record2).and(new TupleTag<BeamRecord>("apps3"), apps3);
PCollection<BeamRecord> record3 = query3.apply(
BeamSql.queryMulti("SELECT a.Outlet, a.CatLib, a.ProdKey, a.Week, a.SalesComponent, a.DuetoValue, a.PrimaryCausalKey, a.CausalValue, a.ModelIteration, a.Published, a.CatLibKey, a.SubBrand, a.BrandName, a.FinancialYear, b.EventDescription, b.Vehicle FROM record2 a LEFT JOIN apps3 b ON a.SalesComponent = b.EventList WHERE b.Vehicle IS NOT NULL"));
record8:
PCollectionTuple query5 = PCollectionTuple.of(
new TupleTag<BeamRecord>("record6"), record6).and(new TupleTag<BeamRecord>("record7"), record7);
PCollection<BeamRecord> record8 = query5.apply(BeamSql.queryMulti("SELECT a.Vehicle, a.Campaign, a.Copy, a.Event, a.CatLib, a.SubBrand, a.BrandName, a.Channel, a.Period1, a.PeriodStartDate , b.StartDate as PeriodEndDate FROM record6 a LEFT JOIN record7 b ON a.Period1 = b.FinancialYear"));
Answer 0 (score: 1)
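The two PCollections you are joining (record8 and record3) have incompatible triggers: one has Repeatedly.forever(AfterWatermark.pastEndOfWindow()) and the other has DefaultTrigger. Beam therefore cannot tell which records belong to which trigger firing, so it cannot produce a meaningful JOIN result.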
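To confirm which input carries which trigger, each PCollection's windowing strategy can be printed while the pipeline is being built. The following is a minimal sketch: the class name TriggerInspector and the method describeTrigger are hypothetical names made up for illustration, and it assumes the Beam Java SDK is on the classpath.

```java
import org.apache.beam.sdk.values.PCollection;

// Hypothetical helper (not part of Beam): reports the trigger attached to a PCollection.
public final class TriggerInspector {
  private TriggerInspector() {}

  // Builds a one-line description of the trigger and window function of `input`.
  static String describeTrigger(String label, PCollection<?> input) {
    return label
        + ": trigger=" + input.getWindowingStrategy().getTrigger()
        + ", windowFn=" + input.getWindowingStrategy().getWindowFn();
  }
}
```

Calling describeTrigger("record8", record8) and describeTrigger("record3", record3) just before the failing join should show which of the two inputs still has the default trigger.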
One way to fix this is to give both PCollections the same Repeatedly.forever(...) trigger; see the "Setting the triggers" section of the Beam guide.
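A minimal sketch of that approach, assuming the Beam Java SDK and Joda-Time are on the classpath. The class TriggerAlignment and the method withRepeatedWatermarkTrigger are hypothetical names, and the allowed lateness (zero) and accumulation mode (discarding fired panes) are assumptions chosen only to make the example complete, not settings taken from the question.

```java
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Hypothetical helper (not part of Beam): gives a PCollection the same trigger
// that the error message reports for the other join input.
public final class TriggerAlignment {
  private TriggerAlignment() {}

  static <T> PCollection<T> withRepeatedWatermarkTrigger(String name, PCollection<T> input) {
    return input.apply(
        name,
        // configure() keeps the existing window function and changes only the
        // trigger, the allowed lateness and the accumulation mode.
        Window.<T>configure()
            .triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow()))
            .withAllowedLateness(Duration.ZERO) // assumption: late data is dropped
            .discardingFiredPanes());           // assumption: panes are not accumulated
  }
}

// Possible usage before building the PCollectionTuple for the failing query:
//   record8 = TriggerAlignment.withRepeatedWatermarkTrigger("TriggerRecord8", record8);
//   record3 = TriggerAlignment.withRepeatedWatermarkTrigger("TriggerRecord3", record3);
```

Applying the helper to both inputs, rather than only to the one that currently has DefaultTrigger, keeps the lateness and accumulation settings identical on the two sides. The window functions of the two inputs still need to be compatible as well.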
Once both inputs have the same trigger, Beam SQL performs the join and the GROUP BY once per trigger firing.
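The Beam version you are using probably forces the Repeatedly.forever(...) trigger whenever you use a GROUP BY clause. In upcoming Beam releases this behavior has changed: the GROUP BY clause will no longer override the trigger, but it will require both unbounded inputs to have DefaultTrigger.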