Error: Inputs to Flatten had incompatible triggers: Repeatedly.forever(AfterWatermark.pastEndOfWindow()), DefaultTrigger

Time: 2018-03-19 11:45:36

Tags: google-cloud-dataflow apache-beam

When I run a Beam SQL query, I get this error:

Inputs to Flatten had incompatible triggers: Repeatedly.forever(AfterWatermark.pastEndOfWindow()), DefaultTrigger

My code is:

PCollectionTuple query6 = PCollectionTuple.of(
    new TupleTag<BeamRecord>("record8"), record8).and(new TupleTag<BeamRecord>("record3"), record3);

PCollection<BeamRecord> record9 = query6.apply(
    BeamSql.queryMulti("SELECT a.Vehicle, a.Campaign, a.Copy, a.Event, a.CatLib, a.SubBrand, a.BrandName, a.Channel, a.Period1, a.PeriodStartDate, a.PeriodEndDate, SUM(b.CausalValue) as GRPs FROM record8 a LEFT JOIN record3 b ON a.Channel = b.Outlet AND a.SubBrand = b.SubBrand AND a.Event = b.SalesComponent AND b.Week >= a.PeriodStartDate AND b.Week <= a.PeriodEndDate GROUP BY a.Vehicle,a.Campaign,a.Copy,a.Event,a.CatLib,a.SubBrand,a.BrandName,a.Channel,a.Period1,a.PeriodStartDate,a.PeriodEndDate"));

Section 3.1 of the Apache Beam DSL documentation says:

Beam SQL supports aggregation functions with group_by in global_window, fixed_window, sliding_window and session_window. A field with type TIMESTAMP is required to specify fixed_window/sliding_window/session_window.
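To illustrate what the quoted passage means in practice, here is a small sketch that only builds the query text for a windowed GROUP BY, using a tumbling (fixed) window via TUMBLE on a TIMESTAMP column. The class name, the method name and the columns Channel, CausalValue and EventTime are hypothetical; it assumes a single-input BeamSql.query(...), where the input table is named PCOLLECTION.

```java
public class WindowedGroupByQuery {

    // Builds a Beam SQL query that aggregates per Channel within fixed (tumbling)
    // one-hour windows. Channel, CausalValue and the TIMESTAMP column passed in
    // as timestampField are hypothetical names; adapt them to your own schema.
    static String hourlyTotals(String timestampField) {
        return "SELECT Channel, SUM(CausalValue) AS GRPs "
                + "FROM PCOLLECTION "
                + "GROUP BY Channel, TUMBLE(" + timestampField + ", INTERVAL '1' HOUR)";
    }

    public static void main(String[] args) {
        // The resulting string would be passed to BeamSql.query(...) on a
        // PCollection whose schema has a TIMESTAMP field with this name.
        System.out.println(hourlyTotals("EventTime"));
    }
}
```

Without such a window function in the GROUP BY clause, the aggregation runs in the global window.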

record3:

PCollectionTuple query3 = PCollectionTuple.of(
    new TupleTag<BeamRecord>("record2"), record2).and(new TupleTag<BeamRecord>("apps3"), apps3);
PCollection<BeamRecord> record3 = query3.apply(
    BeamSql.queryMulti("SELECT a.Outlet, a.CatLib, a.ProdKey, a.Week, a.SalesComponent, a.DuetoValue, a.PrimaryCausalKey, a.CausalValue, a.ModelIteration, a.Published, a.CatLibKey, a.SubBrand, a.BrandName, a.FinancialYear, b.EventDescription, b.Vehicle FROM record2 a LEFT JOIN apps3 b ON a.SalesComponent = b.EventList WHERE b.Vehicle IS NOT NULL"));

record8:

PCollectionTuple query5 = PCollectionTuple.of(
    new TupleTag<BeamRecord>("record6"), record6).and(new TupleTag<BeamRecord>("record7"), record7);
PCollection<BeamRecord> record8 = query5.apply(
    BeamSql.queryMulti("SELECT a.Vehicle, a.Campaign, a.Copy, a.Event, a.CatLib, a.SubBrand, a.BrandName, a.Channel, a.Period1, a.PeriodStartDate, b.StartDate AS PeriodEndDate FROM record6 a LEFT JOIN record7 b ON a.Period1 = b.FinancialYear"));

1. Should I implement Repeatedly.forever(...) in my query?
2. How do I perform the LEFT JOIN and GROUP BY in this query?
3. How should I implement the trigger?

1 answer:

Answer 0 (score: 1)

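The two PCollections you have (record8 and record3) have incompatible triggers: one uses Repeatedly.forever(...), while the other uses DefaultTrigger. This means Beam cannot infer which records belong to which trigger firing, so it cannot produce a meaningful JOIN result.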

One way to fix this is to set both PCollections to the Repeatedly.forever(...) trigger; see the "Setting the triggers" section of the Beam programming guide.
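A minimal sketch of that fix, assuming the Beam Java SDK 2.x, the direct runner and Joda-Time are on the classpath. The class AlignTriggers and the helper withRepeatedWatermarkTrigger are hypothetical names; the helper re-windows an input into the global window with the same Repeatedly.forever(AfterWatermark.pastEndOfWindow()) trigger that the error message reports for the other side of the Flatten.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class AlignTriggers {

    // Hypothetical helper: puts the input in the global window and sets the same
    // trigger the error message shows for the other input. Allowed lateness and
    // the accumulation mode must be given whenever .triggering() is used.
    static <T> PCollection<T> withRepeatedWatermarkTrigger(PCollection<T> input, String name) {
        return input.apply(name,
                Window.<T>into(new GlobalWindows())
                        .triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow()))
                        .withAllowedLateness(Duration.ZERO)
                        .discardingFiredPanes());
    }

    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        PCollection<String> input = p.apply("Input", Create.of("a", "b"));

        // Before re-windowing, the input carries the default trigger.
        System.out.println(input.getWindowingStrategy().getTrigger());

        PCollection<String> aligned = withRepeatedWatermarkTrigger(input, "AlignTrigger");
        // After re-windowing, it carries the same trigger as the other side.
        System.out.println(aligned.getWindowingStrategy().getTrigger());
    }
}
```

In the question's pipeline this would be applied to whichever input still has DefaultTrigger (for example record3) before it is added to query6. Note that in the global window a watermark trigger only fires once the watermark reaches the end of the window, which for bounded input means when the input is exhausted; an unbounded input would need a different trigger or non-global windows.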

Once both inputs have the same trigger, Beam SQL performs the join and the GROUP BY once per trigger firing.

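The Beam version you are using may force Repeatedly.forever(...) onto the result when you use a GROUP BY clause. This behavior changes in an upcoming Beam release: the GROUP BY clause will no longer override the triggers, but it will require both unbounded inputs to have DefaultTrigger.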
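For the newer behavior, where unbounded inputs are expected to carry DefaultTrigger, a sketch of resetting an input's trigger is shown below. It assumes the Beam Java SDK 2.x with the direct runner; the class UseDefaultTrigger, the helper withDefaultTrigger and the one-hour fixed window size are hypothetical choices. The trigger is set explicitly because Window.into(...) on its own keeps whatever trigger the input already had.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.windowing.DefaultTrigger;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class UseDefaultTrigger {

    // Hypothetical helper: assigns fixed one-hour windows and explicitly replaces
    // any upstream trigger with DefaultTrigger, which fires when the watermark
    // passes the end of each window. With non-global windows, .triggering()
    // requires both allowed lateness and an accumulation mode.
    static <T> PCollection<T> withDefaultTrigger(PCollection<T> input, String name) {
        return input.apply(name,
                Window.<T>into(FixedWindows.of(Duration.standardHours(1)))
                        .triggering(DefaultTrigger.of())
                        .withAllowedLateness(Duration.ZERO)
                        .discardingFiredPanes());
    }

    public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        PCollection<String> input = p.apply("Input", Create.of("a", "b"));
        PCollection<String> reset = withDefaultTrigger(input, "ResetTrigger");
        System.out.println(reset.getWindowingStrategy().getTrigger());
    }
}
```

Applying the same helper to both inputs of the join gives them matching windowing and triggering.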