Beam: No events with SideInput in a streaming pipeline with DataflowRunner

Date: 2019-07-18 23:07:25

Tags: google-cloud-dataflow apache-beam dataflow

I tested side inputs with a streaming main input on both DirectRunner and DataflowRunner, using the following code:

import com.google.api.services.bigquery.model.TableRow;
import java.util.List;
import java.util.Random;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.joda.time.Duration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Note: PubSubToTableRow (used below) is not shown in the question.

public class Testsideinput {
  private static final Logger LOG = LoggerFactory.getLogger(Testsideinput.class);

  static class RefreshCache extends DoFn<Long, String> {
    private static final long serialVersionUID = 1;
    private static final Random RANDOM = new Random();

    @ProcessElement
    public void processElement(ProcessContext c) {
      c.output("A"+c.element());
      c.output("B"+c.element());
      c.output("C"+c.element());
      c.output("D"+c.element());
      c.output("E"+c.element());
      c.output("F"+c.element());
    }
  }

  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    final PCollectionView<List<String>> sideInput2 =
      pipeline.apply("TextIO", TextIO.read().from("<Put your gs://>"))
              .apply("viewTags", View.asList());

    final PCollectionView<List<String>> sideInput =
      pipeline.apply("GenerateSequence",
                     GenerateSequence
                        .from(0)
                        .withRate(1, Duration.standardSeconds(1)))
              .apply("Window GenerateSequence",
                     Window.into(FixedWindows.of(Duration.standardSeconds(5))))
              .apply("Counts", Combine.globally(Sum.ofLongs()).withoutDefaults())
              .apply("RefreshCache", ParDo.of(new RefreshCache()))
              .apply("viewTags", View.asList());

    final PubsubIO.Read<PubsubMessage> pubsubRead = 
      PubsubIO.readMessages()
              .withIdAttribute("id")
              .withTimestampAttribute("ts")
              .fromTopic("<put your topic>");

    // PCollection<KV<String,Long>> taxi =;
    PCollection<String> taxi = 
      pipeline.apply("Read from", pubsubRead)
              .apply("Window Fixed",
                  Window.into(FixedWindows.of(Duration.standardSeconds(15))))
                .apply(MapElements.via(new PubSubToTableRow()))
                .apply("key rides by rideid",
                  MapElements
                    .into(TypeDescriptors
                            .kvs(TypeDescriptors.strings(),
                                 TypeDescriptor.of(TableRow.class)))
                    .via(ride -> KV.of(ride.get("ride_id").toString(), ride)))
                .apply("Count Per Element", Count.perKey())
                .apply(
                  ParDo.of(new DoFn<KV<String,Long>, String>() {

                    @ProcessElement
                    public void processElement(
                      @Element KV<String,Long> value,
                      OutputReceiver<String> out, ProcessContext c) {

                        // In our DoFn, access the side input.
                        List<String> sideinput = c.sideInput(sideInput);
                        List<String> sideinput2 = c.sideInput(sideInput2);

                        LOG.info("sideinput" + sideinput.toString());
                        LOG.info("sideinput2 " + sideinput2.toString());
                        LOG.info("value " + value);
                        out.output("test");
                    }
                  }).withSideInputs(sideInput,sideInput2));
    pipeline.run();
  }
}

With DirectRunner I get all the values of the side inputs (list and map), but with DataflowRunner I get no values at all (the View.CreatePCollectionView/ParDo(StreamingPCollectionViewWriter) step produces no output).
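
For readers following the code above: the side input is windowed into 5-second fixed windows while the main input uses 15-second fixed windows, so each main-input window has to be matched to one side-input window. The sketch below is plain Java with no Beam dependency. It only illustrates the arithmetic, under the assumption that the default mapping for fixed windows picks the side-input window containing the main window's maximum timestamp (its end minus 1 ms). The class and method names are hypothetical and are not part of Beam's API, and this is not offered as a diagnosis of the DataflowRunner behavior.

```java
public class SideInputWindowMapping {

  // Start (in ms) of the fixed window of the given size that contains the timestamp.
  static long windowStart(long timestampMillis, long sizeMillis) {
    return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
  }

  // Assumption: the side-input window is the one containing the main window's
  // maximum timestamp, i.e. (main window end - 1 ms).
  static long sideInputWindowStart(
      long mainWindowStartMillis, long mainSizeMillis, long sideSizeMillis) {
    long mainMaxTimestamp = mainWindowStartMillis + mainSizeMillis - 1;
    return windowStart(mainMaxTimestamp, sideSizeMillis);
  }

  public static void main(String[] args) {
    long mainSize = 15_000L; // 15 s main-input windows, as in the question
    long sideSize = 5_000L;  // 5 s side-input windows, as in the question

    // Print the mapping for the first three main-input windows.
    for (long mainStart = 0; mainStart < 45_000L; mainStart += mainSize) {
      long sideStart = sideInputWindowStart(mainStart, mainSize, sideSize);
      System.out.printf(
          "main [%d, %d) -> side [%d, %d)%n",
          mainStart, mainStart + mainSize, sideStart, sideStart + sideSize);
    }
  }
}
```

Under that assumption, each 15-second main window would read only the last of the three 5-second side-input windows it overlaps, so the side input seen by the DoFn depends on how the Pub/Sub "ts" timestamps line up with the timestamps GenerateSequence assigns.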

Do you have any idea how to solve this problem?

0 answers:

No answers