Is it possible to write Avro files to dynamically created GCS buckets (based on tenantID)?

Time: 2020-05-19 23:25:28

Tags: java google-cloud-dataflow apache-beam

Essentially, what I want to do is create GCS buckets based on tenantID (which arrives as part of the event) and write those events with dynamic file naming in a Google Dataflow job using FileIO.writeDynamic.

The problem I am facing is:

srcEvents.apply("Window", Window
                .<MyEvent>into(FixedWindows.of(Duration.standardSeconds(60))))
        .apply("WriteAvro", FileIO.<MyEventDestination, MyEvent>writeDynamic()
                .by(groupFn).via(outputFn, sinkFn)
                .to() // what should I pass here? I want it based on event.getTenantId (gs://test-123)
                .withDestinationCoder(destinationCoder)
                .withNumShards(100).withNaming(namingFn));

I create the GCS buckets before the step above by applying a PTransform to srcEvents.
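For context, the destination type in the pipeline above (MyEventDestination) would typically be a small serializable key derived from the event's tenant ID. The sketch below shows one way that key and its bucket path could look; the class name comes from the question, but the field, methods, and `gs://test-<tenantId>` convention are assumptions inferred from the question's example:

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical sketch of the per-tenant destination key used by FileIO.writeDynamic's
// .by(groupFn). Beam requires destination keys to be serializable and to implement
// equals/hashCode so elements group correctly.
public class MyEventDestination implements Serializable {
    private final String tenantId;

    public MyEventDestination(String tenantId) {
        this.tenantId = Objects.requireNonNull(tenantId);
    }

    public String getTenantId() {
        return tenantId;
    }

    // Bucket path derived from the tenant ID, e.g. "gs://test-123" for tenant "123"
    // (the bucket-name convention here is an assumption based on the question).
    public String bucketPath() {
        return "gs://test-" + tenantId;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MyEventDestination
                && tenantId.equals(((MyEventDestination) o).tenantId);
    }

    @Override
    public int hashCode() {
        return tenantId.hashCode();
    }
}
```

groupFn would then map each MyEvent to `new MyEventDestination(event.getTenantId())`, so the bucket path is available wherever the destination key is passed (e.g. to the naming function).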

1 answer:

Answer 0 (score: 0)

I was able to solve this by using the withTempDirectory option, where I supply a temporary GCS bucket path, and then using the file-naming function to build a dynamic bucket path for each domain:

srcEvents.apply("Window", Window
                .<MyEvent>into(FixedWindows.of(Duration.standardSeconds(60))))
        .apply("WriteAvro", FileIO.<MyEventDestination, MyEvent>writeDynamic()
                .by(groupFn).via(outputFn, sinkFn)
                .withTempDirectory("gs://temp-blah/")
                .withDestinationCoder(destinationCoder)
                .withNumShards(100).withNaming(namingFn));

// namingFn builds filenames such as gs://domain-123/2020-05-01/event.avro
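The path-building logic behind namingFn can be sketched in plain Java as below. This is a minimal sketch, assuming the `gs://domain-<tenantId>/<date>/event.avro` layout shown in the answer; the class and method names are hypothetical, and in a real Beam pipeline this logic would live inside a `FileIO.Write.FileNaming` implementation returned per destination by the naming function:

```java
import java.time.LocalDate;

// Hypothetical helper mirroring the naming scheme from the answer:
// one bucket per domain/tenant, one dated subdirectory per window.
public class TenantFileNaming {
    // Builds e.g. "gs://domain-123/2020-05-01/event.avro" for tenant "123"
    // and the window date 2020-05-01 (LocalDate.toString yields ISO yyyy-MM-dd).
    public static String buildFilename(String tenantId, LocalDate windowDate) {
        return String.format("gs://domain-%s/%s/event.avro", tenantId, windowDate);
    }
}
```

Note that the tenant ID is available here because the naming function in `withNaming` receives the destination key produced by `.by(groupFn)`, which is what makes per-tenant paths possible even though `withTempDirectory` points at a single shared temp bucket.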