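I have a Pub/Sub topic that carries newline-delimited JSON. Each Pub/Sub message has an attribute whose value is the name of the BigQuery table the message should be written to.
How do I get the table name for each message and pass it on to a new pipeline?
Is it possible to create a new PCollection from inside a DoFn and apply it?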
Answer 0 (score: 2)
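You can apply a transform that retrieves the table name inside a DoFn and passes a KV<tableName, record> pair downstream. Then use the dynamic destinations support in BigQueryIO to route each record to the correct destination. Alternatively, you can skip the extra DoFn and read the table attribute directly inside the BigQueryIO write. The example below does this in the function passed to to(..), while withFormatFunction(..) converts the JSON payload to a TableRow.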
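This is the overall pipeline structure: JSON messages are consumed from Pub/Sub and then routed to the appropriate table destination based on a Pub/Sub message attribute. Similarly, you can change the getTableDestination(..) logic to retrieve the table name from the JSON record instead.
You can view the full example here.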
/**
 * Runs the pipeline with the specified options. This method does not wait until the pipeline is
 * finished before returning. Invoke {@code result.waitUntilFinish()} on the result object to
 * block until the pipeline is finished running if blocking programmatic execution is required.
 *
 * @param options The execution options.
 * @return The pipeline result.
 */
public static PipelineResult run(Options options) {
  // Create the pipeline
  Pipeline pipeline = Pipeline.create(options);

  // Retrieve non-serializable parameters
  String tableNameAttr = options.getTableNameAttr();
  String outputTableProject = options.getOutputTableProject();
  String outputTableDataset = options.getOutputTableDataset();

  // Build & execute pipeline
  pipeline
      .apply(
          "ReadMessages",
          PubsubIO.readMessagesWithAttributes().fromSubscription(options.getSubscription()))
      .apply(
          "WriteToBigQuery",
          BigQueryIO.<PubsubMessage>write()
              // Route each message to the table named by its attribute
              .to(
                  input ->
                      getTableDestination(
                          input, tableNameAttr, outputTableProject, outputTableDataset))
              // Decode the payload explicitly as UTF-8 (java.nio.charset.StandardCharsets)
              // instead of relying on the platform default charset
              .withFormatFunction(
                  (PubsubMessage msg) ->
                      convertJsonToTableRow(
                          new String(msg.getPayload(), StandardCharsets.UTF_8)))
              .withCreateDisposition(CreateDisposition.CREATE_NEVER)
              .withWriteDisposition(WriteDisposition.WRITE_APPEND));

  return pipeline.run();
}
/**
 * Retrieves the {@link TableDestination} for the {@link PubsubMessage} by extracting and
 * formatting the value of the {@code tableNameAttr} attribute. If the message is null, a {@link
 * RuntimeException} will be thrown because the message cannot be routed.
 *
 * @param value The message to extract the table name from.
 * @param tableNameAttr The name of the attribute within the message which contains the table
 *     name.
 * @param outputProject The project in which the table resides.
 * @param outputDataset The dataset in which the table resides.
 * @return The destination to route the input message to.
 */
@VisibleForTesting
static TableDestination getTableDestination(
    ValueInSingleWindow<PubsubMessage> value,
    String tableNameAttr,
    String outputProject,
    String outputDataset) {
  PubsubMessage message = value.getValue();

  // A null message carries no attribute to route on, so fail fast
  if (message == null) {
    throw new RuntimeException(
        "Cannot retrieve the dynamic table destination of a null message!");
  }

  // Table spec format: project:dataset.table (null description)
  return new TableDestination(
      String.format(
          "%s:%s.%s",
          outputProject, outputDataset, message.getAttributeMap().get(tableNameAttr)),
      null);
}
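
One gap worth noting in the routing logic above: it formats whatever `get(tableNameAttr)` returns, so a message that lacks the attribute ends up with a table spec ending in `.null`. The following is a minimal sketch of a stricter spec builder, using only the Java standard library. The class name `TableSpecs` and the method `buildTableSpec` are hypothetical and not part of the original example; the attribute map is passed in as a plain `Map` so the helper can be tested without a pipeline.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of the original example. */
final class TableSpecs {
  private TableSpecs() {}

  /**
   * Builds a "project:dataset.table" spec from a message's attributes, failing fast when the
   * table name attribute is missing or blank.
   */
  static String buildTableSpec(
      Map<String, String> attributes, String tableNameAttr, String project, String dataset) {
    // Defensive: treat a null attribute map the same as a missing attribute.
    String tableName = attributes == null ? null : attributes.get(tableNameAttr);

    if (tableName == null || tableName.trim().isEmpty()) {
      throw new IllegalArgumentException(
          "Message has no usable '" + tableNameAttr + "' attribute; cannot route it.");
    }

    // Same spec format as the original getTableDestination(..).
    return String.format("%s:%s.%s", project, dataset, tableName.trim());
  }
}
```

Inside getTableDestination(..), the returned string could be passed to `new TableDestination(spec, null)`. Whether to throw, as sketched here, or to fall back to a default table for unroutable messages is a design choice that depends on how the pipeline should treat bad input.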