Unable to run GCP DataflowTemplates locally

Time: 2019-01-22 09:32:15

Tags: java google-bigquery apache-beam google-cloud-pubsub dataflow

I'm trying to run PubSubToBigQuery.java from https://github.com/GoogleCloudPlatform/DataflowTemplates locally with the direct runner, but I get this error:

Exception in thread "main" java.lang.IllegalArgumentException: Class interface com.google.cloud.teleport.templates.PubSubToBigQuery$Options missing a property named 'gcs-location'.
    at org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1518)
    at org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:111)
    at org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:294)
    at com.google.cloud.teleport.templates.PubSubToBigQuery.main(PubSubToBigQuery.java:165)

However, I did pass --gcs-location=gs://xxx-templates/dataflow/pipelines/pubsub-to-bigquery when running it.

The error is thrown at this line: https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/PubSubToBigQuery.java#L176

https://beam.apache.org/documentation/runners/direct/

1 Answer

Answer 0 (score: 1)

You are confusing the arguments you pass to the Java application with the arguments you pass when running a templated pipeline from the CLI.

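--gcs-location is a flag you pass to `gcloud dataflow jobs run` on the CLI; it is not a pipeline option, which is why the Options interface has no property with that name. When you run the Java application with --templateLocation set, Dataflow stages the pipeline on GCS as a template but does not run it. --gcs-location then tells the gcloud command where the template to run is located.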

You cannot execute a templated pipeline locally. The only part that runs locally, through the Java app, is the staging of the template.

https://cloud.google.com/dataflow/docs/guides/templates/executing-templates

From the usage notes in the Javadoc of PubSubToBigQuery.java:

# Set the runner
RUNNER=DataflowRunner

# Build the template  <-- NOTE THIS
mvn compile exec:java \
  -Dexec.mainClass=com.google.cloud.teleport.templates.PubSubToBigQuery \
  -Dexec.cleanupDaemonThreads=false \
  -Dexec.args=" \
  --project=${PROJECT_ID} \
  --stagingLocation=${PIPELINE_FOLDER}/staging \
  --tempLocation=${PIPELINE_FOLDER}/temp \
  --templateLocation=${PIPELINE_FOLDER}/template \
  --runner=${RUNNER}"

# Execute the template  <-- NOTE THIS
JOB_NAME=pubsub-to-bigquery-$USER-$(date +"%Y%m%d-%H%M%S%z")

gcloud dataflow jobs run ${JOB_NAME} \
  --gcs-location=${PIPELINE_FOLDER}/template \
  --zone=us-east1-d \
  --parameters \
"inputTopic=projects/data-analytics-pocs/topics/teleport-pubsub-to-bigquery,\
outputTableSpec=data-analytics-pocs:demo.pubsub_to_bigquery,\
outputDeadletterTable=data-analytics-pocs:demo.pubsub_to_bigquery_deadletter"
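If the goal is just to exercise the pipeline logic on your own machine, one option is to skip the template step entirely and hand the pipeline's own options straight to the Java app with the DirectRunner. The sketch below is a minimal, unverified example under these assumptions: the direct runner dependency is on the project's classpath, your application default credentials can read the topic and write to the BigQuery dataset, and the option names inputTopic, outputTableSpec and outputDeadletterTable match the ones listed under --parameters in the gcloud example above. The helper names direct_runner_args and run_locally, and every project, topic and table value, are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: run the PubSubToBigQuery pipeline code itself on the DirectRunner
# instead of staging a template. Function names and argument values are
# hypothetical placeholders.

# Prints the pipeline options as one space-separated string.
# Deliberately omitted: --templateLocation (it only stages a template) and
# --gcs-location (a gcloud flag, not a pipeline option).
# Usage: direct_runner_args PROJECT TOPIC OUTPUT_TABLE DEADLETTER_TABLE
direct_runner_args() {
  printf '%s ' \
    "--project=$1" \
    "--runner=DirectRunner" \
    "--inputTopic=projects/$1/topics/$2" \
    "--outputTableSpec=$3" \
    "--outputDeadletterTable=$4"
}

# Compiles and runs the pipeline; call it from the root of a
# DataflowTemplates checkout. Same arguments as direct_runner_args.
run_locally() {
  mvn compile exec:java \
    -Dexec.mainClass=com.google.cloud.teleport.templates.PubSubToBigQuery \
    -Dexec.cleanupDaemonThreads=false \
    -Dexec.args="$(direct_runner_args "$@")"
}
```

Usage, with placeholder names: `run_locally my-project my-topic my-project:demo.pubsub_to_bigquery my-project:demo.pubsub_to_bigquery_deadletter`. Keeping the argument builder separate from the mvn call makes it easy to print and inspect exactly which options reach PipelineOptionsFactory before starting a long-running streaming job.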