I'm trying to run PubSubToBigQuery.java from https://github.com/GoogleCloudPlatform/DataflowTemplates locally with the direct runner, but I get this error:
Exception in thread "main" java.lang.IllegalArgumentException: Class interface com.google.cloud.teleport.templates.PubSubToBigQuery$Options missing a property named 'gcs-location'.
at org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1518)
at org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:111)
at org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:294)
at com.google.cloud.teleport.templates.PubSubToBigQuery.main(PubSubToBigQuery.java:165)
But I am already passing --gcs-location=gs://xxx-templates/dataflow/pipelines/pubsub-to-bigquery when I run it.
Answer
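You are confusing the args you pass to the Java application with the args you pass when running a templated pipeline from the CLI.

--gcs-location is a flag you pass on the CLI to gcloud dataflow jobs run. It is not a pipeline option, so the Options interface that PubSubToBigQuery parses its arguments into has no property with that name, which is exactly what the exception says. When you run the Java application with --templateLocation set, Dataflow stages the pipeline on GCS as a template but does not run it. --gcs-location then tells gcloud dataflow jobs run where the template it should run is located.

You cannot execute a templated pipeline locally. Running the Java app locally only performs the staging of the template.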
https://cloud.google.com/dataflow/docs/guides/templates/executing-templates
# PROJECT_ID and PIPELINE_FOLDER must be set before running these commands

# Set the runner
RUNNER=DataflowRunner

# Step 1: build the template (runs the Java app; stages the template, does not run the pipeline)
mvn compile exec:java \
-Dexec.mainClass=com.google.cloud.teleport.templates.PubSubToBigQuery \
-Dexec.cleanupDaemonThreads=false \
-Dexec.args=" \
--project=${PROJECT_ID} \
--stagingLocation=${PIPELINE_FOLDER}/staging \
--tempLocation=${PIPELINE_FOLDER}/temp \
--templateLocation=${PIPELINE_FOLDER}/template \
--runner=${RUNNER}"

# Step 2: execute the template with gcloud (this is where --gcs-location belongs)
JOB_NAME=pubsub-to-bigquery-$USER-`date +"%Y%m%d-%H%M%S%z"`

gcloud dataflow jobs run ${JOB_NAME} \
--gcs-location=${PIPELINE_FOLDER}/template \
--zone=us-east1-d \
--parameters \
"inputTopic=projects/data-analytics-pocs/topics/teleport-pubsub-to-bigquery,\
outputTableSpec=data-analytics-pocs:demo.pubsub_to_bigquery,\
outputDeadletterTable=data-analytics-pocs:demo.pubsub_to_bigquery_deadletter"
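The exception comes from Beam's PipelineOptionsFactory, which matches every --name=value argument against the properties of the Options interface and rejects any name it cannot match. The sketch below is a simplified, hypothetical imitation of that check, not Beam's actual implementation. The class OptionsCheckSketch, its parse method, and the KNOWN set are illustrative assumptions; KNOWN stands in for the properties that PubSubToBigQuery.Options and its parent interfaces really declare.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy imitation (NOT Beam's real code) of how
// PipelineOptionsFactory rejects arguments that are not properties of Options.
final class OptionsCheckSketch {

    // Assumed subset of property names; the real set is derived from the
    // getters on PubSubToBigQuery.Options and the Beam interfaces it extends.
    static final Set<String> KNOWN = Set.of(
            "project", "stagingLocation", "tempLocation", "templateLocation",
            "runner", "inputTopic", "outputTableSpec", "outputDeadletterTable");

    // Parses --name=value (or bare --name, treated as "true") into a map,
    // throwing for any name that is not a known property.
    static Map<String, String> parse(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("--")) {
                throw new IllegalArgumentException("Expected --name=value but got: " + arg);
            }
            String body = arg.substring(2);
            int eq = body.indexOf('=');
            String name = eq >= 0 ? body.substring(0, eq) : body;
            String value = eq >= 0 ? body.substring(eq + 1) : "true";
            if (!KNOWN.contains(name)) {
                throw new IllegalArgumentException(
                        "Class interface Options missing a property named '" + name + "'.");
            }
            parsed.put(name, value);
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Arguments meant for the Java app are accepted.
        System.out.println(parse(new String[] {
                "--runner=DataflowRunner", "--templateLocation=gs://bucket/template"}));
        // A gcloud flag passed to the Java app is rejected, as in the question.
        try {
            parse(new String[] {"--gcs-location=gs://bucket/template"});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, --templateLocation is accepted because it is a pipeline option, while --gcs-location fails the same way as in the question, because only gcloud dataflow jobs run understands it.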