Is there a way to get the parameters passed to a GCP Dataflow job from the CLI / API?

Asked: 2017-06-20 20:22:18

Tags: google-cloud-platform google-cloud-dataflow apache-beam

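I have tried the list and describe commands listed here, but I don't see the parameters. Is there another command I should use to get this information, or is there some other API that provides it?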

1 Answer

Answer 0 (score: 1)

TL;DR: you are missing the --full flag of the gcloud dataflow jobs describe command.

FLAGS

--full

Retrieve the full Job rather than the summary view

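View the full job information

If you use gcloud to view information about a GCP Dataflow job, the following command shows the full information about the job (quite a lot of it, in fact), including any arguments passed to the job: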

gcloud dataflow jobs describe JOB_ID --full
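Since the question also asks about an API: the same full job resource can be requested from the Dataflow REST API (v1b3) with the jobs.get method and view=JOB_VIEW_ALL, which is presumably what --full asks gcloud to fetch. The sketch below only uses the Python standard library. It assumes the regional endpoint (projects/{projectId}/locations/{location}/jobs/{jobId}), and it assumes you already have an OAuth2 access token, for example from gcloud auth print-access-token. The project, location, and job ID values are placeholders.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the Dataflow REST API, version v1b3.
DATAFLOW_API = "https://dataflow.googleapis.com/v1b3"


def job_url(project: str, location: str, job_id: str) -> str:
    """Build a jobs.get URL that asks for the full view (JOB_VIEW_ALL)."""
    path = "/projects/{}/locations/{}/jobs/{}".format(
        urllib.parse.quote(project, safe=""),
        urllib.parse.quote(location, safe=""),
        urllib.parse.quote(job_id, safe=""),
    )
    query = urllib.parse.urlencode({"view": "JOB_VIEW_ALL"})
    return DATAFLOW_API + path + "?" + query


def fetch_job(project: str, location: str, job_id: str, token: str) -> dict:
    """GET the full job resource. `token` is an OAuth2 access token (assumed)."""
    request = urllib.request.Request(
        job_url(project, location, job_id),
        headers={"Authorization": "Bearer " + token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a token in hand, fetch_job(...)["environment"]["sdkPipelineOptions"]["options"] should hold the same options dictionary that gcloud prints under that path.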

All of the options are located under the hierarchy environment.sdkPipelineOptions.options.

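View all options as JSON

To view all of the options passed to the job as JSON (BTW, this prints more than just the command-line arguments), you can do the following: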

$ gcloud dataflow jobs describe JOB_ID --full --format='json(environment.sdkPipelineOptions.options)'
{
  "environment": {
    "sdkPipelineOptions": {
      "options": {
        "apiRootUrl": "https://dataflow.googleapis.com/",
        "appName": "WordCount",
        "credentialFactoryClass": "com.google.cloud.dataflow.sdk.util.GcpCredentialFactory",
        "dataflowEndpoint": "",
        "enableCloudDebugger": false,
        "enableProfilingAgent": false,
        "firstArg": "foo",
        "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
        "jobName": "wordcount-tuxdude-12345678",
        "numberOfWorkerHarnessThreads": 0,
        "output": "gs://BUCKET_NAME/dataflow/output",
        "pathValidatorClass": "com.google.cloud.dataflow.sdk.util.DataflowPathValidator",
        "project": "PROJECT_NAME",
        "runner": "com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner",
        "secondArg": "bar",
        "stableUniqueNames": "WARNING",
        "stagerClass": "com.google.cloud.dataflow.sdk.util.GcsStager",
        "stagingLocation": "gs://BUCKET_NAME/dataflow/staging/",
        "streaming": false,
        "tempLocation": "gs://BUCKET_NAME/dataflow/staging/"
      }
    }
  }
}
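To use these options from a script, you can parse the JSON that gcloud prints and walk down to environment.sdkPipelineOptions.options. Here is a minimal sketch. It assumes the input is the output of the --format='json(...)' command above, and it returns an empty dictionary if any level of that path is missing.

```python
import json


def pipeline_options(describe_json: str) -> dict:
    """Return the options dict from gcloud's JSON describe output."""
    job = json.loads(describe_json)
    # Walk environment -> sdkPipelineOptions -> options; a missing level yields {}.
    return (
        job.get("environment", {})
        .get("sdkPipelineOptions", {})
        .get("options", {})
    )
```

For example, pipeline_options(output)["firstArg"] would return "foo" for the job shown above. JSON booleans such as "streaming": false come back as Python False.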

View all options in tabular form

$ gcloud dataflow jobs describe JOB_ID --full --format='flattened(environment.sdkPipelineOptions.options)'
environment.sdkPipelineOptions.options.apiRootUrl:                   https://dataflow.googleapis.com/
environment.sdkPipelineOptions.options.appName:                      WordCount
environment.sdkPipelineOptions.options.credentialFactoryClass:       com.google.cloud.dataflow.sdk.util.GcpCredentialFactory
environment.sdkPipelineOptions.options.dataflowEndpoint:
environment.sdkPipelineOptions.options.enableCloudDebugger:          False
environment.sdkPipelineOptions.options.enableProfilingAgent:         False
environment.sdkPipelineOptions.options.firstArg:                     foo
environment.sdkPipelineOptions.options.inputFile:                    gs://dataflow-samples/shakespeare/kinglear.txt
environment.sdkPipelineOptions.options.jobName:                      wordcount-tuxdude-12345678
environment.sdkPipelineOptions.options.numberOfWorkerHarnessThreads: 0
environment.sdkPipelineOptions.options.output:                       gs://BUCKET_NAME/dataflow/output
environment.sdkPipelineOptions.options.pathValidatorClass:           com.google.cloud.dataflow.sdk.util.DataflowPathValidator
environment.sdkPipelineOptions.options.project:                      PROJECT_NAME
environment.sdkPipelineOptions.options.runner:                       com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner
environment.sdkPipelineOptions.options.secondArg:                    bar
environment.sdkPipelineOptions.options.stableUniqueNames:            WARNING
environment.sdkPipelineOptions.options.stagerClass:                  com.google.cloud.dataflow.sdk.util.GcsStager
environment.sdkPipelineOptions.options.stagingLocation:              gs://BUCKET_NAME/dataflow/staging/
environment.sdkPipelineOptions.options.streaming:                    False
environment.sdkPipelineOptions.options.tempLocation:                 gs://BUCKET_NAME/dataflow/staging/
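The flattened format turns every leaf of the nested resource into one line, keyed by its dotted path. The sketch below approximates that idea for a parsed JSON dictionary. It is not gcloud's actual implementation. The list-index notation key[N] and the column alignment are assumptions meant to resemble the output above, and details such as how empty values are printed may differ.

```python
def flatten(node, prefix=""):
    """Yield (dotted.key, value) pairs for every leaf of nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + "." + key if prefix else key
            yield from flatten(value, path)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, "{}[{}]".format(prefix, index))
    else:
        yield prefix, node


def format_flattened(data: dict) -> str:
    """Render flattened pairs as 'key:' padded to a common width, then the value."""
    rows = list(flatten(data))
    width = max((len(key) for key, _ in rows), default=0) + 1
    return "\n".join(f"{key + ':':<{width}} {value}" for key, value in rows)
```

Padding every key to the longest key plus the colon keeps the values aligned in one column, which is what makes the flattened output easy to scan or grep.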

Get the value of a single option

To get the value of a single option named --argName (whose value, BTW, is MY_ARG_VALUE), you can do the following:

$ gcloud dataflow jobs describe JOB_ID --full --format='value(environment.sdkPipelineOptions.options.argName)'
MY_ARG_VALUE
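If you need this value inside a Python script, one option is to shell out to the same gcloud command. The sketch below assumes gcloud is installed and authenticated on the PATH. The helper names describe_command and get_option are hypothetical. The option name must match the key as stored in sdkPipelineOptions.options (for example argName, without the leading dashes).

```python
import subprocess


def describe_command(job_id: str, option: str) -> list:
    """Build the gcloud argv that prints a single pipeline option with value()."""
    fmt = "value(environment.sdkPipelineOptions.options.{})".format(option)
    return ["gcloud", "dataflow", "jobs", "describe", job_id, "--full", "--format=" + fmt]


def get_option(job_id: str, option: str) -> str:
    """Run gcloud and return the option's value as text (empty if unset)."""
    result = subprocess.run(
        describe_command(job_id, option),
        check=True,           # raise CalledProcessError if gcloud fails
        capture_output=True,  # collect stdout/stderr (Python 3.7+)
        text=True,
    )
    return result.stdout.strip()
```

Because the command is passed as an argument list rather than a shell string, the --format value needs no extra quoting, unlike the single quotes used at the shell prompt.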

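gcloud formats

In general, gcloud supports a variety of output formatting options, which apply to most gcloud commands that fetch information from the server. You can read about them here.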