Unable to create a cluster with properties using the Dataproc API

Asked: 2017-08-19 01:40:27

Tags: python google-cloud-dataproc

I'm trying to create a cluster programmatically in Python:

import googleapiclient.discovery

dataproc = googleapiclient.discovery.build('dataproc', 'v1')
zone_uri ='https://www.googleapis.com/compute/v1/projects/{project_id}/zone/{zone}'.format(
  project_id=my_project_id,
  zone=my_zone,
  )
cluster_data = {
  'projectId': my_project_id,
  'clusterName': my_cluster_name,
  'config': {
    'gceClusterConfig': {
      'zoneUri': zone_uri
    },
    'softwareConfig' : {
      'properties' : {'string' : {'spark:spark.executor.memory' : '10gb'}},
    },
  },
}
result = dataproc \
  .projects() \
  .regions() \
  .clusters() \
  .create(
    projectId=my_project_id,
    region=my_region,
    body=cluster_data,
    ) \
  .execute()

I keep getting this error: Invalid JSON payload received. Unknown name "spark:spark.executor.memory" at 'cluster.config.software_config.properties[0].value': Cannot find field.

The API documentation is at: https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#SoftwareConfig


It says: Property keys are specified in prefix:property format, for example core:fs.defaultFS.

Even if I change properties to {'string' : {'core:fs.defaultFS' : 'hdfs://'}}, I get the same error.

1 answer:

Answer 0 (score: 4):

properties is a plain key/value mapping:

'properties': {
  'spark:spark.executor.memory': 'foo'
}
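Applied to the question's snippet, the corrected request body would look like the sketch below. The project, zone, and cluster names are placeholders; note also that the Compute Engine zone URI path segment is 'zones', not 'zone'.

```python
# Hypothetical placeholder values -- substitute your own.
my_project_id = 'my-project'
my_zone = 'us-east1-d'
my_cluster_name = 'my-cluster'

zone_uri = 'https://www.googleapis.com/compute/v1/projects/{project_id}/zones/{zone}'.format(
    project_id=my_project_id,
    zone=my_zone,
)

cluster_data = {
    'projectId': my_project_id,
    'clusterName': my_cluster_name,
    'config': {
        'gceClusterConfig': {
            'zoneUri': zone_uri,
        },
        'softwareConfig': {
            # Direct mapping of property key -> value,
            # with no intermediate 'string' wrapper.
            'properties': {'spark:spark.executor.memory': '10gb'},
        },
    },
}
```

This cluster_data can then be passed as body= to the same .projects().regions().clusters().create(...).execute() call as in the question.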

The documentation could use a better example. In general, the best way to figure out what an API request should look like is to click "Equivalent REST" in the Cloud Console, or to pass --log-http when using gcloud. For example:

$ gcloud dataproc clusters create clustername --properties spark:spark.executor.memory=foo --log-http
=======================
==== request start ====
uri: https://dataproc.googleapis.com/v1/projects/projectid/regions/global/clusters?alt=json
method: POST
== body start ==
{"clusterName": "clustername", "config": {"gceClusterConfig": {"internalIpOnly": false, "zoneUri": "us-east1-d"}, "masterConfig": {"diskConfig": {}}, "softwareConfig": {"properties": {"spark:spark.executor.memory": "foo"}}, "workerConfig": {"diskConfig": {}}}, "projectId": "projectid"}
== body end ==
==== request end ====