I'm trying to create a cluster programmatically in Python:
import googleapiclient.discovery

dataproc = googleapiclient.discovery.build('dataproc', 'v1')

zone_uri = 'https://www.googleapis.com/compute/v1/projects/{project_id}/zone/{zone}'.format(
    project_id=my_project_id,
    zone=my_zone,
)

cluster_data = {
    'projectId': my_project_id,
    'clusterName': my_cluster_name,
    'config': {
        'gceClusterConfig': {
            'zoneUri': zone_uri
        },
        'softwareConfig': {
            'properties': {'string': {'spark:spark.executor.memory': '10gb'}},
        },
    },
}

result = dataproc \
    .projects() \
    .regions() \
    .clusters() \
    .create(
        projectId=my_project_id,
        region=my_region,
        body=cluster_data,
    ) \
    .execute()
I keep getting this error: Invalid JSON payload received. Unknown name "spark:spark.executor.memory" at 'cluster.config.software_config.properties[0].value': Cannot find field.
The API documentation is here: https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#SoftwareConfig
It says that property keys are specified in prefix:property format, such as core:fs.defaultFS.
Even when I change properties to {'string': {'core:fs.defaultFS': 'hdfs://'}}, I get the same error.
Answer 0 (score: 4):
Properties is a key/value mapping:
'properties': {
    'spark:spark.executor.memory': 'foo'
}
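
Applied to the snippet from the question, only the softwareConfig section changes; the rest of cluster_data stays the same (a minimal sketch reusing the question's my_project_id, my_cluster_name, and zone_uri variables):

cluster_data = {
    'projectId': my_project_id,
    'clusterName': my_cluster_name,
    'config': {
        'gceClusterConfig': {
            'zoneUri': zone_uri
        },
        'softwareConfig': {
            # Properties are a flat string-to-string map; there is no
            # nested 'string' wrapper around the key/value pairs.
            'properties': {'spark:spark.executor.memory': '10gb'},
        },
    },
}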
The documentation could have used a better example. In general, the best way to find out what the API expects is to click "Equivalent REST" in the Cloud Console, or to pass --log-http when running gcloud. For example:
$ gcloud dataproc clusters create clustername --properties spark:spark.executor.memory=foo --log-http
=======================
==== request start ====
uri: https://dataproc.googleapis.com/v1/projects/projectid/regions/global/clusters?alt=json
method: POST
== body start ==
{"clusterName": "clustername", "config": {"gceClusterConfig": {"internalIpOnly": false, "zoneUri": "us-east1-d"}, "masterConfig": {"diskConfig": {}}, "softwareConfig": {"properties": {"spark:spark.executor.memory": "foo"}}, "workerConfig": {"diskConfig": {}}}, "projectId": "projectid"}
== body end ==
==== request end ====
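
For completeness, here is a sketch of the question's create call with the corrected body. Note that clusters.create returns a long-running Operation rather than the cluster itself (this reuses the dataproc client, my_project_id, and my_region from the question):

operation = dataproc \
    .projects() \
    .regions() \
    .clusters() \
    .create(
        projectId=my_project_id,
        region=my_region,
        body=cluster_data,
    ) \
    .execute()

# The response is a long-running Operation; poll it through
# projects.regions.operations.get until 'done' is true, at which
# point the cluster is ready to use.
print(operation['name'])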