Diagnosing a create-cluster operation that ends in ERROR in Dataproc (Java library)

Time: 2016-12-13 01:09:08

Tags: java google-cloud-dataproc

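When I try to create a cluster with Google Dataproc, the create call initially appears to return successfully, but a subsequent "get" on the cluster tells me that the cluster went straight from the CREATING state to the ERROR state. Unfortunately, calling the diagnose operation does not seem to help.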

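Here is what I am doing (I have taken some liberties and show the code with hard-coded strings rather than the values obtained via the API or from configuration properties):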

String projectId = "wide-isotope-147019";
// Note: despite the variable name, this value is a Compute Engine zone.
String region = "us-central1-f";
GceClusterConfig computeEngineConfig = new GceClusterConfig();
computeEngineConfig.setZoneUri(
    String.format(ZONE_URI_FORMAT, projectId, region));
List<String> tagList = new ArrayList<>();
tagList.add("ClusterName: mrfoo");
computeEngineConfig.setTags(tagList);

String machineType = String.format(MACHINE_TYPE_URI_FORMAT,
    projectId, region, "n1-standard-1");
InstanceGroupConfig masterConfig = new InstanceGroupConfig();
masterConfig.setMachineTypeUri(machineType)
            .setNumInstances(1);
InstanceGroupConfig workerConfig = new InstanceGroupConfig();
workerConfig.setMachineTypeUri(machineType)
            .setNumInstances(1);
ClusterConfig clusterConfig = new ClusterConfig();
clusterConfig.setMasterConfig(masterConfig);
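clusterConfig.setWorkerConfig(workerConfig);
// Attach the Compute Engine settings (zone, tags) built above.
clusterConfig.setGceClusterConfig(computeEngineConfig);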
List<NodeInitializationAction> installActions = new ArrayList<>();
// no init actions yet. want to get basics working first.
clusterConfig.setInitializationActions(installActions);
Cluster cluster = new Cluster();
cluster.setProjectId(projectId);
cluster.setConfig(clusterConfig);
cluster.setClusterName("mrfoo");

Dataproc.Projects.Regions.Clusters.Create createOp = null;
Operation result = null;
try {
    createOp = dataproc.projects().regions().clusters()
                       .create(projectId, "global", cluster);
    createOp.setBearerToken(...);
} catch (IOException ex) {
  // handle ...
}

try {
    result = createOp.execute();
} catch (IOException ex) {
   // handle.
}

return result;

The above produces a reasonable-looking result with no errors. However, when I later do a get operation:

Dataproc.Projects.Regions.Clusters.Get getOp = null;
Cluster result = null;
try {
    getOp = dataproc.projects().regions().clusters()
           .get("wide-isotope-147019", "global", "mrfoo");
    getOp.setBearerToken(...);
} catch (IOException ioEx) {
  ...
}
try {
   result = getOp.execute();
} catch (IOException ioEx) {
    ...
}

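This call does not produce an error either, but it reports the cluster's state as follows (sorry for the long dump; note at the end that the history shows CREATING while the current state is ERROR):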

{"clusterName":"mrfoo","clusterUuid":"<id string>","config":
    {"configBucket":"dataproc-<idstring>",
     "gceClusterConfig":"projectId":"wide-isotope-147019",
    <lots of stuff deleted>
  "status":{"state":"ERROR",
            "stateStartTime":"2016-12-13T00:27:11.143Z"},
   "statusHistory":[
      {"state":"CREATING",
       "stateStartTime":"2016-12-13T00:27:09.947Z"}]}

1 answer:

Answer 0: (score: 1)

The general pattern for creating a Dataproc cluster is:

Operation op = createCluster(...);
while(!op.getDone()) {
    sleep(10);
    op = getOperation(op.getName());
}

if (op.hasError()) {
   // Display op.getError(); 
}
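
The loop above is pseudocode. As a rough illustration, the same pattern can be written as a small self-contained Java sketch. `OpSnapshot` and `OperationPoller` are hypothetical names introduced here, not part of the Dataproc library; `OpSnapshot` stands in for the fields of the real `Operation` that the loop needs, and the `fetch` function stands in for whatever call re-reads the operation by name. The sketch also adds a poll limit, which the pseudocode does not have, so that a stuck operation cannot loop forever.

```java
import java.util.function.Function;

/** Hypothetical stand-in for the fields of a long-running operation that the loop needs. */
final class OpSnapshot {
    final String name;
    final boolean done;
    final String errorMessage; // null when the operation finished without an error

    OpSnapshot(String name, boolean done, String errorMessage) {
        this.name = name;
        this.done = done;
        this.errorMessage = errorMessage;
    }
}

/** Hypothetical helper: polls an operation until it reports done. */
final class OperationPoller {
    private OperationPoller() {}

    /**
     * Re-fetches the operation by name until it is done.
     * Throws IllegalStateException if it is still running after maxPolls fetches.
     */
    static OpSnapshot waitUntilDone(OpSnapshot initial,
                                    Function<String, OpSnapshot> fetch,
                                    long sleepMillis,
                                    int maxPolls) {
        OpSnapshot op = initial;
        int polls = 0;
        while (!op.done) {
            if (polls >= maxPolls) {
                throw new IllegalStateException(
                    "Operation " + op.name + " still running after " + maxPolls + " polls");
            }
            polls++;
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for " + op.name, e);
            }
            op = fetch.apply(op.name);
        }
        return op;
    }
}
```

With the real client, `fetch` would presumably wrap something like `dataproc.projects().regions().operations().get(name).execute()` and copy `getDone()` and `getError()` into the snapshot; treat that mapping as an assumption to check against the library version in use. Once the returned snapshot is done, a non-null `errorMessage` is where a failure such as the one in the question would surface.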

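Looking at the logs for this particular case, I can tell that the problem is that Compute Engine rejected the instance tags that were passed in, because they do not match Compute Engine's regular expression for valid tags: {{1} }. I have filed a bug so that Dataproc validates instance tags earlier and raises an error as soon as you try to create the cluster, rather than setting the error on the operation.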
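
Until such validation exists on the server side, the tags can be checked on the client before the create call. The sketch below is a minimal example under the assumption that Compute Engine tags follow RFC 1035 label rules: 1 to 63 characters, lowercase letters, digits and hyphens only, starting with a letter and ending with a letter or digit. The pattern is written from that rule and is not necessarily the exact expression referred to in the answer; `InstanceTagValidator` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical client-side check for Compute Engine instance tags. */
final class InstanceTagValidator {
    // Assumption: RFC 1035 label rules (see the paragraph above), 1 to 63 characters.
    private static final Pattern VALID_TAG =
        Pattern.compile("[a-z]([-a-z0-9]{0,61}[a-z0-9])?");

    private InstanceTagValidator() {}

    /** Returns true if the whole tag matches the assumed pattern. */
    static boolean isValid(String tag) {
        return tag != null && VALID_TAG.matcher(tag).matches();
    }

    /** Returns the tags that would be rejected, in their original order. */
    static List<String> invalidTags(List<String> tags) {
        List<String> rejected = new ArrayList<>();
        for (String tag : tags) {
            if (!isValid(tag)) {
                rejected.add(tag);
            }
        }
        return rejected;
    }
}
```

Under this rule the tag from the question, "ClusterName: mrfoo", fails because of the uppercase letters, the colon and the space, whereas something like "clustername-mrfoo" passes. Calling `invalidTags(tagList)` before `setTags(tagList)` and failing fast on a non-empty result would report the problem at the point where it is introduced.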