Question

My application runs on GAE. It makes REST calls to my CloudML model.

Here is the code:

        // Use the application default credentials, scoped for CloudML.
        GoogleCredential credential = GoogleCredential.getApplicationDefault()
                .createScoped(Collections.singleton(CLOUDML_SCOPE));
        HttpTransport httpTransport = GoogleNetHttpTransport.newTrustedTransport();
        HttpRequestInitializer requestInitializer = request -> {
            credential.initialize(request);
            // A read timeout of 0 means wait indefinitely for the response.
            request.setReadTimeout(0);
        };

        HttpRequestFactory requestFactory = httpTransport.createRequestFactory(
                requestInitializer);

        GenericUrl url = new GenericUrl(predictRestUrl);

        // Serialize the payload as JSON under the top-level "instances" key.
        JacksonFactory jacksonFactory = new JacksonFactory();
        JsonHttpContent jsonHttpContent = new JsonHttpContent(jacksonFactory, getPayLoad());

        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        jsonHttpContent.setWrapperKey("instances");
        // Write the body to a buffer only so it can be logged.
        jsonHttpContent.writeTo(baos);
        LOG.info("Executing request... " + baos.toString());
        HttpRequest request = requestFactory.buildPostRequest(url, jsonHttpContent);

        // Send the POST; a non-2xx status throws HttpResponseException.
        HttpResponse response = request.execute();

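I set the read timeout to 0 (no timeout) because I was frequently getting read timeout exceptions.

With this code, I now frequently get the following error response from CloudML: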

com.google.api.client.http.HttpResponseException: 500 Internal Server Error
{
  "error": {
    "code": 500,
    "message": "Internal error encountered.",
    "errors": [
      {
        "message": "Internal error encountered.",
        "domain": "global",
        "reason": "backendError"
      }
    ],
    "status": "INTERNAL"
  }
}

Where can we find the logs for REST calls to CloudML? How can we debug this further?

Answer 1

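We worked with @sag and determined that the 500 errors were the result of timeouts caused by long "cold starts". If you have not sent traffic to your model for a while, or if you send enough traffic that we need to spin up more instances, you will hit a "cold start" in which one or more instances are spun up. Currently this can be a lengthy process that sometimes times out on our end, which can result in a 500 error.

These errors are safe to retry; we recommend using exponential backoff.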
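As a rough illustration of the retry advice, here is a minimal sketch of exponential backoff with jitter that uses only the JDK. The names `BackoffRetry`, `HttpStatusException`, `backoffMillis` and `callWithRetry` are hypothetical and not part of any Google library; the sketch assumes the predict call is wrapped so that a failed request surfaces as an exception carrying the HTTP status code, and the attempt limit and delays are placeholder values.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: retries a call on 5xx errors with exponential backoff and jitter. */
final class BackoffRetry {

    /** Hypothetical exception type that carries the HTTP status code of a failed call. */
    static final class HttpStatusException extends IOException {
        final int statusCode;

        HttpStatusException(int statusCode, String message) {
            super(message);
            this.statusCode = statusCode;
        }
    }

    /** Delay before retry number {@code attempt} (0-based): base * 2^attempt, capped at maxDelayMs. */
    static long backoffMillis(int attempt, long baseDelayMs, long maxDelayMs) {
        // Cap the shift so the multiplication cannot overflow for sane base delays.
        long delay = baseDelayMs << Math.min(attempt, 20);
        return Math.min(delay, maxDelayMs);
    }

    /** Runs {@code call}, retrying only on 5xx responses, for at most {@code maxAttempts} attempts. */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts,
                               long baseDelayMs, long maxDelayMs) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (HttpStatusException e) {
                boolean retryable = e.statusCode >= 500 && e.statusCode < 600;
                if (!retryable || attempt + 1 >= maxAttempts) {
                    throw e; // client errors and exhausted retries are passed to the caller
                }
                long delay = backoffMillis(attempt, baseDelayMs, maxDelayMs);
                // Jitter: sleep somewhere between delay/2 and delay so clients do not retry in lockstep.
                long jitter = ThreadLocalRandom.current().nextLong(delay / 2 + 1);
                Thread.sleep(delay / 2 + jitter);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated predict call: fails twice with a 500, then succeeds.
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new HttpStatusException(500, "Internal error encountered.");
            }
            return "ok";
        }, 5, 10, 1000);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

In the code from the question, the `request.execute()` call would go inside the lambda, with `HttpResponseException` translated into the status-carrying exception. The google-http-client library also provides `HttpBackOffUnsuccessfulResponseHandler` and `ExponentialBackOff`, which can be attached to a request with `setUnsuccessfulResponseHandler` and may be a simpler fit than a hand-written loop.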

Internal error encountered when calling CloudML Predict via REST
