GKE Pods not visible after the cluster scales down

Time: 2020-01-05 04:33:09

Tags: google-kubernetes-engine

I'm running parallel jobs with expansions on an autoscaling cluster. While a Pod's node is still running, I can see the Pod in the "Workloads" section of "Kubernetes Engine". However, when the cluster scales down because there is not enough work, the Pods associated with the removed nodes disappear from that view (and can no longer be accessed through the CLI).

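Is there any way to keep this information from disappearing? Knowing the success/failure status and having easy access to the logs would be very useful.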

1 answer:

Answer 0: (score: 0)

I found the documentation on running Jobs in GKE. I believe you can use the kubectl describe job [JobName] command to inspect the Job and view its events, even after its nodes have been removed by autoscaling:

Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  16m   job-controller  Created pod: [JobName]-4fkr2
  Normal  SuccessfulCreate  16m   job-controller  Created pod: [JobName]-fvr9n
  Normal  SuccessfulCreate  16m   job-controller  Created pod: [JobName]-jwjgz
  Normal  SuccessfulCreate  16m   job-controller  Created pod: [JobName]-ws4t7
  Normal  SuccessfulCreate  16m   job-controller  Created pod: [JobName]-jjjdl
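
The question also asks about success/failure status. As a minimal sketch, assuming the Job object still exists and its JSON has been obtained with kubectl get job [JobName] -o json, the pod counts recorded in the Job's status can be summarized as below. The function name summarize_job is hypothetical; the fields read here (status.succeeded, status.failed, status.active, status.conditions) are the standard batch/v1 Job status fields, and any field that is absent is treated as zero or empty.

```python
import json


def summarize_job(job_json: str) -> dict:
    """Summarize pod outcomes from the JSON text of a batch/v1 Job object.

    summarize_job is a hypothetical helper for illustration only.
    """
    job = json.loads(job_json)
    status = job.get("status", {})
    # Keep only conditions that currently hold, e.g. "Complete" or "Failed".
    conditions = [
        c["type"]
        for c in (status.get("conditions") or [])
        if c.get("status") == "True"
    ]
    return {
        "name": job.get("metadata", {}).get("name"),
        # kubectl omits counters that are zero, so default them to 0.
        "succeeded": status.get("succeeded", 0),
        "failed": status.get("failed", 0),
        "active": status.get("active", 0),
        "conditions": conditions,
    }
```

One possible use is to save the kubectl output to a file, read it, and pass the text to summarize_job. Because the counts live on the Job object rather than on the Pods, they should remain readable after the nodes are gone, as long as the Job itself has not been deleted.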

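Another option, if you have Stackdriver Logging enabled through Stackdriver support for GKE (preferably Stackdriver Kubernetes Engine Monitoring, since Legacy Stackdriver support is being deprecated), is to use the following filter [1] in the advanced logs queries to check the logs of the Pods that belong to the Job.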

[1]

resource.type="container"
resource.labels.cluster_name="[ClusterName]"
resource.labels.namespace_id="[Namespace]"
resource.labels.project_id="[ProjectID]"
resource.labels.zone:"[ZONE]"
resource.labels.container_name="[ContainerName]"
resource.labels.pod_id:"[JobName]-"
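
When several Jobs need checking, the filter above can be generated instead of edited by hand. The following is a minimal sketch; build_log_filter and its parameter names are hypothetical, and it simply reproduces the fields of filter [1], keeping "=" for exact matches and ":" for substring matches. Filter [1] uses the legacy container resource type; with Stackdriver Kubernetes Engine Monitoring the resource type is k8s_container and the labels are named differently (for example pod_name and namespace_name), so the field names would likely need adjusting there.

```python
def build_log_filter(project_id, cluster_name, namespace, zone,
                     container_name, job_name):
    """Build the text of filter [1] for the Pods of one Job.

    build_log_filter is a hypothetical helper; it only formats a string
    and does not call any Google Cloud API.
    """
    clauses = [
        'resource.type="container"',
        f'resource.labels.cluster_name="{cluster_name}"',
        f'resource.labels.namespace_id="{namespace}"',
        f'resource.labels.project_id="{project_id}"',
        # ":" is a substring match, so a zone prefix also matches.
        f'resource.labels.zone:"{zone}"',
        f'resource.labels.container_name="{container_name}"',
        # Pod names created by a Job start with "<job name>-".
        f'resource.labels.pod_id:"{job_name}-"',
    ]
    return "\n".join(clauses)
```

The returned text can be pasted into the advanced logs query box in place of the bracketed placeholders.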