I'm running parallel jobs with expansions on an autoscaling cluster. While a Pod's node is still running, I can see the Pod in the "Workloads" section of "Kubernetes Engine". However, if the cluster scales down because there is not enough work, the Pods associated with the removed nodes disappear from that view (and can no longer be accessed through the CLI).
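Is there any way to keep this information from disappearing? Knowing the success/failure status and having easy access to the logs would be very useful.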
Answer 0 (score: 0)
I found the document on running jobs in GKE. According to it, you can inspect the job with the kubectl describe job [JobName] command and review its events, even after the node has been deleted (due to autoscaling):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 16m job-controller Created pod: [JobName]-4fkr2
Normal SuccessfulCreate 16m job-controller Created pod: [JobName]-fvr9n
Normal SuccessfulCreate 16m job-controller Created pod: [JobName]-jwjgz
Normal SuccessfulCreate 16m job-controller Created pod: [JobName]-ws4t7
Normal SuccessfulCreate 16m job-controller Created pod: [JobName]-jjjdl
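
Since the question also asks about success/failure status, here is a minimal sketch of reading it from the Job object itself rather than from its Pods. It assumes the Job still exists in the cluster and that kubectl is on the PATH and configured for it. The function names summarize_job_status and fetch_job are hypothetical, and the sample dictionary contains made-up values shaped like a Job's status section.

```python
import json
import subprocess


def summarize_job_status(job: dict) -> dict:
    """Reduce a Job object (parsed JSON) to its pod counters and terminal condition."""
    status = job.get("status", {})
    # A finished Job carries a condition of type "Complete" or "Failed" with status "True".
    outcome = "Unfinished"
    for cond in status.get("conditions", []):
        if cond.get("status") == "True" and cond.get("type") in ("Complete", "Failed"):
            outcome = cond["type"]
    return {
        "outcome": outcome,
        "succeeded": status.get("succeeded", 0),
        "failed": status.get("failed", 0),
        "active": status.get("active", 0),
    }


def fetch_job(job_name: str, namespace: str = "default") -> dict:
    """Ask kubectl for the Job object as JSON (needs kubectl and cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "job", job_name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Made-up sample for illustration; a real object would come from fetch_job("[JobName]").
sample = {"status": {"succeeded": 5, "conditions": [{"type": "Complete", "status": "True"}]}}
print(summarize_job_status(sample))
```

Keeping the parsing separate from the kubectl call means the summary logic can be checked without a cluster.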
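Another option applies if you have Stackdriver logging enabled, i.e. Stackdriver support for GKE, and specifically Stackdriver Kubernetes Engine Monitoring, since Legacy Stackdriver support is being deprecated. Using the following filter [1] in the advanced log queries, you can check the logs of the pods that belong to the job.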
[1]
resource.type="container"
resource.labels.cluster_name="[ClusterName]"
resource.labels.namespace_id="[Namespace]"
resource.labels.project_id="[ProjectID]"
resource.labels.zone:"[ZONE]"
resource.labels.container_name="[ContainerName]"
resource.labels.pod_id:"[JobName]-"
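
If you need this filter for several jobs, it can be assembled from its parts. The following is only a sketch: the helper name build_job_log_filter and the example values passed to it are hypothetical, and the field names are copied from filter [1] above. Note that = requires an exact match while : matches a substring, which is what lets the last line match every pod whose name starts with the job name.

```python
def build_job_log_filter(cluster, namespace, project, zone, container, job_name):
    """Assemble the log filter [1] for the pods created by one job."""
    lines = [
        'resource.type="container"',
        f'resource.labels.cluster_name="{cluster}"',
        f'resource.labels.namespace_id="{namespace}"',
        f'resource.labels.project_id="{project}"',
        # ':' is a substring match, '=' an exact match.
        f'resource.labels.zone:"{zone}"',
        f'resource.labels.container_name="{container}"',
        # Pods of a job are named "<job name>-<suffix>", so match on the prefix.
        f'resource.labels.pod_id:"{job_name}-"',
    ]
    return "\n".join(lines)


# Hypothetical values, for illustration only.
log_filter = build_job_log_filter(
    cluster="my-cluster",
    namespace="default",
    project="my-project",
    zone="us-central1-a",
    container="worker",
    job_name="my-job",
)
print(log_filter)
```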