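I'm working on a Kubernetes CronJob that runs an integration test. The Go test binary is compiled with go test -c
and copied into the Docker container that the cron job runs. The Kubernetes YAML starts off like this: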
apiVersion: batch/v1beta1
kind: CronJob
spec:
  schedule: "*/15 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 7
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
At some point the integration test started failing (exiting with code 1). I could see that the job's duration was the same as its age:
$ kubectl get jobs -l app=integration-test
NAME                          COMPLETIONS   DURATION   AGE
integration-test-1592457300   0/1           7m20s      7m20s
The kubectl get pods command showed that pods were being created more often than the every-15-minutes rate I would expect from the cron schedule:
$ kubectl get pods -l app=integration-test
NAME                                READY   STATUS   RESTARTS   AGE
integration-test-1592457300-224x8   0/1     Error    0          92s
integration-test-1592457300-5f8sz   0/1     Error    0          7m33s
integration-test-1592457300-9zvjq   0/1     Error    0          3m57s
integration-test-1592457300-th7sf   0/1     Error    0          6m26s
integration-test-1592457300-vhbr2   0/1     Error    0          5m17s
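Spinning up new pods like this is a problem because it adds to the number of pods on the node; in effect, it consumes resources.
How can I make the cron job stop spinning up new pods, run only once every 15 minutes, and not keep consuming resources when the job fails?
As a simplified example, here is a Kubernetes YAML adapted from https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/: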
$ cat cronjob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster; exit 1
          restartPolicy: Never
Note that it exits with code 1. If I apply this with kubectl apply -f cronjob.yaml and then check the pods, I see:
$ kubectl get pods
NAME                     READY   STATUS   RESTARTS   AGE
hello-1592459760-fnvcw   0/1     Error    0          30s
hello-1592459760-w75lt   0/1     Error    0          31s
hello-1592459760-xzhwn   0/1     Error    0          20s
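The pods' ages are less than a minute apart; in other words, pods are being spun up before the cron interval has elapsed. How can I prevent this?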
Answer 0 (score: 3)
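This is a very specific scenario, and it is hard to guess what you want to achieve and whether this will work for you.
concurrencyPolicy: Forbid prevents another job from being created if the previous one has not completed. But I don't think that is the case here.
restartPolicy applies to the pod (in a Job template, however, you can only use OnFailure and Never). If you set restartPolicy to Never, the job will automatically create new pods until it completes.
From the Kubernetes documentation: "A Job creates one or more Pods and ensures that a specified number of them successfully terminate. As pods successfully complete, the Job tracks the successful completions."
If you set restartPolicy: Never, the job keeps creating pods until it reaches backoffLimit. Those pods stay visible in the cluster with status Error, because each pod exits with status 1, and you have to delete them manually.
If you set restartPolicy: OnFailure, the job restarts the same pod and does not create more.
But there is another angle. What counts as a completed job?
Examples:
1. restartPolicy: OnFailure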
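The answer does not show the manifest behind this first case. A minimal sketch, assuming it is the hello CronJob from the question with only restartPolicy changed (that assumption is also marked in the comments):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      # backoffLimit is left unset here, so the default of 6 applies
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster; exit 1
          # Assumed to be the only change from the question's manifest:
          # the container is restarted inside the same pod instead of
          # the job creating a replacement pod.
          restartPolicy: OnFailure
```

With this policy the RESTARTS column grows on a single pod per job, rather than new pods appearing for the same job.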
$ kubectl get po,jobs,cronjob
NAME                         READY   STATUS             RESTARTS   AGE
pod/hello-1592495280-w27mt   0/1     CrashLoopBackOff   5          5m21s
pod/hello-1592495340-tzc64   0/1     CrashLoopBackOff   5          4m21s
pod/hello-1592495400-w8cm6   0/1     CrashLoopBackOff   5          3m21s
pod/hello-1592495460-jjlx5   0/1     CrashLoopBackOff   4          2m21s
pod/hello-1592495520-c59tm   0/1     CrashLoopBackOff   3          80s
pod/hello-1592495580-rrdzw   0/1     Error              2          20s

NAME                         COMPLETIONS   DURATION   AGE
job.batch/hello-1592495220   0/1           6m22s      6m22s
job.batch/hello-1592495280   0/1           5m22s      5m22s
job.batch/hello-1592495340   0/1           4m22s      4m22s
job.batch/hello-1592495400   0/1           3m22s      3m22s
job.batch/hello-1592495460   0/1           2m22s      2m22s
job.batch/hello-1592495520   0/1           81s        81s
job.batch/hello-1592495580   0/1           21s        21s

NAME                  SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/hello   */1 * * * *   False     6        25s             15m
Each job creates only one pod, which keeps restarting until the job finishes or the CronJob considers it completed.
If you describe the CronJob, you can see this in the Events section:
Events:
  Type    Reason            Age   From                Message
  ----    ------            ----  ----                -------
  Normal  SuccessfulCreate  18m   cronjob-controller  Created job hello-1592494740
  Normal  SuccessfulCreate  17m   cronjob-controller  Created job hello-1592494800
  Normal  SuccessfulCreate  16m   cronjob-controller  Created job hello-1592494860
  Normal  SuccessfulCreate  15m   cronjob-controller  Created job hello-1592494920
  Normal  SuccessfulCreate  14m   cronjob-controller  Created job hello-1592494980
  Normal  SuccessfulCreate  13m   cronjob-controller  Created job hello-1592495040
  Normal  SawCompletedJob   12m   cronjob-controller  Saw completed job: hello-1592494740
  Normal  SuccessfulCreate  12m   cronjob-controller  Created job hello-1592495100
  Normal  SawCompletedJob   11m   cronjob-controller  Saw completed job: hello-1592494800
  Normal  SuccessfulDelete  11m   cronjob-controller  Deleted job hello-1592494740
  Normal  SuccessfulCreate  11m   cronjob-controller  Created job hello-1592495160
  Normal  SawCompletedJob   10m   cronjob-controller  Saw completed job: hello-1592494860
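Why was job hello-1592494740 considered Completed? The job's .spec.backoffLimit (set under .spec.jobTemplate.spec in a CronJob) defaults to 6; this is stated in the docs. If the job fails 6 times (the pod fails and restarts 6 times), the CronJob considers the job Completed and removes it. Once the job is removed, its pod is removed too.
In your example, however, the pod was created, ran the date and echo commands, and then exited with code 1. Even though the pod is crashing, it still wrote its output. Because the last command is exit 1, it keeps crashing until the limit is reached, as the example below shows:
$ kubectl get pods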
NAME                     READY   STATUS             RESTARTS   AGE
hello-1592495400-w8cm6   0/1     Terminating        6          5m51s
hello-1592495460-jjlx5   0/1     CrashLoopBackOff   5          4m51s
hello-1592495520-c59tm   0/1     CrashLoopBackOff   5          3m50s
hello-1592495580-rrdzw   0/1     CrashLoopBackOff   4          2m50s
hello-1592495640-nbq59   0/1     CrashLoopBackOff   4          110s
hello-1592495700-p6pcx   0/1     Error              3          50s
user@cloudshell:~ (project)$ kubectl logs hello-1592495520-c59tm
Thu Jun 18 15:55:13 UTC 2020
Hello from the Kubernetes cluster
2. restartPolicy: Never and backoffLimit: 0
The following YAML was used:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster; exit 1
          restartPolicy: Never
      backoffLimit: 0
Output:
$ kubectl get po,jobs,cronjob
NAME                         READY   STATUS   RESTARTS   AGE
pod/hello-1592497320-svd6k   0/1     Error    0          44s

NAME                         COMPLETIONS   DURATION   AGE
job.batch/hello-1592497320   0/1           44s        44s

NAME                  SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/hello   */1 * * * *   False     0        51s             11m
$ kubectl describe cronjob
...
Events:
  Type    Reason            Age     From                Message
  ----    ------            ----    ----                -------
  Normal  SuccessfulCreate  12m     cronjob-controller  Created job hello-1592496720
  Normal  SawCompletedJob   11m     cronjob-controller  Saw completed job: hello-1592496720
  Normal  SuccessfulCreate  11m     cronjob-controller  Created job hello-1592496780
  Normal  SawCompletedJob   10m     cronjob-controller  Saw completed job: hello-1592496780
  Normal  SuccessfulDelete  10m     cronjob-controller  Deleted job hello-1592496720
  Normal  SuccessfulCreate  10m     cronjob-controller  Created job hello-1592496840
  Normal  SuccessfulDelete  9m55s   cronjob-controller  Deleted job hello-1592496780
  Normal  SawCompletedJob   9m55s   cronjob-controller  Saw completed job: hello-1592496840
  Normal  SuccessfulCreate  9m5s    cronjob-controller  Created job hello-1592496900
  Normal  SawCompletedJob   8m55s   cronjob-controller  Saw completed job: hello-1592496900
  Normal  SuccessfulDelete  8m55s   cronjob-controller  Deleted job hello-1592496840
  Normal  SuccessfulCreate  8m5s    cronjob-controller  Created job hello-1592496960
  Normal  SawCompletedJob   7m55s   cronjob-controller  Saw completed job: hello-1592496960
  Normal  SuccessfulDelete  7m55s   cronjob-controller  Deleted job hello-1592496900
  Normal  SuccessfulCreate  7m4s    cronjob-controller  Created job hello-1592497020
This way, only one job and one pod run at a time (there may be a gap of about 10 seconds during which there are 2 jobs and 2 pods).
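Applied to the original 15-minute integration test, the second approach might look like the sketch below. The metadata name, the app label, the container name, and the image are placeholders, since the question does not show them; the remaining fields come from the question, plus backoffLimit: 0 from this answer.

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: integration-test                  # hypothetical name
spec:
  schedule: "*/15 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 7
  jobTemplate:
    spec:
      backoffLimit: 0                     # no retries: a failed pod is not replaced
      template:
        metadata:
          labels:
            app: integration-test         # assumed from the -l selector in the question
        spec:
          containers:
          - name: integration-test        # hypothetical container name
            image: example.com/integration-test:latest   # hypothetical image
          restartPolicy: Never
```

With failedJobsHistoryLimit: 7, up to seven failed jobs and their Error pods would still be listed. Those pods have terminated and no longer run containers, and lowering the limit shortens the list.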
I hope this clears things up a bit. If you would like a more precise answer, please provide more information about your scenario.
Answer 1 (score: 0)
The default concurrencyPolicy is Allow.
You can set concurrencyPolicy: Forbid to keep new jobs from running in parallel.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "* * * * *"
  # Allow | Forbid | Replace
  concurrencyPolicy: Forbid
  jobTemplate: