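I had a Job running overnight in Kubernetes. When I checked on it in the morning, it had failed. Normally I would look at the pod logs or events to find out why, but the pod has been deleted and there are no events.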
kubectl describe job topics-etl --namespace dnc
Here is the output of describe:
Name:           topics-etl
Namespace:      dnc
Selector:       controller-uid=391cb7e5-b5a0-11e9-a905-0697dd320292
Labels:         controller-uid=391cb7e5-b5a0-11e9-a905-0697dd320292
                job-name=topics-etl
Annotations:    kubectl.kubernetes.io/last-applied-configuration:
                  {"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"topics-etl","namespace":"dnc"},"spec":{"template":{"spec":{"con...
Parallelism:    1
Completions:    1
Start Time:     Fri, 02 Aug 2019 22:38:56 -0500
Pods Statuses:  0 Running / 0 Succeeded / 1 Failed
Pod Template:
  Labels:  controller-uid=391cb7e5-b5a0-11e9-a905-0697dd320292
           job-name=topics-etl
  Containers:
   docsund-etl:
    Image:      acarl005/docsund-topics-api:0.1.4
    Port:       <none>
    Host Port:  <none>
    Command:
      ./create-topic-data
    Requests:
      cpu:     1
      memory:  1Gi
    Environment:
      AWS_ACCESS_KEY_ID:      <set to the key 'access_key_id' in secret 'aws-secrets'>      Optional: false
      AWS_SECRET_ACCESS_KEY:  <set to the key 'secret_access_key' in secret 'aws-secrets'>  Optional: false
      AWS_S3_CSV_PATH:        <set to the key 's3_csv_path' in secret 'aws-secrets'>        Optional: false
    Mounts:
      /app/state from topics-volume (rw)
  Volumes:
   topics-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  topics-volume-claim
    ReadOnly:   false
Events:         <none>
Here is the Job config YAML. It has restartPolicy: OnFailure, but it never restarted. I also didn't set a TTL, so the pods should never have been cleaned up.
apiVersion: batch/v1
kind: Job
metadata:
  name: topics-etl
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: docsund-etl
          image: acarl005/docsund-topics-api:0.1.6
          command: ["./create-topic-data"]
          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: aws-secrets
                  key: access_key_id
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-secrets
                  key: secret_access_key
            - name: AWS_S3_CSV_PATH
              valueFrom:
                secretKeyRef:
                  name: aws-secrets
                  key: s3_csv_path
          resources:
            requests:
              cpu: 1
              memory: 1Gi
          volumeMounts:
            - name: topics-volume
              mountPath: /app/state
      volumes:
        - name: topics-volume
          persistentVolumeClaim:
            claimName: topics-volume-claim
How can I debug this?
Answer (score: 2)
A TTL would clean up the Job itself and all of its child objects. ttlSecondsAfterFinished is not set, so the Job has not been cleaned up.
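For reference, a TTL is set with the Job-level field `spec.ttlSecondsAfterFinished`. The sketch below adds one to the Job from the question; the value 3600 is an arbitrary example, and env, resources and volumes are left out for brevity. With it, the Job and its pods would be deleted one hour after the Job finishes, whether it completed or failed. At the time of this question (2019) the field was still alpha and only took effect with the TTLAfterFinished feature gate enabled, so with the field absent, as in the posted spec, nothing is removed by TTL.

```yaml
# Sketch only: the Job from the question with a TTL added.
# env, resources and volumes omitted for brevity; 3600 is an arbitrary value.
apiVersion: batch/v1
kind: Job
metadata:
  name: topics-etl
spec:
  # Delete the Job and its pods 3600 s after it finishes (Complete or Failed).
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: docsund-etl
          image: acarl005/docsund-topics-api:0.1.6
          command: ["./create-topic-data"]
```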
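Note: If your Job has restartPolicy = "OnFailure", keep in mind that the container running the Job will be terminated once the Job backoff limit has been reached. This can make debugging the Job's executable more difficult. We suggest setting restartPolicy = "Never" when debugging the Job, or using a logging system to ensure output from failed Jobs is not lost inadvertently.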
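Applied to the Job in the question, that suggestion is a one-field change in the pod template. The sketch below assumes everything else stays as posted (env, resources and volumes are elided). With `restartPolicy: Never`, a failed container is not restarted in place; the Job controller creates a new pod for the next attempt and leaves the failed pods behind. After the Job gives up, they should still be listed by `kubectl get pods -n dnc -l job-name=topics-etl`, and `kubectl logs <pod-name> -n dnc` (where `<pod-name>` is a placeholder for one of those pods) shows what each attempt printed.

```yaml
# Debugging variant of the Job from the question (sketch).
apiVersion: batch/v1
kind: Job
metadata:
  name: topics-etl
spec:
  template:
    spec:
      # Never: failed containers are not restarted in place; each retry
      # gets a new pod and the failed pods (and their logs) are kept.
      restartPolicy: Never
      containers:
        - name: docsund-etl
          image: acarl005/docsund-topics-api:0.1.6
          command: ["./create-topic-data"]
          # env, resources and volumeMounts as in the original spec
      # volumes as in the original spec
```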
The Job spec you posted has no backoffLimit, so it falls back to the default of 6 retries before the Job is marked as failed.
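Setting the retry budget explicitly avoids relying on that default. In the sketch below, `backoffLimit` sits on the Job spec, not on the pod template, and the value 2 is an arbitrary example. According to the Kubernetes Job documentation, the Job controller recreates failed pods with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes. A larger limit therefore also means a longer wait before the Job is finally marked as failed.

```yaml
# Sketch: explicit retry budget for the Job from the question.
apiVersion: batch/v1
kind: Job
metadata:
  name: topics-etl
spec:
  # Retries before the Job is marked Failed (defaults to 6 when omitted).
  # 2 is an arbitrary example value.
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: docsund-etl
          image: acarl005/docsund-topics-api:0.1.6
          command: ["./create-topic-data"]
```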
A container fails whenever its process exits with a non-zero status, so a run can fail without writing anything at all to the logs.
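If the process might fail without explaining itself, one extra option is the container-level field `terminationMessagePolicy`. With `FallbackToLogsOnError`, when the container exits with an error and has not written a termination message file, Kubernetes uses the tail of its log output as the termination message. That message appears in the container status shown by `kubectl describe pod`, next to the exit code. This is a sketch that assumes the same container as in the question. It only helps while the failed pod still exists, so it pairs naturally with `restartPolicy: Never`.

```yaml
# Sketch: copy the end of the container log into the pod status on failure.
apiVersion: batch/v1
kind: Job
metadata:
  name: topics-etl
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: docsund-etl
          image: acarl005/docsund-topics-api:0.1.6
          command: ["./create-topic-data"]
          # On an error exit with no termination message file written,
          # use the last lines of the container log as the message.
          terminationMessagePolicy: FallbackToLogsOnError
```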
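The spec doesn't define activeDeadlineSeconds either, so I'm not sure what kind of timeout you would end up with. I assume this was a hard failure of the container, in which case a timeout doesn't come into play.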
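For comparison, the sketch below shows an explicit timeout; the deadline of 14400 seconds (4 hours) is an arbitrary example. Once a Job has been running longer than `activeDeadlineSeconds`, its running pods are terminated and the Job is marked Failed with reason `DeadlineExceeded`, and this deadline takes precedence over `backoffLimit`. Because the posted spec does not set it, the Job cannot have failed with `DeadlineExceeded`. Either way, the failure reason is recorded in the Job's status conditions, which outlive the pods. `kubectl get job topics-etl -n dnc -o yaml` should show a `Failed` condition whose reason is `BackoffLimitExceeded` or `DeadlineExceeded`.

```yaml
# Sketch: bound the total run time of the Job from the question.
apiVersion: batch/v1
kind: Job
metadata:
  name: topics-etl
spec:
  # Wall-clock limit for the whole Job, across all retries;
  # 14400 s (4 h) is an arbitrary example value.
  activeDeadlineSeconds: 14400
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: docsund-etl
          image: acarl005/docsund-topics-api:0.1.6
          command: ["./create-topic-data"]
```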