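Kubernetes Job failed with no logs, no termination reason, and no events

Time: 2019-08-03 17:43:39

Tags: kubernetes

I ran a Job in Kubernetes overnight. When I checked it in the morning, it had failed. Normally I would check the pod logs or the events to determine why. However, the pod has been deleted and there are no events.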

kubectl describe job topics-etl --namespace dnc

Here is the describe output:

Name:           topics-etl
Namespace:      dnc
Selector:       controller-uid=391cb7e5-b5a0-11e9-a905-0697dd320292
Labels:         controller-uid=391cb7e5-b5a0-11e9-a905-0697dd320292
                job-name=topics-etl
Annotations:    kubectl.kubernetes.io/last-applied-configuration:
                  {"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"topics-etl","namespace":"dnc"},"spec":{"template":{"spec":{"con...
Parallelism:    1
Completions:    1
Start Time:     Fri, 02 Aug 2019 22:38:56 -0500
Pods Statuses:  0 Running / 0 Succeeded / 1 Failed
Pod Template:
  Labels:  controller-uid=391cb7e5-b5a0-11e9-a905-0697dd320292
           job-name=topics-etl
  Containers:
   docsund-etl:
    Image:      acarl005/docsund-topics-api:0.1.4
    Port:       <none>
    Host Port:  <none>
    Command:
      ./create-topic-data
    Requests:
      cpu:     1
      memory:  1Gi
    Environment:
      AWS_ACCESS_KEY_ID:      <set to the key 'access_key_id' in secret 'aws-secrets'>      Optional: false
      AWS_SECRET_ACCESS_KEY:  <set to the key 'secret_access_key' in secret 'aws-secrets'>  Optional: false
      AWS_S3_CSV_PATH:        <set to the key 's3_csv_path' in secret 'aws-secrets'>        Optional: false
    Mounts:
      /app/state from topics-volume (rw)
  Volumes:
   topics-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  topics-volume-claim
    ReadOnly:   false
Events:         <none>

Here is the Job config YAML. It has restartPolicy: OnFailure, but it never restarted. I also did not set a TTL, so the pods should never be cleaned up.

apiVersion: batch/v1
kind: Job
metadata:
  name: topics-etl
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: docsund-etl
          image: acarl005/docsund-topics-api:0.1.6
          command: ["./create-topic-data"]
          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: aws-secrets
                  key: access_key_id
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-secrets
                  key: secret_access_key
            - name: AWS_S3_CSV_PATH
              valueFrom:
                secretKeyRef:
                  name: aws-secrets
                  key: s3_csv_path
          resources:
            requests:
              cpu: 1
              memory: 1Gi
          volumeMounts:
            - name: topics-volume
              mountPath: /app/state
      volumes:
        - name: topics-volume
          persistentVolumeClaim:
            claimName: topics-volume-claim

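How can I debug this?

1 answer:

Answer 0 (score: 2)

A TTL would clean up the Job itself and all of its child objects. ttlSecondsAfterFinished is unset, so the Job has not been cleaned up.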

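From the Job documentation:

Note: If your Job has restartPolicy = "OnFailure", keep in mind that the container running the Job will be terminated once the Job backoff limit has been reached. This can make debugging the Job's executable more difficult. We suggest setting restartPolicy = "Never" when debugging the Job, or using a logging system, to ensure output from failed Jobs is not lost inadvertently.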

The Job spec you posted does not set backoffLimit, so it should try to run the underlying task 6 times (the default).
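As a rough sketch of the documentation's suggestion, the posted spec could be changed as follows while debugging. The backoffLimit value of 2 is an illustrative assumption, not something taken from the question; the other fields are copied from the posted YAML. With restartPolicy: Never, each retry runs in a new pod, and the failed pods are kept so their logs and status can be inspected.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: topics-etl
spec:
  # Assumed value for debugging; leaving this out gives the default of 6.
  backoffLimit: 2
  template:
    spec:
      # Never instead of OnFailure: failed pods are kept rather than
      # having their container restarted in place.
      restartPolicy: Never
      containers:
        - name: docsund-etl
          image: acarl005/docsund-topics-api:0.1.6
          command: ["./create-topic-data"]
          # env, resources, volumeMounts, and the pod-level volumes
          # from the posted spec are unchanged and omitted here.
```

Once the cause is found, restartPolicy can be switched back to OnFailure if in-place restarts are preferred.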

If the container process exits with a non-zero status, it counts as a failure, so it can fail while being entirely silent in the logs.

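The spec does not define activeDeadlineSeconds, so I am not sure what kind of timeout you would end up with. I assume this was a hard failure in the container, in which case a timeout would not come into play.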
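For reference, here is a minimal sketch of where such a deadline would go if one were wanted. The 8-hour value is purely hypothetical. activeDeadlineSeconds is set on the Job spec itself, not on the pod template. A Job that exceeds it is marked failed with reason DeadlineExceeded, which distinguishes a timeout from the BackoffLimitExceeded reason reported when retries run out.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: topics-etl
spec:
  # Hypothetical value: fail the Job if it is still active after 8 hours.
  activeDeadlineSeconds: 28800
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: docsund-etl
          image: acarl005/docsund-topics-api:0.1.6
          command: ["./create-topic-data"]
          # env, resources, volumeMounts, and the pod-level volumes
          # from the posted spec are unchanged and omitted here.
```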