Hashicorp Vault on k8s: getting error 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity

Date: 2021-01-22 08:08:40

Tags: kubernetes memory configuration kubernetes-helm hashicorp-vault

I am deploying HA Vault on k8s (EKS) and getting this error on one of the Vault pods, which I think is also causing the other pods to fail. Below is the output of kubectl get events (search for: nodes are available: 1 Insufficient memory):

26m         Normal    Created                        pod/vault-1                                 Created container vault
26m         Normal    Started                        pod/vault-1                                 Started container vault
26m         Normal    Pulled                         pod/vault-1                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
7m40s       Warning   BackOff                        pod/vault-1                                 Back-off restarting failed container
2m38s       Normal    Scheduled                      pod/vault-1                                 Successfully assigned vault-foo/vault-1 to ip-10-101-0-103.ec2.internal
2m35s       Normal    SuccessfulAttachVolume         pod/vault-1                                 AttachVolume.Attach succeeded for volume "pvc-acfc7e26-3616-4075-ab79-0c3f7b0f6470"
2m35s       Normal    SuccessfulAttachVolume         pod/vault-1                                 AttachVolume.Attach succeeded for volume "pvc-19d03d48-1de2-41f8-aadf-02d0a9f4bfbd"
48s         Normal    Pulled                         pod/vault-1                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
48s         Normal    Created                        pod/vault-1                                 Created container vault
99s         Normal    Started                        pod/vault-1                                 Started container vault
60s         Warning   BackOff                        pod/vault-1                                 Back-off restarting failed container
27m         Normal    TaintManagerEviction           pod/vault-2                                 Cancelling deletion of Pod vault-foo/vault-2
28m         Warning   FailedScheduling               pod/vault-2                                 0/4 nodes are available: 1 Insufficient memory, 4 Insufficient cpu.
28m         Warning   FailedScheduling               pod/vault-2                                 0/5 nodes are available: 1 Insufficient memory, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.
27m         Normal    Scheduled                      pod/vault-2                                 Successfully assigned vault-foo/vault-2 to ip-10-101-0-103.ec2.internal
27m         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-fb91141d-ebd9-4767-b122-da8c98349cba"
27m         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-95effe76-6e01-49ad-9bec-14e091e1a334"
27m         Normal    Pulling                        pod/vault-2                                 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
27m         Normal    Pulled                         pod/vault-2                                 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
26m         Normal    Created                        pod/vault-2                                 Created container vault
26m         Normal    Started                        pod/vault-2                                 Started container vault
26m         Normal    Pulled                         pod/vault-2                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
7m26s       Warning   BackOff                        pod/vault-2                                 Back-off restarting failed container
2m36s       Warning   FailedScheduling               pod/vault-2                                 0/7 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 4 Insufficient cpu.
114s        Warning   FailedScheduling               pod/vault-2                                 0/8 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 4 Insufficient cpu.
104s        Warning   FailedScheduling               pod/vault-2                                 0/9 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.
93s         Normal    Scheduled                      pod/vault-2                                 Successfully assigned vault-foo/vault-2 to ip-10-101-0-82.ec2.internal
88s         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-fb91141d-ebd9-4767-b122-da8c98349cba"
88s         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-95effe76-6e01-49ad-9bec-14e091e1a334"
83s         Normal    Pulling                        pod/vault-2                                 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
81s         Normal    Pulled                         pod/vault-2                                 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
38s         Normal    Created                        pod/vault-2                                 Created container vault
37s         Normal    Started                        pod/vault-2                                 Started container vault
38s         Normal    Pulled                         pod/vault-2                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
4s          Warning   BackOff                        pod/vault-2                                 Back-off restarting failed container
2m38s       Normal    Scheduled                      pod/vault-agent-injector-d54bdc675-qwsmz    Successfully assigned vault-foo/vault-agent-injector-d54bdc675-qwsmz to ip-10-101-2-91.ec2.internal
2m37s       Normal    Pulling                        pod/vault-agent-injector-d54bdc675-qwsmz    Pulling image "hashicorp/vault-k8s:latest"
2m36s       Normal    Pulled                         pod/vault-agent-injector-d54bdc675-qwsmz    Successfully pulled image "hashicorp/vault-k8s:latest"
2m36s       Normal    Created                        pod/vault-agent-injector-d54bdc675-qwsmz    Created container sidecar-injector
2m35s       Normal    Started                        pod/vault-agent-injector-d54bdc675-qwsmz    Started container sidecar-injector
28m         Normal    Scheduled                      pod/vault-agent-injector-d54bdc675-wz9ws    Successfully assigned vault-foo/vault-agent-injector-d54bdc675-wz9ws to ip-10-101-0-87.ec2.internal
28m         Normal    Pulled                         pod/vault-agent-injector-d54bdc675-wz9ws    Container image "hashicorp/vault-k8s:latest" already present on machine
28m         Normal    Created                        pod/vault-agent-injector-d54bdc675-wz9ws    Created container sidecar-injector
28m         Normal    Started                        pod/vault-agent-injector-d54bdc675-wz9ws    Started container sidecar-injector
3m22s       Normal    Killing                        pod/vault-agent-injector-d54bdc675-wz9ws    Stopping container sidecar-injector
3m22s       Warning   Unhealthy                      pod/vault-agent-injector-d54bdc675-wz9ws    Readiness probe failed: Get dial tcp connect: connection refused
3m18s       Warning   Unhealthy                      pod/vault-agent-injector-d54bdc675-wz9ws    Liveness probe failed: Get dial tcp connect: no route to host
28m         Normal    SuccessfulCreate               replicaset/vault-agent-injector-d54bdc675   Created pod: vault-agent-injector-d54bdc675-wz9ws
2m38s       Normal    SuccessfulCreate               replicaset/vault-agent-injector-d54bdc675   Created pod: vault-agent-injector-d54bdc675-qwsmz
28m         Normal    ScalingReplicaSet              deployment/vault-agent-injector             Scaled up replica set vault-agent-injector-d54bdc675 to 1
2m38s       Normal    ScalingReplicaSet              deployment/vault-agent-injector             Scaled up replica set vault-agent-injector-d54bdc675 to 1
28m         Normal    EnsuringLoadBalancer           service/vault-ui                            Ensuring load balancer
28m         Normal    EnsuredLoadBalancer            service/vault-ui                            Ensured load balancer
26m         Normal    UpdatedLoadBalancer            service/vault-ui                            Updated load balancer with new hosts
3m24s       Normal    DeletingLoadBalancer           service/vault-ui                            Deleting load balancer
3m23s       Warning   PortNotAllocated               service/vault-ui                            Port 32476 is not allocated; repairing
3m23s       Warning   ClusterIPNotAllocated          service/vault-ui                            Cluster IP is not allocated; repairing
3m22s       Warning   FailedToUpdateEndpointSlices   service/vault-ui                            Error updating Endpoint Slices for Service vault-foo/vault-ui: failed to update vault-ui-crtg4 EndpointSlice for Service vault-foo/vault-ui: Operation cannot be fulfilled on endpointslices.discovery.k8s.io "vault-ui-crtg4": the object has been modified; please apply your changes to the latest version and try again
3m16s       Warning   FailedToUpdateEndpoint         endpoints/vault-ui                          Failed to update endpoint vault-foo/vault-ui: Operation cannot be fulfilled on endpoints "vault-ui": the object has been modified; please apply your changes to the latest version and try again
2m52s       Normal    DeletedLoadBalancer            service/vault-ui                            Deleted load balancer
2m39s       Normal    EnsuringLoadBalancer           service/vault-ui                            Ensuring load balancer
2m36s       Normal    EnsuredLoadBalancer            service/vault-ui                            Ensured load balancer
96s         Normal    UpdatedLoadBalancer            service/vault-ui                            Updated load balancer with new hosts
28m         Normal    NoPods                         poddisruptionbudget/vault                   No matching pods found
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-0 in StatefulSet vault successful
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-1 in StatefulSet vault successful
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-2 in StatefulSet vault successful
2m40s       Normal    NoPods                         poddisruptionbudget/vault                   No matching pods found
2m38s       Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-0 in StatefulSet vault successful
2m38s       Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-1 in StatefulSet vault successful
2m38s       Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-2 in StatefulSet vault successful


# Vault Helm Chart Value Overrides
global:
  enabled: true
  tlsDisable: false

injector:
  enabled: true
  # Use the Vault K8s Image https://github.com/hashicorp/vault-k8s/
  image:
    repository: "hashicorp/vault-k8s"
    tag: "latest"

  resources:
      requests:
        memory: 256Mi
        cpu: 250m
      limits:
        memory: 256Mi
        cpu: 250m

server:
  # Use the Enterprise Image
  image:
    repository: "hashicorp/vault-enterprise"
    tag: "1.5.0_ent"

  # These Resource Limits are in line with node requirements in the
  # Vault Reference Architecture for a Small Cluster
  resources:
    requests:
      memory: 8Gi
      cpu: 2000m
    limits:
      memory: 16Gi
      cpu: 2000m

  # For HA configuration and because we need to manually init the vault,
  # we need to define custom readiness/liveness Probe settings
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 60

  # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
  # used to include variables required for auto-unseal.
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca

  # extraVolumes is a list of extra volumes to mount. These will be exposed
  # to Vault in the path .
  #  - type: secret
  #    name: tls-server
  #  - type: secret
  #    name: tls-ca
  #  - type: secret
  #    name: kms-creds
  extraVolumes:
    - type: secret
      name: vault-server-tls

  # This configures the Vault Statefulset to create a PVC for audit logs.
  # See https://www.vaultproject.io/docs/audit/index.html to know more
  auditStorage:
    enabled: true

  standalone:
    enabled: false

  # Run Vault in "HA" mode.
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true

      config: |
        ui = true
        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
          tls_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          tls_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
        }

        storage "raft" {
          path = "/vault/data"
          retry_join {
            leader_api_addr = "http://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "http://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "http://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
        }

        service_registration "kubernetes" {}

# Vault UI
ui:
  enabled: true
  serviceType: "LoadBalancer"
  serviceNodePort: null
  externalPort: 8200

  # For Added Security, edit the below
  #loadBalancerSourceRanges:
  #   - < Your IP RANGE Ex. >
  #   - < YOUR SINGLE IP Ex. >


0/9 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.

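You have 9 nodes, but none of them is available for scheduling, each for its own set of reasons. Note that a single node can be affected by more than one issue, so the counts can add up to more than the total number of nodes.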
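As a rough illustration of why the counts do not have to match the node total, here is a minimal Python sketch that splits a FailedScheduling message into its reasons and adds them up. The function name parse_failed_scheduling is made up for this example, and the parsing assumes the message layout shown in the events above; it is not an official Kubernetes format guarantee.

```python
import re


def parse_failed_scheduling(message):
    """Split a FailedScheduling message into (total_nodes, {reason: count}).

    Hypothetical helper: it assumes the "0/N nodes are available: ..." layout
    seen in the events above.
    """
    head, _, tail = message.partition(": ")
    total = int(re.match(r"\d+/(\d+) nodes are available", head).group(1))
    reasons = {}
    # Split only on commas that are followed by a new count, because the
    # taint reason itself contains a comma.
    for part in re.split(r",\s+(?=\d+\s)", tail.strip().rstrip(".")):
        count, reason = re.match(r"(\d+)\s+(.*)", part).groups()
        reasons[reason] = int(count)
    return total, reasons


message = (
    "0/9 nodes are available: 1 Insufficient memory, "
    "1 node(s) didn't match pod affinity/anti-affinity, "
    "1 node(s) didn't satisfy existing pods anti-affinity rules, "
    "1 node(s) had volume node affinity conflict, "
    "1 node(s) were unschedulable, "
    "2 node(s) had taint {node.kubernetes.io/not-ready: }, "
    "that the pod didn't tolerate, 4 Insufficient cpu."
)

total, reasons = parse_failed_scheduling(message)
for reason, count in reasons.items():
    print(f"{count:>2}  {reason}")
print(f"{sum(reasons.values())} reasons reported across {total} nodes")
```

For the message quoted above, the reasons add up to 11 across 9 nodes, so at least some nodes were rejected for more than one reason.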


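  • Insufficient memory: Run kubectl describe node <node-name> to check how much memory is free on that node. Check the pod's requests and limits. Note that Kubernetes reserves the full amount of memory a pod requests, regardless of how much memory the pod actually uses.

  • Insufficient cpu: Same as above.

  • node(s) didn't match pod affinity/anti-affinity: Check your affinity/anti-affinity rules.

  • node(s) didn't satisfy existing pods anti-affinity rules: Same as above.

  • node(s) had volume node affinity conflict: This happens when a pod cannot be scheduled because it cannot attach to a volume that lives in another availability zone. You can fix this by creating a storageclass for a single zone and using that storageclass in your PVC.

  • node(s) were unschedulable: This is because the node is marked as Unschedulable. That leads to the next issue below:

  • node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate: This corresponds to the NodeCondition Ready = False. You can use kubectl describe node to check the taints and kubectl taint nodes <node-name> <taint-name>- to remove them. See Taints and Tolerations for more details.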
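To make the Insufficient memory and Insufficient cpu items more concrete, here is a minimal Python sketch of the comparison the scheduler effectively makes: a pod's requests against what is still unrequested on a node. The pod requests (8Gi, 2000m) come from the values file in the question; the node figures and the helper names (parse_memory, parse_cpu, unmet_resources) are hypothetical, and in practice you would read the real numbers from the Allocatable and Allocated resources sections of kubectl describe node.

```python
# Binary suffixes used by Kubernetes memory quantities (subset only).
MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity):
    """Convert a memory quantity such as '8Gi' or '256Mi' to bytes."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def parse_cpu(quantity):
    """Convert a CPU quantity such as '2000m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def unmet_resources(pod_request, allocatable, already_requested):
    """Return the reasons a pod's requests do not fit on a node.

    Only requests matter here: actual usage and limits are ignored,
    which mirrors the point made in the list above.
    """
    problems = []
    free_memory = parse_memory(allocatable["memory"]) - parse_memory(already_requested["memory"])
    free_cpu = parse_cpu(allocatable["cpu"]) - parse_cpu(already_requested["cpu"])
    if parse_memory(pod_request["memory"]) > free_memory:
        problems.append("Insufficient memory")
    if parse_cpu(pod_request["cpu"]) > free_cpu:
        problems.append("Insufficient cpu")
    return problems


# Requests taken from the server.resources.requests values in the question.
vault_request = {"memory": "8Gi", "cpu": "2000m"}

# Hypothetical nodes; replace with figures from `kubectl describe node`.
large_node = {"allocatable": {"memory": "15Gi", "cpu": "3920m"},
              "requested": {"memory": "2Gi", "cpu": "2500m"}}
small_node = {"allocatable": {"memory": "7Gi", "cpu": "1930m"},
              "requested": {"memory": "1Gi", "cpu": "500m"}}

for name, node in (("large_node", large_node), ("small_node", small_node)):
    print(name, unmet_resources(vault_request, node["allocatable"], node["requested"]))
```

With these made-up numbers the larger node has enough memory but not enough unrequested CPU, and the smaller node fails on both. If the real nodes look similar, either use larger instance types or lower the requests in the values file.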

There is also a GitHub thread about a similar issue that you may find useful.
