Kubernetes - how to remove a Pod from the Service load balancer when the node hosting it becomes unreachable

Date: 2020-10-15 18:16:35

Tags: kubernetes

I deployed a local Kubernetes cluster (one master node and two worker nodes) on VirtualBox using kubeadm:

NAME     STATUS   ROLES    AGE     VERSION
master   Ready    master   2d21h   v1.19.2
node1    Ready    <none>   2d21h   v1.19.2
node2    Ready    <none>   2d21h   v1.19.2

The containerized application is a plain Node.js app that serves "HelloWorld" at the URL http://ExternalIP:8080:
NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)          AGE
helloworld-service   LoadBalancer   10.111.237.33   192.168.1.163   8080:30317/TCP   124m
kubernetes           ClusterIP      10.96.0.1       <none>          443/TCP          2d21h

The Deployment is configured with replicas = 2, so one Pod runs on node1 and the other on node2. If I run the following command:

 curl http://ExternalIP:8080

in a for loop from an external machine outside the cluster, every request receives "HelloWorld" as expected. Here is the description of the service:

 Name:                     helloworld-service
 Namespace:                default
 Labels:                   <none>
 Annotations:              <none>
 Selector:                 app=hello-world-app
 Type:                     LoadBalancer
 IP:                       10.111.237.33
 External IPs:             192.168.1.163
 Port:                     <unset>  8080/TCP
 TargetPort:               8081/TCP
 NodePort:                 <unset>  30317/TCP
 Endpoints:                10.36.0.1:8081,10.44.0.1:8081
 Session Affinity:         None
 External Traffic Policy:  Cluster
 Events:                   <none>

Now, with the for loop running from the external machine, I wanted to measure the application's downtime when a worker node is switched off. The result was:

 - about 30 seconds with no response and 503 Service Unavailable from the application
 - the application starts working fine again only after the master declares the switched-off node NotReady

At that point the cluster output is:

NAME     STATUS      ROLES    AGE     VERSION
master   Ready       master   2d21h   v1.19.2
node1    Ready       <none>   2d21h   v1.19.2
node2    NotReady    <none>   2d21h   v1.19.2

From then on, the application answers every request.

From the kubernetes.io documentation:

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

I changed the YAML configuration of the Service and Deployment objects as follows, but the result is still ~30 seconds of application downtime:

  apiVersion: v1
  kind: Service
  metadata:
    name: helloworld-service

  spec:
    selector:
      app: hello-world-app

    ports:
    - protocol: TCP
      port: 8080
      targetPort: 8081

    type: LoadBalancer
    clusterIP: 10.111.237.33

    externalIPs:
    - 192.168.1.163

  status:
    loadBalancer:
      ingress:
      - ip: 192.168.1.224

  ---
  apiVersion: apps/v1

  kind: Deployment

  metadata:
    name: helloworld-deployment

  spec:
    selector:
      matchLabels:
        app: hello-world-app
    
    replicas: 2

    template:
      metadata:
        labels:
          app: hello-world-app

      spec:
        containers:
        - name: hello-world-container
          image: marcoif81/testmarcoif:latest

          ports:
          - containerPort: 8081

          livenessProbe:
            tcpSocket:
              port: 8081
            initialDelaySeconds: 15
            periodSeconds: 2
            failureThreshold: 2
          readinessProbe:
            tcpSocket:
              port: 8081
            initialDelaySeconds: 15
            periodSeconds: 2
            failureThreshold: 2

The following command was run during the application downtime:

  # kubectl describe pod "pod_name"

In its Events section, no probe-related output appears:

Events:
Type     Reason        Age   From               Message
----     ------        ----  ----               -------
Normal   Scheduled     89s   default-scheduler  Successfully assigned default/helloworld-deployment-7456bb6569-qrfnj to node2
Warning  FailedMount   88s   kubelet            MountVolume.SetUp failed for volume "default-token-hjq76" : failed to sync secret cache: timed out waiting for the condition
Normal   Pulling       87s   kubelet            Pulling image "marcoif81/testmarcoif:latest"
Normal   Pulled        86s   kubelet            Successfully pulled image "marcoif81/testmarcoif:latest" in 1.581369887s
Normal   Created       86s   kubelet            Created container hello-world-container
Normal   Started       85s   kubelet            Started container hello-world-container
Warning  NodeNotReady  1s    node-controller    Node is not ready

1 answer:

Answer 0 (score: 1):

After reading the k8s documentation suggested in Matt's comment above, it is clear that readiness and liveness probes are not the mechanism that monitors worker node health: probes are run by the kubelet on the node itself, so when the whole node goes down there is nothing left to report the Pods as unready. As the documentation states:

Monitoring nodes' health: the node controller is responsible for updating the NodeReady condition of NodeStatus to ConditionUnknown when a node becomes unreachable (i.e. the node controller stops receiving heartbeats for some reason, for example because the node is down), and then later evicting all the pods from the node (using graceful termination) if the node continues to be unreachable. (The default timeouts are 40s to start reporting ConditionUnknown and 5m after that to start evicting pods.) The node controller checks the state of each node every --node-monitor-period seconds.

For testing purposes, I changed the following timers:

  • On the master node: --node-monitor-grace-period from the default 40s down to 6s

  • On the worker nodes: --node-status-update-frequency set to 3s
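On a kubeadm-built cluster these two settings live in different places. The sketch below shows where they might go; the file paths and surrounding fields are assumed kubeadm defaults, not taken from the question — only the flag names and values above are the answer's.

```yaml
# Master: /etc/kubernetes/manifests/kube-controller-manager.yaml
# (assumed kubeadm default path; the static pod restarts automatically
# once this file is saved)
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --node-monitor-grace-period=6s   # default: 40s
    # ...all other existing flags stay unchanged...
---
# Workers: /var/lib/kubelet/config.yaml (KubeletConfiguration),
# followed by: systemctl restart kubelet
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
nodeStatusUpdateFrequency: 3s          # default: 10s
```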

For more information about these timers, see:

https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/
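Besides the node-controller timers, the eviction delay itself (the "5m to start evicting pods" part) can also be shortened per Pod: with taint-based evictions, every Pod by default tolerates the node.kubernetes.io/unreachable and node.kubernetes.io/not-ready taints for 300 seconds before being evicted. A minimal sketch, assuming it is merged into the Deployment's Pod template from the question; the 10-second value is illustrative:

```yaml
# Sketch: limits how long Pods from this template stay bound to an
# unreachable/not-ready node before eviction (admission normally
# injects these tolerations with tolerationSeconds: 300).
spec:
  template:
    spec:
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 10
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 10
```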