I deployed a local Kubernetes cluster (one master and two worker nodes) on VirtualBox using kubeadm:
NAME     STATUS   ROLES    AGE     VERSION
master   Ready    master   2d21h   v1.19.2
node1    Ready    <none>   2d21h   v1.19.2
node2    Ready    <none>   2d21h   v1.19.2
The containerized application is a trivial Node.js app that displays "HelloWorld" at the URL http://ExternalIP:8080.
NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)          AGE
helloworld-service   LoadBalancer   10.111.237.33   192.168.1.163   8080:30317/TCP   124m
kubernetes           ClusterIP      10.96.0.1       <none>          443/TCP          2d21h
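The Deployment is configured with replicas = 2. One Pod runs on node1 and the other on node2. Suppose I run the following command:
curl http://ExternalIP:8080
in a for loop from an external machine outside the cluster. I get "HelloWorld" for every request, as expected. Here is the description of the service: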
Name: helloworld-service
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=hello-world-app
Type: LoadBalancer
IP: 10.111.237.33
External IPs: 192.168.1.163
Port: <unset> 8080/TCP
TargetPort: 8081/TCP
NodePort: <unset> 30317/TCP
Endpoints: 10.36.0.1:8081,10.44.0.1:8081
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
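Now, with the for loop running on the external machine, I want to measure the application downtime when one of the worker nodes is turned off. The result is:
- about 30 seconds with no response, or with 503 Service Unavailable from the application
- the application starts working again only after the master declares the turned-off node NotReady
At that point the cluster output is: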
NAME     STATUS     ROLES    AGE     VERSION
master   Ready      master   2d21h   v1.19.2
node1    Ready      <none>   2d21h   v1.19.2
node2    NotReady   <none>   2d21h   v1.19.2
and the application replies to every request again.
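The downtime in a test like this can be measured rather than estimated by eye. The following is a minimal sketch, not the loop used in the question: it polls the service URL and reports the longest stretch of failed requests. The function names `probe` and `longest_outage`, the 0.5 s polling interval, and the 120 s test duration are assumptions made for illustration; the URL uses the external IP shown in the service output above.

```python
import http.client
import time
import urllib.request
from typing import List, Tuple


def longest_outage(samples: List[Tuple[float, bool]]) -> float:
    """Return the longest run of failures in seconds.

    Each sample is (timestamp, ok). An outage starts at the first failed
    sample of a run and ends at the next successful sample (or at the last
    sample if the run never recovers).
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    if outage_start is not None and samples:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest


def probe(url: str, timeout_s: float = 1.0) -> bool:
    """Return True only if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers timeouts, refused connections, and HTTP errors such as 503.
        return False


if __name__ == "__main__":
    url = "http://192.168.1.163:8080"  # external IP from the service output
    samples = []
    deadline = time.monotonic() + 120  # assumed test duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), probe(url)))
        time.sleep(0.5)  # assumed polling interval
    print(f"longest outage: {longest_outage(samples):.1f}s")
```

Start the script, power off one worker node, and read the reported value once the run finishes. The resolution is limited by the polling interval plus the request timeout.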
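From the kubernetes.io documentation:
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.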
I changed the YAML configuration of the Deployment object to add probes, but I still get 30 seconds of application downtime:
apiVersion: v1
kind: Service
metadata:
  name: helloworld-service
spec:
  selector:
    app: hello-world-app
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8081
  type: LoadBalancer
  clusterIP: 10.111.237.33
  externalIPs:
    - 192.168.1.163
status:
  loadBalancer:
    ingress:
      - ip: 192.168.1.224
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld-deployment
spec:
  selector:
    matchLabels:
      app: hello-world-app
  replicas: 2
  template:
    metadata:
      labels:
        app: hello-world-app
    spec:
      containers:
        - name: hello-world-container
          image: marcoif81/testmarcoif:latest
          ports:
            - containerPort: 8081
          livenessProbe:
            tcpSocket:
              port: 8081
            initialDelaySeconds: 15
            periodSeconds: 2
            failureThreshold: 2
          readinessProbe:
            tcpSocket:
              port: 8081
            initialDelaySeconds: 15
            periodSeconds: 2
            failureThreshold: 2
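As a rough sanity check on the probe settings above, the sketch below estimates how long the kubelet would take to mark a container as failed once it stops answering. It is a simplified model, not Kubernetes code: the function name is hypothetical, and the 1 second timeout is the Kubernetes default for `timeoutSeconds`, which the manifest does not set. The estimate only applies while the kubelet on that node is still running, because the kubelet is what executes the probes.

```python
def probe_failure_detection_s(period_seconds: float,
                              failure_threshold: int,
                              timeout_seconds: float = 1.0) -> float:
    """Approximate upper bound, in seconds, for a probe to report failure.

    Worst case: the container breaks right after a successful probe, so the
    kubelet needs `failure_threshold` further probes, one every
    `period_seconds`, and the last one may take `timeout_seconds` to fail.
    """
    return failure_threshold * period_seconds + timeout_seconds


# Values from the manifest: periodSeconds: 2, failureThreshold: 2
print(probe_failure_detection_s(period_seconds=2, failure_threshold=2))
```

With the values from the manifest this gives about 5 seconds, well below the observed 30 seconds of downtime.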
This is the output of the following command, run during the application downtime:
# kubectl describe pod "pod_name"
The Events section shows no output about the probes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 89s default-scheduler Successfully assigned default/helloworld-deployment-7456bb6569-qrfnj to node2
Warning FailedMount 88s kubelet MountVolume.SetUp failed for volume "default-token-hjq76" : failed to sync secret cache: timed out waiting for the condition
Normal Pulling 87s kubelet Pulling image "marcoif81/testmarcoif:latest"
Normal Pulled 86s kubelet Successfully pulled image "marcoif81/testmarcoif:latest" in 1.581369887s
Normal Created 86s kubelet Created container hello-world-container
Normal Started 85s kubelet Started container hello-world-container
Warning NodeNotReady 1s node-controller Node is not ready
Answer 0 (score: 1)
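After reading the Kubernetes documentation suggested in Matt's comment above, it is clear that readiness and liveness probes are not meant for monitoring the health of worker nodes. As the documentation states: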
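Monitoring the nodes' health: the node controller is responsible for updating the NodeReady condition of NodeStatus to ConditionUnknown when a node becomes unreachable (i.e. the node controller stops receiving heartbeats for some reason, for example due to the node being down), and then later evicting all the pods from the node (using graceful termination) if the node continues to be unreachable. (The default timeouts are 40s to start reporting ConditionUnknown and 5m after that to start evicting pods.) The node controller checks the state of each node every --node-monitor-period seconds.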
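For testing purposes, I changed the following timers:
- on the master node: --node-monitor-grace-period (a kube-controller-manager flag), from the default 40s to 6s
- on the worker nodes: --node-status-update-frequency (a kubelet flag), set to 3s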
For more information about the timers, see:
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/
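To see how these timers interact, here is a minimal sketch of the window in which a powered-off node would be marked NotReady. It is a simplified model under stated assumptions, not the node controller's actual logic: the function name is hypothetical, the heartbeat interval is assumed to equal --node-status-update-frequency (in clusters that use node leases the heartbeat cadence may differ), and network latency and endpoint propagation are ignored. The 5 second default for --node-monitor-period comes from the kube-controller-manager reference linked above.

```python
from typing import Tuple


def not_ready_window(grace_period_s: float,
                     heartbeat_interval_s: float,
                     monitor_period_s: float = 5.0) -> Tuple[float, float]:
    """Return (earliest, latest) seconds from node failure to NotReady.

    Earliest: the last heartbeat arrived almost a full interval before the
    failure, so most of that interval already counts against the grace period.
    Latest: the last heartbeat arrived just before the failure, and the
    controller's next check happens a full monitor period after the grace
    period expires.
    """
    earliest = max(0.0, grace_period_s - heartbeat_interval_s)
    latest = grace_period_s + monitor_period_s
    return earliest, latest


# Defaults: 40s grace period, 10s heartbeat interval, 5s monitor period
print(not_ready_window(40.0, 10.0))
# Test values from this answer: 6s grace period, 3s heartbeat interval
print(not_ready_window(6.0, 3.0))
```

Under these assumptions the defaults give a window of roughly 30 to 45 seconds, which is consistent with the downtime observed in the question, and the test values shrink it to roughly 3 to 11 seconds.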