仅使用kubeadm v1.15.0安装了一个主群集。但是,coredns似乎停留在挂起模式:
coredns-5c98db65d4-4pm65 0/1 Pending 0 2m17s <none> <none> <none> <none>
coredns-5c98db65d4-55hcc 0/1 Pending 0 2m2s <none> <none> <none> <none>
以下是该广告连播显示的内容:
kubectl describe pods coredns-5c98db65d4-4pm65 --namespace=kube-system
Name: coredns-5c98db65d4-4pm65
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: <none>
Labels: k8s-app=kube-dns
pod-template-hash=5c98db65d4
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/coredns-5c98db65d4
Containers:
coredns:
Image: k8s.gcr.io/coredns:1.3.1
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-5t2wn (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-5t2wn:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-5t2wn
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 61s (x4 over 5m21s) default-scheduler 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
我删除了主节点上的污点,但无济于事。我不应该能够创建一个没有这种问题的单节点主机。我知道在不移除污点的情况下无法在主节点上安排Pod,但这很奇怪。
我也尝试添加最新的calico cni,但无济于事。
我得到以下正在运行的journalctl(systemctl没有显示错误):
sudo journalctl -xn --unit kubelet.service
[sudo] password for gms:
-- Logs begin at Fri 2019-07-12 04:31:34 CDT, end at Tue 2019-07-16 16:58:17 CDT. --
Jul 16 16:57:54 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:57:54.122355 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:57:54 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:57:54.400606 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:57:59 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:57:59.124863 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:57:59 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:57:59.400924 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:58:04 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:58:04.127120 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:58:04 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:58:04.401266 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:58:09 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:58:09.129287 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:58:09 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:58:09.401520 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:58:14 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:58:14.133059 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:58:14 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:58:14.402008 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
确实,当我查看/etc/cni/net.d
时没有任何内容->是的,我运行了kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml
...这是应用此命令时的输出:
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
我在calico-node的Pod上运行了以下内容,该对象处于以下状态:
calico-node-tcfhw 0/1 Init:0/3 0 11m 10.32.3.158
describe pods calico-node-tcfhw --namespace=kube-system
Name: calico-node-tcfhw
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: thalia0.ahc.umn.edu/10.32.3.158
Start Time: Tue, 16 Jul 2019 18:08:25 -0500
Labels: controller-revision-hash=844ddd97c6
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Pending
IP: 10.32.3.158
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://1e1bf9e65cb182656f6f06a1bb8291237562f0f5a375e557a454942e81d32063
Image: calico/cni:v3.8.0
Image ID: docker-pullable://docker.io/calico/cni@sha256:decba0501ab0658e6e7da2f5625f1eabb8aba5690f9206caba3bf98caca5094c
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Running
Started: Tue, 16 Jul 2019 18:08:26 -0500
Ready: False
Restart Count: 0
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
install-cni:
Container ID:
Image: calico/cni:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
flexvol-driver:
Container ID:
Image: calico/pod2daemon-flexvol:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
Containers:
calico-node:
Container ID:
Image: calico/node:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 250m
Liveness: http-get http://localhost:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -bird-ready -felix-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
flexvol-driver-host:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
HostPathType: DirectoryOrCreate
calico-node-token-b9c6p:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-b9c6p
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: :NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m15s default-scheduler Successfully assigned kube-system/calico-node-tcfhw to thalia0.ahc.umn.edu
Normal Pulled 9m14s kubelet, thalia0.ahc.umn.edu Container image "calico/cni:v3.8.0" already present on machine
Normal Created 9m14s kubelet, thalia0.ahc.umn.edu Created container upgrade-ipam
Normal Started 9m14s kubelet, thalia0.ahc.umn.edu Started container upgrade-ipam
我尝试过Flannel作为CNI,但情况更糟。 kube-proxy甚至因污点而无法启动!
编辑附录
kube-controller-manager
和kube-scheduler
是否没有定义的端点?
[gms@thalia0 ~]$ kubectl get ep --namespace=kube-system -o wide
NAME ENDPOINTS AGE
kube-controller-manager <none> 19h
kube-dns <none> 19h
kube-scheduler <none> 19h
[gms@thalia0 ~]$ kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-nmn4g 0/1 Pending 0 19h
coredns-5c98db65d4-qv8fm 0/1 Pending 0 19h
etcd-thalia0.x.x.edu. 1/1 Running 0 19h
kube-apiserver-thalia0.x.x.edu 1/1 Running 0 19h
kube-controller-manager-thalia0.x.x.edu 1/1 Running 0 19h
kube-proxy-4hrdc 1/1 Running 0 19h
kube-proxy-vb594 1/1 Running 0 19h
kube-proxy-zwrst 1/1 Running 0 19h
kube-scheduler-thalia0.x.x.edu 1/1 Running 0 19h
最后,出于理智的考虑,我尝试了v1.13.1,瞧!成功:
NAME READY STATUS RESTARTS AGE
calico-node-pbrps 2/2 Running 0 15s
coredns-86c58d9df4-g5944 1/1 Running 0 2m40s
coredns-86c58d9df4-zntjl 1/1 Running 0 2m40s
etcd-thalia0.ahc.umn.edu 1/1 Running 0 110s
kube-apiserver-thalia0.ahc.umn.edu 1/1 Running 0 105s
kube-controller-manager-thalia0.ahc.umn.edu 1/1 Running 0 103s
kube-proxy-qxh2h 1/1 Running 0 2m39s
kube-scheduler-thalia0.ahc.umn.edu 1/1 Running 0 117s
编辑2
尝试sudo kubeadm upgrade plan
并发现api服务器的运行状况和证书错误。
在api服务器上运行此
: kubectl logs kube-apiserver-thalia0.x.x.edu --namespace=kube-system1
并出现了大量的TLS handshake error from 10.x.x.157:52384: remote error: tls: bad certificate
类错误,这些错误来自于早已从集群中删除的节点,以及在主节点上多次kubeadm resets
之后很久才删除和重新安装的节点。 kubelet,kubeadm等。
为什么这些旧节点出现了?不会在kubeadm init
上重新创建证书吗?
答案 0 :(得分:1)
此问题https://github.com/projectcalico/calico/issues/2699具有类似的症状,表示删除/var/lib/cni/
可以解决此问题。您可以查看它是否存在,并删除它。
答案 1 :(得分:0)
在Calico启动之前,Coreos-dns不会启动,请使用此命令检查您的工作节点是否准备就绪
kubectl get nodes -owide
kubectl describe node <your-node>
或
kubectl get node <your-node> -oyaml
要检查的其他事情是日志中的以下消息:
“无法更新cni配置:在/etc/cni/net.d中找不到网络”
您在该目录中拥有什么?
也许cni配置不正确。
该目录/etc/cni/net.d
应该包含2个文件:
10-calico.conflist calico-kubeconfig
下面是这两个文件的内容,请检查目录中是否有这样的文件
[root@master net.d]# cat 10-calico.conflist
{
"name": "k8s-pod-network",
"cniVersion": "0.3.0",
"plugins": [
{
"type": "calico",
"log_level": "info",
"datastore_type": "kubernetes",
"nodename": "master",
"mtu": 1440,
"ipam": {
"type": "host-local",
"subnet": "usePodCidr"
},
"policy": {
"type": "k8s"
},
"kubernetes": {
"kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
}
},
{
"type": "portmap",
"snat": true,
"capabilities": {"portMappings": true}
}
]
}
[root @ master net.d]#cat calico-kubeconfig
# Kubeconfig file for Calico CNI plugin.
apiVersion: v1
kind: Config
clusters:
- name: local
cluster:
server: https://[10.20.0.1]:443
certificate-authority-data: LSRt.... tLQJ=
users:
- name: calico
user:
token: "eUJh .... ZBoIA"
contexts:
- name: calico-context
context:
cluster: local
user: calico
current-context: calico-context