I am trying to deploy a Kubernetes cluster. My master node is up and running, but some pods are stuck in the Pending state. Below is the output of kubectl get pods.
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-kube-controllers-65b4876956-29tj9 0/1 Pending 0 9h <none> <none> <none> <none>
kube-system calico-node-bf25l 2/2 Running 2 9h <none> master-0-eccdtest <none> <none>
kube-system coredns-7d6cf57b54-b55zw 0/1 Pending 0 9h <none> <none> <none> <none>
kube-system coredns-7d6cf57b54-bk6j5 0/1 Pending 0 12m <none> <none> <none> <none>
kube-system kube-apiserver-master-0-eccdtest 1/1 Running 1 9h <none> master-0-eccdtest <none> <none>
kube-system kube-controller-manager-master-0-eccdtest 1/1 Running 1 9h <none> master-0-eccdtest <none> <none>
kube-system kube-proxy-jhfjj 1/1 Running 1 9h <none> master-0-eccdtest <none> <none>
kube-system kube-scheduler-master-0-eccdtest 1/1 Running 1 9h <none> master-0-eccdtest <none> <none>
kube-system openstack-cloud-controller-manager-tlp4m 1/1 CrashLoopBackOff 114 9h <none> master-0-eccdtest <none> <none>
When I try to view the pod logs, I get the following error:
Error from server: no preferred addresses found; known addresses: []
kubectl get events shows a lot of warnings:
NAMESPACE LAST SEEN TYPE REASON KIND MESSAGE
default 23m Normal Starting Node Starting kubelet.
default 23m Normal NodeHasSufficientMemory Node Node master-0-eccdtest status is now: NodeHasSufficientMemory
default 23m Normal NodeHasNoDiskPressure Node Node master-0-eccdtest status is now: NodeHasNoDiskPressure
default 23m Normal NodeHasSufficientPID Node Node master-0-eccdtest status is now: NodeHasSufficientPID
default 23m Normal NodeAllocatableEnforced Node Updated Node Allocatable limit across pods
default 23m Normal Starting Node Starting kube-proxy.
default 23m Normal RegisteredNode Node Node master-0-eccdtest event: Registered Node master-0-eccdtest in Controller
kube-system 26m Warning FailedScheduling Pod 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
kube-system 3m15s Warning FailedScheduling Pod 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
kube-system 25m Warning DNSConfigForming Pod Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.96.0.10 10.51.40.100 10.51.40.103
kube-system 23m Normal SandboxChanged Pod Pod sandbox changed, it will be killed and re-created.
kube-system 23m Normal Pulled Pod Container image "registry.eccd.local:5000/node:v3.6.1-26684321" already present on machine
kube-system 23m Normal Created Pod Created container
kube-system 23m Normal Started Pod Started container
kube-system 23m Normal Pulled Pod Container image "registry.eccd.local:5000/cni:v3.6.1-26684321" already present on machine
kube-system 23m Normal Created Pod Created container
kube-system 23m Normal Started Pod Started container
kube-system 23m Warning Unhealthy Pod Readiness probe failed: Threshold time for bird readiness check: 30s
calico/node is not ready: felix is not ready: Get http://localhost:9099/readiness: dial tcp [::1]:9099: connect: connection refused
kube-system 23m Warning Unhealthy Pod Liveness probe failed: Get http://localhost:9099/liveness: dial tcp [::1]:9099: connect: connection refused
kube-system 26m Warning FailedScheduling Pod 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
kube-system 3m15s Warning FailedScheduling Pod 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
kube-system 105s Warning FailedScheduling Pod 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
kube-system 26m Warning FailedScheduling Pod 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
kube-system 22m Warning FailedScheduling Pod 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
kube-system 21m Warning FailedScheduling Pod skip schedule deleting pod: kube-system/coredns-7d6cf57b54-w95g4
kube-system 21m Normal SuccessfulCreate ReplicaSet Created pod: coredns-7d6cf57b54-bk6j5
kube-system 26m Warning DNSConfigForming Pod Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.96.0.10 10.51.40.100 10.51.40.103
kube-system 23m Normal SandboxChanged Pod Pod sandbox changed, it will be killed and re-created.
kube-system 23m Normal Pulled Pod Container image "registry.eccd.local:5000/kube-apiserver:v1.13.5-1-80cc0db3" already present on machine
kube-system 23m Normal Created Pod Created container
kube-system 23m Normal Started Pod Started container
kube-system 26m Warning DNSConfigForming Pod Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.96.0.10 10.51.40.100 10.51.40.103
kube-system 23m Normal SandboxChanged Pod Pod sandbox changed, it will be killed and re-created.
kube-system 23m Normal Pulled Pod Container image "registry.eccd.local:5000/kube-controller-manager:v1.13.5-1-80cc0db3" already present on machine
kube-system 23m Normal Created Pod Created container
kube-system 23m Normal Started Pod Started container
kube-system 23m Normal LeaderElection Endpoints master-0-eccdtest_ed8f0ece-a6cd-11e9-9dd7-fa163e182aab became leader
kube-system 26m Warning DNSConfigForming Pod Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.96.0.10 10.51.40.100 10.51.40.103
kube-system 23m Normal SandboxChanged Pod Pod sandbox changed, it will be killed and re-created.
kube-system 23m Normal Pulled Pod Container image "registry.eccd.local:5000/kube-proxy:v1.13.5-1-80cc0db3" already present on machine
kube-system 23m Normal Created Pod Created container
kube-system 23m Normal Started Pod Started container
kube-system 26m Warning DNSConfigForming Pod Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.96.0.10 10.51.40.100 10.51.40.103
kube-system 23m Normal SandboxChanged Pod Pod sandbox changed, it will be killed and re-created.
kube-system 23m Normal Pulled Pod Container image "registry.eccd.local:5000/kube-scheduler:v1.13.5-1-80cc0db3" already present on machine
kube-system 23m Normal Created Pod Created container
kube-system 23m Normal Started Pod Started container
kube-system 23m Normal LeaderElection Endpoints master-0-eccdtest_ee2520c1-a6cd-11e9-96a3-fa163e182aab became leader
kube-system 26m Warning DNSConfigForming Pod Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.96.0.10 10.51.40.100 10.51.40.103
kube-system 36m Warning BackOff Pod Back-off restarting failed container
kube-system 23m Normal SandboxChanged Pod Pod sandbox changed, it will be killed and re-created.
kube-system 20m Normal Pulled Pod Container image "registry.eccd.local:5000/openstack-cloud-controller-manager:v1.14.0-1-11023d82" already present on machine
kube-system 20m Normal Created Pod Created container
kube-system 20m Normal Started Pod Started container
kube-system 3m20s Warning BackOff Pod Back-off restarting failed container
The only nameserver in resolv.conf is
nameserver 10.96.0.10
I have googled these issues extensively, but none of the solutions I found worked. Any suggestions would be greatly appreciated.
TIA
Answer 0 (score: 1)
Your main problem is the 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate warning message, which appears because of the node-role.kubernetes.io/master:NoSchedule and node.kubernetes.io/not-ready:NoSchedule taints.
These taints prevent pods from being scheduled on the current node.
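You can confirm which taints are currently set on the node (the node name below is the one from your output):
kubectl get node master-0-eccdtest -o jsonpath='{.spec.taints}'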
If you want to be able to schedule pods on the control-plane node, e.g. for a single-machine Kubernetes cluster used for development, run:
kubectl taint nodes instance-1 node-role.kubernetes.io/master-
kubectl taint nodes instance-1 node.kubernetes.io/not-ready:NoSchedule-
But from my point of view, it is better to:
-initiate the cluster using kubeadm
-and schedule all new pods on worker nodes.
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
[init] Using Kubernetes version: v1.15.0
...
Your Kubernetes control-plane has initialized successfully!
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ kubectl apply -f https://docs.projectcalico.org/v3.7/manifests/calico.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.extensions/calico-node created
serviceaccount/calico-node created
deployment.extensions/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
-ADD the worker node by running the kubeadm join command on the worker (slave) node, as sketched below.
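The exact kubeadm join command, including its token and discovery-token-ca-cert-hash, is printed at the end of kubeadm init; the line below is only a sketch of its general shape with placeholder values. If the token has expired, a fresh command can be printed on the master with kubeadm token create --print-join-command.
sudo kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>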
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
instance-1 Ready master 21m v1.15.0
instance-2 Ready <none> 34s v1.15.0
$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-kube-controllers-658558ddf8-v2rqx 1/1 Running 0 11m 192.168.23.129 instance-1 <none> <none>
kube-system calico-node-c2tkt 1/1 Running 0 11m 10.132.0.36 instance-1 <none> <none>
kube-system calico-node-dhc66 1/1 Running 0 107s 10.132.0.38 instance-2 <none> <none>
kube-system coredns-5c98db65d4-dqjm7 1/1 Running 0 22m 192.168.23.130 instance-1 <none> <none>
kube-system coredns-5c98db65d4-hh7vd 1/1 Running 0 22m 192.168.23.131 instance-1 <none> <none>
kube-system etcd-instance-1 1/1 Running 0 21m 10.132.0.36 instance-1 <none> <none>
kube-system kube-apiserver-instance-1 1/1 Running 0 21m 10.132.0.36 instance-1 <none> <none>
kube-system kube-controller-manager-instance-1 1/1 Running 0 21m 10.132.0.36 instance-1 <none> <none>
kube-system kube-proxy-qwvkq 1/1 Running 0 107s 10.132.0.38 instance-2 <none> <none>
kube-system kube-proxy-s9gng 1/1 Running 0 22m 10.132.0.36 instance-1 <none> <none>
kube-system kube-scheduler-instance-1 1/1 Running 0 21m 10.132.0.36 instance-1 <none> <none>
Answer 1 (score: 0)
I have solved this issue. I did not have access to the cloud controller's FQDN from the master node. I added DNS entries to /etc/resolv.conf on the master server, and it works now.
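A sketch of that kind of /etc/resolv.conf change; the addresses here are only the upstream servers visible in the DNSConfigForming events above and stand in for whichever DNS servers can actually resolve the cloud controller FQDN in your environment:
# /etc/resolv.conf on the master node (example values only)
nameserver 10.51.40.100
nameserver 10.51.40.103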