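I have a Kubernetes cluster with 1 master node and 3 worker nodes.
Calico v3.7.3 and Kubernetes v1.16.0, installed with kubespray (https://github.com/kubernetes-sigs/kubespray).
Until now I have been able to deploy all my pods without problems.
Now I can't get several pods (Ceph) to start: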
kubectl get all --namespace=ceph
NAME READY STATUS RESTARTS AGE
pod/ceph-cephfs-test 0/1 Pending 0 162m
pod/ceph-mds-665d849f4f-fzzwb 0/1 Pending 0 162m
pod/ceph-mon-744f6dc9d6-jtbgk 0/1 CrashLoopBackOff 24 162m
pod/ceph-mon-744f6dc9d6-mqwgb 0/1 CrashLoopBackOff 24 162m
pod/ceph-mon-744f6dc9d6-zthpv 0/1 CrashLoopBackOff 24 162m
pod/ceph-mon-check-6f474c97f-gjr9f 1/1 Running 0 162m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/ceph-mon ClusterIP None <none> 6789/TCP 162m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/ceph-osd 0 0 0 0 0 node-type=storage 162m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/ceph-mds 0/1 1 0 162m
deployment.apps/ceph-mon 0/3 3 0 162m
deployment.apps/ceph-mon-check 1/1 1 1 162m
NAME DESIRED CURRENT READY AGE
replicaset.apps/ceph-mds-665d849f4f 1 1 0 162m
replicaset.apps/ceph-mon-744f6dc9d6 3 3 0 162m
replicaset.apps/ceph-mon-check-6f474c97f 1 1 1 162m
But the other pods are fine:
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-6d57b44787-xlj89 1/1 Running 19 24d
calico-node-dwm47 1/1 Running 310 19d
calico-node-hhgzk 1/1 Running 15 24d
calico-node-tk4mp 1/1 Running 309 19d
calico-node-w7zvs 1/1 Running 312 19d
coredns-74c9d4d795-jrxjn 1/1 Running 0 2d23h
coredns-74c9d4d795-psf2v 1/1 Running 2 18d
dns-autoscaler-7d95989447-7kqsn 1/1 Running 10 24d
kube-apiserver-master 1/1 Running 4 24d
kube-controller-manager-master 1/1 Running 3 24d
kube-proxy-9bt8m 1/1 Running 2 19d
kube-proxy-cbrcl 1/1 Running 4 19d
kube-proxy-stj5g 1/1 Running 0 19d
kube-proxy-zql86 1/1 Running 0 19d
kube-scheduler-master 1/1 Running 3 24d
kubernetes-dashboard-7c547b4c64-6skc7 1/1 Running 591 24d
nginx-proxy-worker1 1/1 Running 2 19d
nginx-proxy-worker2 1/1 Running 0 19d
nginx-proxy-worker3 1/1 Running 0 19d
nodelocaldns-6t92x 1/1 Running 2 19d
nodelocaldns-kgm4t 1/1 Running 0 19d
nodelocaldns-xl8zg 1/1 Running 0 19d
nodelocaldns-xwlwk 1/1 Running 12 24d
tiller-deploy-8557598fbc-7f2w6 1/1 Running 0 131m
I'm using CentOS 7:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
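Error log:
Get https://10.2.67.203:10250/containerLogs/ceph/ceph-mon-744f6dc9d6-mqwgb/ceph-mon?tailLines=5000&timestamps=true: dial tcp 10.2.67.203:10250: connect: no route to host
Has anyone run into this problem and can help me? I can provide any additional information.
Events of the pending pods:
Warning FailedScheduling 98s (x125 over 3h1m) default-scheduler 0/4 nodes are available: 4 node(s) didn't match node selector.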
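The scheduler event says none of the 4 nodes satisfied the pod's node selector, and the ceph-osd DaemonSet in the first listing uses the selector node-type=storage with DESIRED 0. The simplified Python sketch below illustrates how a nodeSelector is matched: a node qualifies only if its labels contain every key=value pair of the selector. It deliberately ignores taints, affinity and resource requests, and the node labels in it are hypothetical; the question does not show the real node labels or the selectors of the pending pods.

```python
# Simplified model of nodeSelector matching (assumption: ignores taints,
# affinity and resources, which the real scheduler also evaluates).
from typing import Dict, List


def matching_nodes(node_labels: Dict[str, Dict[str, str]],
                   node_selector: Dict[str, str]) -> List[str]:
    """Return names of nodes whose labels contain every key=value in node_selector."""
    return [
        name
        for name, labels in node_labels.items()
        if all(labels.get(key) == value for key, value in node_selector.items())
    ]


if __name__ == "__main__":
    # Hypothetical labels: none of the four nodes carries node-type=storage.
    nodes = {
        "master": {"kubernetes.io/hostname": "master"},
        "worker1": {"kubernetes.io/hostname": "worker1"},
        "worker2": {"kubernetes.io/hostname": "worker2"},
        "worker3": {"kubernetes.io/hostname": "worker3"},
    }
    selector = {"node-type": "storage"}
    matched = matching_nodes(nodes, selector)
    print(f"{len(matched)}/{len(nodes)} nodes match {selector}")
```

If a missing label is indeed the cause, labelling the intended storage nodes (kubectl label node <node-name> node-type=storage) would make them eligible for scheduling.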
Answer 0 (score: 0)
tl;dr: It looks like your cluster itself is broken, and it should be fixed before you look at Ceph specifically.
Get https://10.2.67.203:10250/containerLogs/ceph/ceph-mon-744f6dc9d6-mqwgb/ceph-mon?tailLines=5000&timestamps=true: dial tcp 10.2.67.203:10250: connect: no route to host
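10250 is the port the Kubernetes API server uses to connect to a node's Kubelet to retrieve logs.
This error means the Kubernetes API server cannot reach the node. It has nothing to do with your containers, your pods, or even your CNI network.
"no route to host" means there is no working network path from the API server to the Kubelet on that node.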
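To confirm this from the master node, a small check could pull the address out of the error and attempt a plain TCP connect to it. This is a minimal Python sketch, not a kubectl feature: kubelet_target and probe are hypothetical helper names, and the regex assumes the "dial tcp HOST:PORT" wording shown above. On Linux, a firewall rule that rejects packets with an ICMP host-prohibited reply can also surface as "no route to host", so the probe cannot distinguish routing from firewall problems.

```python
# Sketch: extract the kubelet address from an API-server error and probe it.
# Assumption: the error contains "dial tcp HOST:PORT" as in the question.
import errno
import re
import socket
from typing import Optional, Tuple

DIAL_RE = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def kubelet_target(error: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) from a 'dial tcp HOST:PORT' error, or None."""
    match = DIAL_RE.search(error)
    return (match.group(1), int(match.group(2))) if match else None


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:  # must precede OSError, of which it is a subclass
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host"
        return f"error: {exc}"


if __name__ == "__main__":
    # Error text copied from the question.
    err = ("Get https://10.2.67.203:10250/containerLogs/ceph/ceph-mon-744f6dc9d6-mqwgb/"
           "ceph-mon?tailLines=5000&timestamps=true: dial tcp 10.2.67.203:10250: "
           "connect: no route to host")
    print(kubelet_target(err))  # ('10.2.67.203', 10250)
```

Calling probe(*target) on the master, where the API server runs, should return "open" once the node is reachable; "no route to host" or "timeout" points at the network or a firewall rather than at Ceph.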
Before addressing the problems with the Ceph pods, I would investigate why the Kubelet is not reachable from the API server.
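After solving the underlying network connectivity problem, I would address the crash-looping Calico pods (you can see the logs of the previously executed container by running kubectl logs -n kube-system <calico-node-pod> --previous).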
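Once both the underlying network and the pod network are sorted out, I would address the crash-looping Kubernetes Dashboard and, finally, start investigating why deploying Ceph fails.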
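This order of work is driven by the restart counts in the kube-system listing: three calico-node pods have around 310 restarts and kubernetes-dashboard has 591, even though they currently show Running. A rough Python sketch that flags such pods from the plain-text output of kubectl get pods could look like this; the threshold of 50 is an arbitrary assumption, and the sample rows are copied from the question.

```python
# Sketch: flag pods whose RESTARTS count suggests a crash loop.
# Assumption: input uses the default columns NAME READY STATUS RESTARTS AGE.
from typing import List, Tuple


def high_restart_pods(table: str, threshold: int = 50) -> List[Tuple[str, int]]:
    """Return (name, restarts) for pods at or above threshold, most restarts first."""
    rows = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) < 5:
            continue  # ignore malformed or empty lines
        name, restarts = cols[0], cols[3]
        if restarts.isdigit() and int(restarts) >= threshold:
            rows.append((name, int(restarts)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


SAMPLE = """\
NAME READY STATUS RESTARTS AGE
calico-node-dwm47 1/1 Running 310 19d
calico-node-hhgzk 1/1 Running 15 24d
kubernetes-dashboard-7c547b4c64-6skc7 1/1 Running 591 24d
"""

if __name__ == "__main__":
    for name, count in high_restart_pods(SAMPLE):
        print(f"{name}: {count} restarts")
```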
Answer 1 (score: 0)
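It looks like a firewall is blocking ingress traffic to port 10250 on node 10.2.67.203.
You can open the port by running the following commands (assuming firewalld is installed; otherwise, use the equivalent commands for your firewall):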
A) sudo firewall-cmd --add-port=10250/tcp --permanent
B) sudo firewall-cmd --reload
C) sudo firewall-cmd --list-all (port 10250 should now appear in the output)