CrashLoopBackOff on kubernetes-dashboard

Asked: 2018-03-23 19:25:12

Tags: docker kubernetes kubernetes-dashboard

I'm a Kubernetes newbie. I'm trying to follow some recipes to get a small cluster up and running, but I've run into trouble...

I have one master and (4) nodes, all running Ubuntu 16.04.

Installed Docker on all nodes:

$ sudo apt-get update
$ sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
   "deb https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") \
   $(lsb_release -cs) \
   stable"
$ sudo apt-get update && sudo apt-get install -y docker-ce=$(apt-cache madison docker-ce | grep 17.03 | head -1 | awk '{print $3}')
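Note that the docker version output below reports 17.12.1-ce rather than a 17.03 build, which suggests the 17.03 pin did not actually take effect. To pin and hold a specific version, something like the following sketch should work (the exact version string here is an assumption; list the real candidates with apt-cache madison docker-ce first):

$ sudo apt-get install -y docker-ce=17.03.2~ce-0~ubuntu-xenial
$ sudo apt-mark hold docker-ce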

$ sudo docker version
Client:
 Version:       17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built:         Tue Feb 27 22:17:40 2018
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:        Tue Feb 27 22:16:13 2018
  OS/Arch:      linux/amd64
  Experimental: false

Turned off swap on all nodes:

$ sudo swapoff -a

Commented out the swap mount in /etc/fstab:

$ sudo vi /etc/fstab
$ sudo mount -a
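For reference, the same fstab edit can be scripted; a minimal sketch that comments out any line containing a swap mount (inspect the result afterward, since fstab layouts vary):

$ sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab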

Installed kubeadm & kubectl on all nodes:

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
$ sudo apt-get update
$ sudo apt-get install -y kubeadm kubectl
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", 
GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", 
BuildDate:"2018-03-12T16:21:35Z", GoVersion:"go1.9.3", Compiler:"gc", 
Platform:"linux/amd64"}

Downloaded crictl from here and extracted it into /usr/local/bin on the master and all nodes: https://github.com/kubernetes-incubator/cri-tools/releases
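Roughly like this, assuming the v1.0.0-beta.0 release (pick whichever crictl build matches your Kubernetes version from that releases page):

$ curl -L https://github.com/kubernetes-incubator/cri-tools/releases/download/v1.0.0-beta.0/crictl-v1.0.0-beta.0-linux-amd64.tar.gz -o /tmp/crictl.tar.gz
$ sudo tar xzvf /tmp/crictl.tar.gz -C /usr/local/bin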

Installed etcd 3.3.0 on all nodes:

$ sudo groupadd --system etcd
$ sudo useradd --home-dir "/var/lib/etcd" \
      --system \
      --shell /bin/false \
      -g etcd \
      etcd

$ sudo mkdir -p /etc/etcd
$ sudo chown etcd:etcd /etc/etcd
$ sudo mkdir -p /var/lib/etcd
$ sudo chown etcd:etcd /var/lib/etcd

$ sudo rm -rf /tmp/etcd && mkdir -p /tmp/etcd
$ sudo curl -L https://github.com/coreos/etcd/releases/download/v3.3.0/etcd-v3.3.0-linux-amd64.tar.gz -o /tmp/etcd-3.3.0-linux-amd64.tar.gz
$ sudo tar xzvf /tmp/etcd-3.3.0-linux-amd64.tar.gz -C /tmp/etcd --strip-components=1
$ sudo cp /tmp/etcd/etcd /usr/bin/etcd
$ sudo cp /tmp/etcd/etcdctl /usr/bin/etcdctl
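A quick sanity check that the binaries landed on the PATH and report the expected version:

$ etcd --version
$ ETCDCTL_API=3 etcdctl version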

Noted the master's IP:

$ sudo ifconfig -a eth0
eth0      Link encap:Ethernet  HWaddr 1e:00:51:00:00:28
          inet addr:172.20.43.30  Bcast:172.20.43.255  Mask:255.255.254.0
          inet6 addr: fe80::27b5:3d06:94c9:9d0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3194023 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3306456 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:338523846 (338.5 MB)  TX bytes:3682444019 (3.6 GB)

Initialized Kubernetes on the master:

$ sudo kubeadm init --pod-network-cidr=172.20.43.0/16 \
                    --apiserver-advertise-address=172.20.43.30 \
                    --ignore-preflight-errors=cri \
                    --kubernetes-version stable-1.9

[init] Using Kubernetes version: v1.9.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
        [WARNING CRI]: unable to check if the container runtime at "/var/run/dockershim.sock" is running: exit status 1
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [jenkins-kube-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.20.43.30]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[apiclient] All control plane components are healthy after 37.502640 seconds
[uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[markmaster] Will mark node jenkins-kube-master as master by adding a label and a taint
[markmaster] Master jenkins-kube-master tainted and labelled with key/value: node-role.kubernetes.io/master=""
[bootstraptoken] Using token: 6be4b1.9a8dacf89f71e53c
[bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: kube-dns
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join --token 6be4b1.9a8dacf89f71e53c 172.20.43.30:6443 --discovery-token-ca-cert-hash sha256:524d29b032d7bfd319b147ab03a936bd429805258425bccca749de71bcb1efaf
Then, on the master node:

$ sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ export KUBECONFIG=$HOME/.kube/config
$ echo "export KUBECONFIG=$HOME/.kube/config" | tee -a ~/.bashrc

Set up flannel for networking:

$ sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
clusterrole "flannel" created
clusterrolebinding "flannel" created
serviceaccount "flannel" created
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created

$ sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
clusterrole "flannel" configured
clusterrolebinding "flannel" configured

Joined the nodes to the cluster by running this on each node:

$ sudo kubeadm join --token 6be4b1.9a8dacf89f71e53c 172.20.43.30:6443 \
--discovery-token-ca-cert-hash sha256:524d29b032d7bfd319b147ab03a936bd429805258425bccca749de71bcb1efaf \
--ignore-preflight-errors=cri
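Back on the master, kubectl get nodes should now list the master plus the (4) joined nodes:

$ kubectl get nodes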

Installed the dashboard on the master:

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
role "kubernetes-dashboard-minimal" created
rolebinding "kubernetes-dashboard-minimal" created
deployment "kubernetes-dashboard" created
service "kubernetes-dashboard" created

Started the proxy:

$ kubectl proxy
Starting to serve on 127.0.0.1:8001

Opened another ssh session with -L 8001:127.0.0.1:8001 and opened a local browser window to http://localhost:8001/ui
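The tunnel command looks something like this (the user and host here are placeholders for your environment):

$ ssh -L 8001:127.0.0.1:8001 ubuntu@jenkins-kube-master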

It redirects to http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ and says:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "no endpoints available for service \"https:kubernetes-    dashboard:\"",
  "reason": "ServiceUnavailable",
  "code": 503
}

Checked the pods...

$ sudo kubectl get pods --all-namespaces
NAMESPACE     NAME                                          READY     STATUS             RESTARTS   AGE
default       guids-74487d79cf-zsj8q                        1/1       Running            0          4h
kube-system   etcd-jenkins-kube-master                      1/1       Running            1          21h
kube-system   kube-apiserver-jenkins-kube-master            1/1       Running            1          21h
kube-system   kube-controller-manager-jenkins-kube-master   1/1       Running            2          21h
kube-system   kube-dns-6f4fd4bdf-7pr9q                      3/3       Running            0          1d
kube-system   kube-flannel-ds-pvk8m                         1/1       Running            0          4h
kube-system   kube-flannel-ds-q4fsl                         1/1       Running            0          4h
kube-system   kube-flannel-ds-qhxn6                         1/1       Running            0          21h
kube-system   kube-flannel-ds-tkspz                         1/1       Running            0          4h
kube-system   kube-flannel-ds-vgqsb                         1/1       Running            0          4h
kube-system   kube-proxy-7np9b                              1/1       Running            0          4h
kube-system   kube-proxy-9lx8h                              1/1       Running            1          1d
kube-system   kube-proxy-f46d8                              1/1       Running            0          4h
kube-system   kube-proxy-fdtx9                              1/1       Running            0          4h
kube-system   kube-proxy-kmnjf                              1/1       Running            0          4h
kube-system   kube-scheduler-jenkins-kube-master            1/1       Running            1          21h
kube-system   kubernetes-dashboard-5bd6f767c7-xf42n         0/1       CrashLoopBackOff   53         4h

Checked the logs...

$ sudo kubectl logs kubernetes-dashboard-5bd6f767c7-xf42n --namespace=kube-system
2018/03/20 17:56:25 Starting overwatch
2018/03/20 17:56:25 Using in-cluster config to connect to apiserver
2018/03/20 17:56:25 Using service account token for csrf signing
2018/03/20 17:56:25 No request provided. Skipping authorization
2018/03/20 17:56:55 Error while initializing connection to Kubernetes apiserver. 
This most likely means that the cluster is misconfigured (e.g., it has invalid
 apiserver certificates or service accounts configuration) or the 
--apiserver-host param points to a server that does not exist. 
Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ

I find the reference to 10.96.0.1 strange. I don't have that anywhere on my network that I know of.

I put the output of sudo kubectl describe pod --namespace=kube-system on pastebin: https://pastebin.com/cPppPkRw

Thanks in advance for any pointers.

-Steve Maring

Orlando, FL

1 Answer:

Answer 0 (score: 1):


--service-cluster-ip-range=10.96.0.0/12

Line 76 of your pastebin shows the Service CIDR, which squares with how kubernetes sees the world: the .1 of the Service CIDR is always kubernetes (IIRC kube-dns gets a pretty low IP assignment too, but I can't recall off-hand whether it is always fixed the way the kubernetes one is).
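You can confirm this on the master: the first address of the Service CIDR is the ClusterIP of the kubernetes service, and the configured range sits in the apiserver's static Pod manifest:

$ kubectl get svc kubernetes
$ grep service-cluster-ip-range /etc/kubernetes/manifests/kube-apiserver.yaml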

You'll want to either change your Service and Pod CIDRs to fit within the 10.244.0.0/16 subnet that flannel created as a side effect of deploying that yaml, or change flannel's ConfigMap (at your peril, now that the network has already been pushed into etcd) to align with the Service and Pod CIDRs given to the apiserver.
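If rebuilding the cluster is an option, the simpler route is usually a fresh init with a Pod CIDR that matches stock flannel; a rough sketch under that assumption (run the reset on every machine first, and adapt the flags to your setup):

$ sudo kubeadm reset
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16 \
                    --apiserver-advertise-address=172.20.43.30 \
                    --kubernetes-version stable-1.9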