K8S: node does not join the cluster

Date: 2015-09-17 10:05:30

Tags: kubernetes coreos

I have a problem with my Kubernetes node: it does not register with the Kubernetes master.

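I have seen many questions that match mine, but most of them are about bugs that have since been fixed. The Kubernetes prerequisites and the various components appear to be working. I surely have a bad configuration somewhere, but trying the things that worked for others has not worked for me.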

I am following the CoreOS team's Step by Step tutorial.

My configuration:

  • VirtualBox: 5.0.2
  • Vagrant: 1.7.3
  • CoreOS: 801
  • Hyperkube: 1.0.6

My procedure:

- I boot a Kubernetes master
  - start etcd
  - start flanneld
  - start docker after flanneld
  - start the kubelet
    - it starts the apiserver (as a container)
    - it starts the controller-manager (as a container)
    - it starts the scheduler (as a container)
    - it starts the proxy (as a container)

- I start a Kubernetes node
  - start etcd
  - start flanneld
  - start docker after flanneld
  - start the kubelet
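The "start docker after flanneld" step is usually expressed as a systemd drop-in for docker.service. Below is a minimal sketch that writes one; the drop-in directory and the file name 40-flannel.conf follow the convention of the CoreOS guide as I remember it and should be treated as assumptions. The function name is hypothetical, and the directory is a parameter so it can be tried against a scratch folder first.

```shell
#!/bin/sh
# Sketch: make docker.service start only after flanneld.service.
# Assumption: default drop-in path and file name (40-flannel.conf) mirror
# the CoreOS guide; pass another directory to try it somewhere harmless.
write_docker_flannel_dropin() {
  dropin_dir=${1:-/etc/systemd/system/docker.service.d}
  mkdir -p "$dropin_dir" || return 1
  # Requires= pulls flanneld in; After= orders docker behind it.
  cat > "$dropin_dir/40-flannel.conf" <<'EOF'
[Unit]
Requires=flanneld.service
After=flanneld.service
EOF
}
```

Run it as root with no argument, then `systemctl daemon-reload` and restart docker so the new ordering takes effect.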

ETCD2:

  • I can share values between the master and the node.
  • The node runs as an etcd proxy.

FLANNELD:

  • I created a container on each side.
  • I can ping one from the other.

MASTER KUBELET:

  • It starts the components defined in the config folder.

KUBERNETES SERVICES RUN:

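  • If I set register-node to true on the master kubelet, the master is registered as a node.
  • For my tests, the master kubelet is not set to register.
  • I can start pods (this works when the master kubelet is switched to register).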

NODE KUBELET:

Here are the logs:

$ journalctl -fu kubelet --since=2012-01-01
-- Logs begin at Thu 2015-09-17 09:38:17 UTC. --
Sep 17 09:39:37 node1 systemd[1]: Starting Kubernetes Kubelet for Node...
Sep 17 09:39:37 node1 systemd[1]: Started Kubernetes Kubelet for Node.
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.080731    1634 manager.go:127] cAdvisor running in container: "/system.slice/kubelet.service"
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.081391    1634 fs.go:93] Filesystem partitions: map[/dev/sda9:{mountpoint:/ major:8 minor:9} /dev/sda3:{mountpoint:/usr major:8 minor:3} /dev/sda6:{mountpoint:/usr/share/oem major:8 minor:6}]
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.083078    1634 manager.go:156] Machine: {NumCores:1 CpuFrequency:3403222 MemoryCapacity:4048441344 MachineID:1c0a9b68c0044cfdb5024dc80a5cdec2 SystemUUID:35A45175-4822-4FFA-9CBF-ECC10430ED28 BootID:18baf9ac-73a9-42f3-9bc5-2dca985d03e9 Filesystems:[{Device:/dev/sda6 Capacity:113229824} {Device:/dev/sda9 Capacity:16718393344} {Device:/dev/sda3 Capacity:1031946240}] DiskMap:map[8:0:{Name:sda Major:8 Minor:0 Size:19818086400 Scheduler:cfq}] NetworkDevices:[{Name:eth0 MacAddress:08:00:27:8c:0a:cd Speed:0 Mtu:1500} {Name:eth1 MacAddress:08:00:27:bc:e6:70 Speed:0 Mtu:1500} {Name:eth2 MacAddress:08:00:27:b9:33:63 Speed:0 Mtu:1500} {Name:flannel0 MacAddress: Speed:10 Mtu:1472}] Topology:[{Id:0 Memory:4048441344 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:6291456 Type:Unified Level:3}]}]}
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.087467    1634 manager.go:163] Version: {KernelVersion:4.1.6-coreos-r2 ContainerOsVersion:CoreOS 801.0.0 DockerVersion:1.8.1 CadvisorVersion:0.15.1}
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.087674    1634 plugins.go:69] No cloud provider specified.
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.087698    1634 docker.go:295] Connecting to docker on unix:///var/run/docker.sock
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.088720    1634 server.go:663] Adding manifest file: /etc/kubernetes/manifests
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.088734    1634 server.go:673] Watching apiserver
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.110463    1634 reflector.go:136] Failed to list *api.Node: Get http://192.168.1.88:8080/api/v1/nodes?fieldSelector=metadata.name%3D192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.111317    1634 reflector.go:136] Failed to list *api.Service: Get http://192.168.1.88:8080/api/v1/services: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.111641    1634 reflector.go:136] Failed to list *api.Pod: Get http://192.168.1.88:8080/api/v1/pods?fieldSelector=spec.nodeName%3D192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.219264    1634 plugins.go:56] Registering credential provider: .dockercfg
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.221429    1634 server.go:635] Started kubelet
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.221752    1634 kubelet.go:682] Image garbage collection failed: unable to find data for container /
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.230631    1634 kubelet.go:702] Running in container "/kubelet"
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.235396    1634 server.go:63] Starting to listen on 0.0.0.0:10250
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.257384    1634 event.go:194] Unable to write event: 'Post http://192.168.1.88:8080/api/v1/namespaces/default/events: dial tcp 192.168.1.88:8080: connection refused' (may retry after sleeping)
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.368996    1634 factory.go:226] System is using systemd
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.369627    1634 factory.go:234] Registering Docker factory
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.370640    1634 factory.go:89] Registering Raw factory
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.490377    1634 manager.go:946] Started watching for new ooms in manager
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.490733    1634 oomparser.go:183] oomparser using systemd
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.491323    1634 manager.go:243] Starting recovery of all containers
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.647835    1634 manager.go:248] Recovery completed
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.702130    1634 status_manager.go:76] Starting to sync pod status with apiserver
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.702375    1634 kubelet.go:1725] Starting kubelet main sync loop.
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.712658    1634 kubelet.go:1641] error getting node: node 192.168.1.31 not found
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.736035    1634 provider.go:91] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
Sep 17 09:39:37 node1 kubelet[1634]: W0917 09:39:37.743037    1634 status_manager.go:80] Failed to updated pod status: error updating status for pod "kube-proxy-192.168.1.31_default": Get http://192.168.1.88:8080/api/v1/namespaces/default/pods/kube-proxy-192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:38 node1 kubelet[1634]: E0917 09:39:38.113116    1634 reflector.go:136] Failed to list *api.Pod: Get http://192.168.1.88:8080/api/v1/pods?fieldSelector=spec.nodeName%3D192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:38 node1 kubelet[1634]: E0917 09:39:38.113170    1634 reflector.go:136] Failed to list *api.Service: Get http://192.168.1.88:8080/api/v1/services: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:38 node1 kubelet[1634]: E0917 09:39:38.113191    1634 reflector.go:136] Failed to list *api.Node: Get http://192.168.1.88:8080/api/v1/nodes?fieldSelector=metadata.name%3D192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:39 node1 kubelet[1634]: E0917 09:39:39.114141    1634 reflector.go:136] Failed to list *api.Node: Get http://192.168.1.88:8080/api/v1/nodes?fieldSelector=metadata.name%3D192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:39 node1 kubelet[1634]: E0917 09:39:39.114207    1634 reflector.go:136] Failed to list *api.Service: Get http://192.168.1.88:8080/api/v1/services: dial tcp 192.168.1.88:8080: connection refused

There are many similar messages: 192.168.1.88:8080: connection refused
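Instead of scrolling through the journal, the refused endpoints can be tallied. This is a small sketch; the function name is made up and it relies only on the "dial tcp HOST:PORT: connection refused" wording visible in the log above.

```shell
#!/bin/sh
# Sketch: count "connection refused" errors per endpoint in kubelet logs.
# Reads journal text on stdin; prints one "COUNT HOST:PORT" line per endpoint.
refused_endpoints() {
  grep -o 'dial tcp [0-9.]*:[0-9]*: connection refused' |
    awk '{ sub(/:$/, "", $3); print $3 }' |
    sort | uniq -c |
    awk '{ print $1, $2 }'
}
```

Usage on the node: `journalctl -u kubelet --no-pager | refused_endpoints`. With the log above, every failure points at the same 192.168.1.88:8080 endpoint, which narrows the problem to that one address and port.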

When I look at the registered nodes:

$ kubectl get nodes
NAME      LABELS    STATUS

My guess is that the apiserver credentials are misconfigured, because the local kubelet can register but the remote one cannot.

So here is my apiserver configuration:

$ cat /etc/kubernetes/manifests/kube-apiserver.yml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver

spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: gcr.io/google_containers/hyperkube:v1.0.6
    command:
    - /hyperkube
    - apiserver
    - --bind-address=0.0.0.0
    - --etcd_servers=http://192.168.1.88:2379
    - --allow-privileged=true
    - --service-cluster-ip-range=10.3.0.0/24
    - --secure_port=443
    - --advertise-address=192.168.1.88
    - --admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota
    - --tls-cert-file=/etc/kubernetes/ssl/apiserver.pem
    - --tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --client-ca-file=/etc/kubernetes/ssl/ca.pem
    - --service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --cloud-provider=
    ports:
    - containerPort: 443
      hostPort: 443
      name: https
    - containerPort: 7080
      hostPort: 7080
      name: http
    - containerPort: 8080
      hostPort: 8080
      name: local
    volumeMounts:
    - mountPath: /etc/kubernetes/ssl
      name: ssl-certs-kubernetes
      readOnly: true
    - mountPath: /etc/ssl/certs
      name: ssl-certs-host
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/ssl
    name: ssl-certs-kubernetes
  - hostPath:
      path: /usr/share/ca-certificates
    name: ssl-certs-host

The certificates exist:

core@master1 ~ $ ls -l /etc/kubernetes/ssl/
total 40
-rw-r--r-- 1 core core 1675 Sep 17 09:31 apiserver-key.pem
-rw-r--r-- 1 core core 1099 Sep 17 09:31 apiserver.pem
-rw-r--r-- 1 core core 1090 Sep 17 09:31 ca.pem

Logs from the apiserver:

I0917 09:33:48.692147       1 plugins.go:69] No cloud provider specified.
I0917 09:33:49.049701       1 master.go:273] Node port range unspecified. Defaulting to 30000-32767.
E0917 09:33:49.080829       1 reflector.go:136] Failed to list *api.ResourceQuota: Get http://127.0.0.1:8080/api/v1/resourcequotas: dial tcp 127.0.0.1:8080: connection refused
E0917 09:33:49.080955       1 reflector.go:136] Failed to list *api.Secret: Get http://127.0.0.1:8080/api/v1/secrets?fieldSelector=type%3Dkubernetes.io%2Fservice-account-token: dial tcp 127.0.0.1:8080: connection refused
E0917 09:33:49.081032       1 reflector.go:136] Failed to list *api.ServiceAccount: Get http://127.0.0.1:8080/api/v1/serviceaccounts: dial tcp 127.0.0.1:8080: connection refused
E0917 09:33:49.081075       1 reflector.go:136] Failed to list *api.LimitRange: Get http://127.0.0.1:8080/api/v1/limitranges: dial tcp 127.0.0.1:8080: connection refused
E0917 09:33:49.081141       1 reflector.go:136] Failed to list *api.Namespace: Get http://127.0.0.1:8080/api/v1/namespaces: dial tcp 127.0.0.1:8080: connection refused
E0917 09:33:49.081186       1 reflector.go:136] Failed to list *api.Namespace: Get http://127.0.0.1:8080/api/v1/namespaces: dial tcp 127.0.0.1:8080: connection refused
[restful] 2015/09/17 09:33:49 log.go:30: [restful/swagger] listing is available at https://192.168.1.88:443/swaggerapi/
[restful] 2015/09/17 09:33:49 log.go:30: [restful/swagger] https://192.168.1.88:443/swaggerui/ is mapped to folder /swagger-ui/
W0917 09:33:49.132239       1 controller.go:212] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes  default    0 0001-01-01 00:00:00 +0000 UTC <nil> map[] map[]} [{[{192.168.1.88 <nil>}] [{ 443 TCP}]}]}
I0917 09:33:49.148355       1 server.go:441] Serving securely on 0.0.0.0:443
I0917 09:33:49.148404       1 server.go:483] Serving insecurely on 127.0.0.1:8080

1 Answer:

Answer 0 (score: 3)

According to the last two lines of your apiserver log, it is listening on 0.0.0.0 (all interfaces) on port 443, and on 127.0.0.1 (localhost only) on port 8080.
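In Kubernetes 1.0 the address of the insecure port comes from the apiserver flag `--insecure-bind-address`, which defaults to 127.0.0.1 and is not set in the manifest in the question. Because the pod uses `hostNetwork: true`, the `hostPort: 8080` entry does not change that: the process binds directly on the host, so 8080 is only open on the master's loopback interface. Setting that flag to 0.0.0.0 would also make the node's current URL work, but it exposes an unauthenticated API to the network. The hypothetical helper below reads apiserver log text on stdin and reports the scope of each "Serving ... on ADDR:PORT" line; it is a sketch based only on the log format shown above.

```shell
#!/bin/sh
# Sketch: classify the "Serving ... on ADDR:PORT" lines of an apiserver log.
# Reads log text on stdin; prints "MODE PORT: SCOPE" for each listener.
listen_scope() {
  sed -nE 's/.*Serving (securely|insecurely) on ([0-9.]+):([0-9]+).*/\1 \2 \3/p' |
    while read -r mode addr port; do
      case "$addr" in
        0.0.0.0) scope="all interfaces" ;;
        127.*)   scope="loopback only (not reachable from other hosts)" ;;
        *)       scope="$addr only" ;;
      esac
      echo "$mode $port: $scope"
    done
}
```

Fed the two last log lines from the question, it reports port 443 on all interfaces and port 8080 as loopback only.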

Your kubelet's log output shows it trying to reach the apiserver at 192.168.1.88:8080, an address the apiserver is not listening on.

Remote kubelets should connect to the API server using "https://192.168.1.88" (the public interface, on port 443).
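Concretely, that means replacing each http://192.168.1.88:8080 reference on the node with https://192.168.1.88. The question does not show the node's kubelet unit or kube-proxy manifest, so the sketch below (function name hypothetical) just rewrites that URL in whatever files you pass it and keeps a .bak copy of each.

```shell
#!/bin/sh
# Sketch: point node-side configs at the apiserver's secure port.
# Rewrites http://HOST:8080 to https://HOST (port 443) in each FILE,
# keeping FILE.bak. Usage: switch_to_https HOST FILE...
switch_to_https() {
  host=$1; shift
  # Escape the dots so "192.168.1.88" only matches that literal address.
  host_re=$(printf '%s' "$host" | sed 's/\./\\./g')
  for f in "$@"; do
    sed -i.bak "s#http://${host_re}:8080#https://${host}#g" "$f" || return 1
  done
}
```

For example, as root: `switch_to_https 192.168.1.88 /etc/systemd/system/kubelet.service /etc/kubernetes/manifests/kube-proxy.yml` (both paths are guesses; use wherever your node's kubelet and kube-proxy configs live), then `systemctl daemon-reload` and restart the kubelet. This only fixes the address; since the apiserver is started with `--client-ca-file`, the node still needs TLS credentials, which is what the kubeconfig mentioned in the answer provides.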

Depending on your TLS configuration, you may also need to set up a kubeconfig for the kubelet that uses the correct TLS certificates, as described here: https://coreos.com/kubernetes/docs/latest/deploy-workers.html#set-up-kubeconfig
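A worker kubeconfig along the lines of that page can be generated as below. This is a sketch, not the page's exact file: the certificate names worker.pem and worker-key.pem are assumptions (the question only lists the master's apiserver certificates; the node needs its own client certificate signed by ca.pem), and the cluster, user and context names are arbitrary labels. There is no `server:` entry because in 1.0 the kubelet takes the master location from its `--api-servers` flag.

```shell
#!/bin/sh
# Sketch: write a kubeconfig giving the kubelet a CA and client certificate.
# Assumption: worker.pem / worker-key.pem are placeholder names for the
# node's own client certificate and key. Usage: write_worker_kubeconfig FILE [SSL_DIR]
write_worker_kubeconfig() {
  kubeconfig_out=$1
  ssl_dir=${2:-/etc/kubernetes/ssl}
  cat > "$kubeconfig_out" <<EOF
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    certificate-authority: ${ssl_dir}/ca.pem
users:
- name: kubelet
  user:
    client-certificate: ${ssl_dir}/worker.pem
    client-key: ${ssl_dir}/worker-key.pem
contexts:
- context:
    cluster: local
    user: kubelet
  name: kubelet-context
current-context: kubelet-context
EOF
}
```

After writing it (for instance to /etc/kubernetes/worker-kubeconfig.yaml), start the node's kubelet with `--api-servers=https://192.168.1.88` and `--kubeconfig` pointing at that file.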