我按照https://coreos.com/kubernetes/docs/latest/deploy-addons.html的说明通过创建kubedns rc和服务手动安装KubeDNS。
yaml如下:
apiVersion: v1
kind: Service metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "KubeDNS" spec: selector:
k8s-app: kube-dns clusterIP: ${DNS_SERVICE_IP} ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
---
apiVersion: v1
kind: ReplicationController
metadata:
name: kube-dns-v11
namespace: kube-system
labels:
k8s-app: kube-dns
version: v11
kubernetes.io/cluster-service: "true" spec: replicas: 1 selector:
k8s-app: kube-dns
version: v11 template:
metadata:
labels:
k8s-app: kube-dns
version: v11
kubernetes.io/cluster-service: "true"
spec:
containers:
- name: etcd
image: gcr.io/google_containers/etcd-amd64:2.2.1
resources:
limits:
cpu: 100m
memory: 500Mi
requests:
cpu: 100m
memory: 50Mi
command:
- /usr/local/bin/etcd
- -data-dir
- /var/etcd/data
- -listen-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -advertise-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -initial-cluster-token
- skydns-etcd
volumeMounts:
- name: etcd-storage
mountPath: /var/etcd/data
- name: kube2sky
image: gcr.io/google_containers/kube2sky:1.14
resources:
limits:
cpu: 100m
memory: 200Mi
requests:
cpu: 100m
memory: 50Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
args:
# command = "/kube2sky"
- --domain=cluster.local
- name: skydns
image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
resources:
limits:
cpu: 100m
memory: 200Mi
requests:
cpu: 100m
memory: 50Mi
args:
# command = "/skydns"
- -machines=http://127.0.0.1:4001
- -addr=0.0.0.0:53
- -ns-rotate=false
- -domain=cluster.local.
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- name: healthz
image: gcr.io/google_containers/exechealthz:1.0
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
args:
- -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
- -port=8080
ports:
- containerPort: 8080
protocol: TCP
volumes:
- name: etcd-storage
emptyDir: {}
dnsPolicy: Default
但是,在创建资源检查后,状态将返回:
[root@sc-master-1 pods]# kubectl get pods --namespace=kube-system NAME READY STATUS RESTARTS AGE kube-dns-v11-fyug1 2/4 CrashLoopBackOff 2 36s
我把它缩小到了一个错误:
[root@sc-client-2 jonathan]# docker logs bf0466c30b1d 2016-05-15 08:39:26.124819 I | etcdmain: etcd Version: 2.2.1 2016-05-15
08:39:26.124851 I | etcdmain: Git SHA: 75f8282 2016-05-15
08:39:26.124857 I | etcdmain: Go Version: go1.5.1 2016-05-15 08:39:26.124860 I | etcdmain: Go OS/Arch: linux/amd64 2016-05-15 08:39:26.135982 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1 2016-05-15 08:39:26.136562 I | etcdmain: listening for peers on http://localhost:2380 2016-05-15 08:39:26.136704 I | etcdmain: listening for peers on http://localhost:7001 2016-05-15
08:39:26.136746 I | etcdmain: listening for client requests on http://127.0.0.1:2379 2016-05-15 08:39:26.136814 I | etcdmain: listening for client requests on http://127.0.0.1:4001 2016-05-15 08:39:26.136931 I | etcdmain: stopping listening for client requests on http://127.0.0.1:4001 2016-05-15 08:39:26.136943 I | etcdmain: stopping listening for client requests on http://127.0.0.1:2379 2016-05-15
08:39:26.136951 I | etcdmain: stopping listening for peers on http://localhost:7001 2016-05-15 08:39:26.136957 I | etcdmain: stopping listening for peers on http://localhost:2380 2016-05-15 08:39:26.136967 C | etcdmain: mkdir /var/etcd/data/member: permission denied
我不知道为什么这不起作用,我不能执行容器来手动创建文件夹,因为它不会停留。
更新
用于:
volumeMounts:
- name: etcd-storage
mountPath: /var/etcd/data:z
从Do not have permission to write to emptyDir获取过去的etcd错误,但我仍然无法获得dns服务,下面是kube-dns pod中容器的相关日志
[root@sc-master-1 pods]# kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-dns-v11-0bkop 3/4 Running 17 20m
skydns的日志:
[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop skydns --namespace=kube-system
2016/05/18 17:01:11 skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns) [3]
2016/05/18 17:01:11 skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
2016/05/18 17:01:11 skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]
日志为kube2sky:
[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop kube2sky --namespace=kube-system
I0518 17:02:17.693959 1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0518 17:02:18.697702 1 kube2sky.go:529] Using http://localhost:8080 for kubernetes master
I0518 17:02:18.698071 1 kube2sky.go:530] Using kubernetes API <nil>
I0518 17:02:18.698199 1 kube2sky.go:598] Waiting for service: default/kubernetes
I0518 17:02:18.701663 1 kube2sky.go:604] Ignoring error while waiting for service default/kubernetes: yaml: mapping values are not allowed in this context. Sleeping 1s before retrying.
健康日志:
[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop healthz --namespace=kube-system
2016/05/18 17:01:17 Worker running nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
2016/05/18 17:02:12 Client ip 172.17.0.1:35440 requesting /healthz probe servicing cmd nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
2016/05/18 17:03:22 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2016-05-18 17:02:22.691812173 +0000 UTC, error exit status 1
2016/05/18 17:03:22 Client ip 172.17.0.1:35475 requesting /healthz probe servicing cmd nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
etcd的日志:
[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop etcd --namespace=kube-system
2016-05-18 17:01:02.478791 I | etcdmain: etcd Version: 2.2.1
2016-05-18 17:01:02.478825 I | etcdmain: Git SHA: 75f8282
2016-05-18 17:01:02.478831 I | etcdmain: Go Version: go1.5.1
2016-05-18 17:01:02.478846 I | etcdmain: Go OS/Arch: linux/amd64
2016-05-18 17:01:02.478851 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2016-05-18 17:01:02.485798 I | etcdmain: listening for peers on http://localhost:2380
2016-05-18 17:01:02.485931 I | etcdmain: listening for peers on http://localhost:7001
2016-05-18 17:01:02.485984 I | etcdmain: listening for client requests on http://127.0.0.1:2379
2016-05-18 17:01:02.486070 I | etcdmain: listening for client requests on http://127.0.0.1:4001
2016-05-18 17:01:02.486300 I | etcdserver: name = default
2016-05-18 17:01:02.486324 I | etcdserver: data dir = /var/etcd/data
2016-05-18 17:01:02.486329 I | etcdserver: member dir = /var/etcd/data/member
2016-05-18 17:01:02.486333 I | etcdserver: heartbeat = 100ms
2016-05-18 17:01:02.486337 I | etcdserver: election = 1000ms
2016-05-18 17:01:02.486341 I | etcdserver: snapshot count = 10000
2016-05-18 17:01:02.486350 I | etcdserver: advertise client URLs = http://127.0.0.1:2379,http://127.0.0.1:4001
2016-05-18 17:01:02.486356 I | etcdserver: initial advertise peer URLs = http://localhost:2380,http://localhost:7001
2016-05-18 17:01:02.486365 I | etcdserver: initial cluster = default=http://localhost:2380,default=http://localhost:7001
2016-05-18 17:01:02.523097 I | etcdserver: starting member 6a5871dbdd12c17c in cluster f68652439e3f8f2a
2016-05-18 17:01:02.523157 I | raft: 6a5871dbdd12c17c became follower at term 0
2016-05-18 17:01:02.523192 I | raft: newRaft 6a5871dbdd12c17c [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2016-05-18 17:01:02.523198 I | raft: 6a5871dbdd12c17c became follower at term 1
2016-05-18 17:01:02.523329 I | etcdserver: starting server... [version: 2.2.1, cluster version: to_be_decided]
2016-05-18 17:01:02.524093 N | etcdserver: added local member 6a5871dbdd12c17c [http://localhost:2380 http://localhost:7001] to cluster f68652439e3f8f2a
2016-05-18 17:01:03.323562 I | raft: 6a5871dbdd12c17c is starting a new election at term 1
2016-05-18 17:01:03.323722 I | raft: 6a5871dbdd12c17c became candidate at term 2
2016-05-18 17:01:03.323739 I | raft: 6a5871dbdd12c17c received vote from 6a5871dbdd12c17c at term 2
2016-05-18 17:01:03.323776 I | raft: 6a5871dbdd12c17c became leader at term 2
2016-05-18 17:01:03.323787 I | raft: raft.node: 6a5871dbdd12c17c elected leader 6a5871dbdd12c17c at term 2
2016-05-18 17:01:03.324154 I | etcdserver: setting up the initial cluster version to 2.2
2016-05-18 17:01:03.324251 I | etcdserver: published {Name:default ClientURLs:[http://127.0.0.1:2379 http://127.0.0.1:4001]} to cluster f68652439e3f8f2a
2016-05-18 17:01:03.473271 N | etcdserver: set the initial cluster version to 2.2
更新
能够通过在上面的dns-addon.yml中的env变量中对master进行硬编码来提前一步
现在我明白了:
[root@sc-master-1 pods]# kubectl logs kube-dns-v11-sgb1r -c kube2sky --namespace=kube-system
I0518 18:08:58.837758 1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0518 18:08:59.839548 1 kube2sky.go:529] Using master for kubernetes master
I0518 18:08:59.839565 1 kube2sky.go:530] Using kubernetes API <nil>
I0518 18:08:59.839676 1 kube2sky.go:598] Waiting for service: default/kubernetes
更新 好的,我能够使用fqdn,即http://:8080而不是:8080。
我可以使用busybox pod并运行:
[root@sc-master-1 jonathan]# kubectl exec busybox -- nslookup kubernetes.default.svc.cluster.local 10.254.0.2
Server: 10.254.0.2
Address 1: 10.254.0.2
Name: kubernetes.default.svc.cluster.local
Address 1: 10.254.0.1
这有效,但是我注意到了一些奇怪的行为,只要我在运行时从pod运行它,dns就会工作:
[root@sc-master-1 jonathan]# nslookup kubernetes.default.svc.cluster.local 10.254.0.2
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
我收到上述错误,但是在已安排kube dns的节点上运行相同的命令:
[jonathan@sc-client-2 ~]$ nslookup kubernetes.default.svc.cluster.local 10.254.0.2
Server: 10.254.0.2
Address: 10.254.0.2#53
Name: kubernetes.default.svc.cluster.local
Address: 10.254.0.1
从集群中的另一个节点进行测试我得到与master相同的错误:
[root@sc-client-1 jonathan]# nslookup kubernetes.default.svc.cluster.local 10.254.0.2
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
可能出现什么问题?