I am trying to use kops to create a very simple cluster on AWS, with one master node and 2 worker nodes. But after creation, kops validate cluster reports that the cluster is not healthy.
The cluster was created with:
kops create cluster --name=mycluster --zones=ap-south-1a --master-size="t2.micro" --node-size="t2.micro" --node-count="2" --cloud aws --ssh-public-key="~/.ssh/id_rsa.pub"
Output from kops validate cluster:
VALIDATION ERRORS
KIND NAME MESSAGE
Pod kube-system/kops-controller-xxxtk system-node-critical pod "kops-controller-xxxtk" is not ready (kops-controller)
Pod kube-system/kube-controller-manager-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal system-cluster-critical pod "kube-controller-manager-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal" is not ready (kube-controller-manager)
Validation Failed
Validation failed: cluster not yet healthy
Getting the resources in the kube-system namespace shows:
NAME READY STATUS RESTARTS AGE
pod/dns-controller-8d8889c4b-rwnkd 1/1 Running 0 47m
pod/etcd-manager-events-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 0 72m
pod/etcd-manager-main-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 0 72m
pod/kops-controller-xxxtk 1/1 Running 11 70m
pod/kube-apiserver-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 2/2 Running 1 72m
pod/kube-controller-manager-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 0/1 CrashLoopBackOff 15 72m
pod/kube-dns-696cb84c7-qzqf2 3/3 Running 0 16h
pod/kube-dns-696cb84c7-tt7ng 3/3 Running 0 16h
pod/kube-dns-autoscaler-55f8f75459-7jbjb 1/1 Running 0 16h
pod/kube-proxy-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 0 16h
pod/kube-proxy-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 0 72m
pod/kube-proxy-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 0 16h
pod/kube-scheduler-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 15 72m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 100.64.0.10 <none> 53/UDP,53/TCP 16h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kops-controller 1 1 1 1 1 kops.k8s.io/kops-controller-pki=,node-role.kubernetes.io/master= 16h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dns-controller 1/1 1 1 16h
deployment.apps/kube-dns 2/2 2 2 16h
deployment.apps/kube-dns-autoscaler 1/1 1 1 16h
NAME DESIRED CURRENT READY AGE
replicaset.apps/dns-controller-8d8889c4b 1 1 1 16h
replicaset.apps/kube-dns-696cb84c7 2 2 2 16h
replicaset.apps/kube-dns-autoscaler-55f8f75459 1 1 1 16h
Getting the logs from kube-scheduler shows:
I0211 04:26:45.546427 1 flags.go:59] FLAG: --vmodule=""
I0211 04:26:45.546442 1 flags.go:59] FLAG: --write-config-to=""
I0211 04:26:46.306497 1 serving.go:331] Generated self-signed cert in-memory
W0211 04:26:47.736258 1 authentication.go:368] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0211 04:26:47.765649 1 authentication.go:265] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0211 04:26:47.783852 1 authentication.go:289] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0211 04:26:47.798838 1 authorization.go:187] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0211 04:26:47.831825 1 authorization.go:156] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0211 04:26:55.344064 1 factory.go:210] Creating scheduler from algorithm provider 'DefaultProvider'
I0211 04:26:55.370766 1 registry.go:173] Registering SelectorSpread plugin
I0211 04:26:55.370802 1 registry.go:173] Registering SelectorSpread plugin
I0211 04:26:55.504324 1 server.go:146] Starting Kubernetes Scheduler version v1.19.7
W0211 04:26:55.607516 1 authorization.go:47] Authorization is disabled
W0211 04:26:55.607537 1 authentication.go:40] Authentication is disabled
I0211 04:26:55.618714 1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I0211 04:26:55.741863 1 tlsconfig.go:200] loaded serving cert ["Generated self signed cert"]: "localhost@1613017606" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer="localhost-ca@1613017605" (2021-02-11 03:26:45 +0000 UTC to 2022-02-11 03:26:45 +0000 UTC (now=2021-02-11 04:26:55.741788572 +0000 UTC))
I0211 04:26:55.746888 1 named_certificates.go:53] loaded SNI cert [0/"self-signed loopback"]: "apiserver-loopback-client@1613017607" [serving] validServingFor=[apiserver-loopback-client] issuer="apiserver-loopback-client-ca@1613017607" (2021-02-11 03:26:46 +0000 UTC to 2022-02-11 03:26:46 +0000 UTC (now=2021-02-11 04:26:55.7468713 +0000 UTC))
I0211 04:26:55.757881 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0211 04:26:55.771581 1 secure_serving.go:197] Serving securely on [::]:10259
I0211 04:26:55.793134 1 reflector.go:207] Starting reflector *v1.StorageClass (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.815641 1 reflector.go:207] Starting reflector *v1.CSINode (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.841309 1 reflector.go:207] Starting reflector *v1beta1.PodDisruptionBudget (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.857460 1 reflector.go:207] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.875096 1 reflector.go:207] Starting reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.894283 1 reflector.go:207] Starting reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.894615 1 reflector.go:207] Starting reflector *v1.PersistentVolume (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.895000 1 reflector.go:207] Starting reflector *v1.ReplicationController (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.895250 1 reflector.go:207] Starting reflector *v1.ReplicaSet (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.902323 1 reflector.go:207] Starting reflector *v1.StatefulSet (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.902572 1 reflector.go:207] Starting reflector *v1.PersistentVolumeClaim (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.905927 1 reflector.go:207] Starting reflector *v1.Pod (0s) from k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:188
I0211 04:26:56.355570 1 node_tree.go:86] Added node "ip-172-20-43-190.ap-south-1.compute.internal" in group "ap-south-1:\x00:ap-south-1a" to NodeTree
I0211 04:26:56.357441 1 node_tree.go:86] Added node "ip-172-20-63-116.ap-south-1.compute.internal" in group "ap-south-1:\x00:ap-south-1a" to NodeTree
I0211 04:26:56.357578 1 node_tree.go:86] Added node "ip-172-20-60-103.ap-south-1.compute.internal" in group "ap-south-1:\x00:ap-south-1a" to NodeTree
I0211 04:26:56.377402 1 leaderelection.go:243] attempting to acquire leader lease kube-system/kube-scheduler...
I0211 04:27:12.368681 1 leaderelection.go:253] successfully acquired lease kube-system/kube-scheduler
I0211 04:27:12.436915 1 scheduler.go:597] "Successfully bound pod to node" pod="default/nginx-deployment-66b6c48dd5-w4hb5" node="ip-172-20-63-116.ap-south-1.compute.internal" evaluatedNodes=3 feasibleNodes=2
I0211 04:27:12.451792 1 scheduler.go:597] "Successfully bound pod to node" pod="default/nginx-deployment-66b6c48dd5-4xz8l" node="ip-172-20-43-190.ap-south-1.compute.internal" evaluatedNodes=3 feasibleNodes=2
E0211 04:32:20.487059 1 leaderelection.go:325] error retrieving resource lock kube-system/kube-scheduler: Get "https://127.0.0.1/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s": context deadline exceeded
I0211 04:32:20.633059 1 leaderelection.go:278] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
F0211 04:32:20.673521 1 server.go:199] leaderelection lost
goroutine 1 [running]:
k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0xc0005c2d01, 0xc000900800, 0x41, 0x1fd)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
... (rest of the Go runtime stack trace omitted)
Answer 0 (score: 1)
I don't see anything particularly wrong with the command you ran. However, t2.micro is very small and is probably too small for a cluster.
You can check the kops-controller logs to see why it isn't starting. Try kubectl logs -n kube-system kops-controller-xxxx
and kubectl describe pod -n kube-system kops-controller-xxx
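A slightly fuller sketch of that debugging flow (assuming the pod is in the kube-system namespace, as in your output, and that the DaemonSet carries the usual k8s-app=kops-controller label; adjust to your cluster):

# find the exact kops-controller pod name
kubectl get pods -n kube-system -l k8s-app=kops-controller

# then inspect its logs and recent events (pod name taken from your output)
kubectl logs -n kube-system kops-controller-xxxtk
kubectl describe pod -n kube-system kops-controller-xxxtk

With 11 restarts in 70 minutes, the "Last State" and events in the describe output should show whether the container is being OOM-killed or failing its probes.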
Answer 1 (score: 1)
You know, prompted by @Markus and your comments, I started digging deeper, and here is what I found.
First, the article Running Kubernetes on AWS T2 Instances. It walks through an example that uses a T2.medium, with very detailed steps and a timeline describing what happens there.
Its final conclusion:
We have demonstrated the unpredictability of deployments on Kubernetes clusters that are not suited to the T2/3 family of instances. There is the possibility of instances being throttled because pods consume large amounts of resources. At best this will limit the performance of your applications, and at worst it could cause the failure of the cluster (if T2/3s are used for master nodes, due to etcd issues). Furthermore, this condition will only be picked up if we are monitoring CloudWatch carefully or have performance monitoring on the application pods.

For this reason, it is recommended to avoid the T2/3 instance type families for Kubernetes deployments; if you want to save money while using the more traditional instance families (such as the Ms and Rs), then take a look at our blog on Spot instances.
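(A side note: one way to actually confirm throttling on a T2/T3 instance is to watch its CPUCreditBalance metric in CloudWatch. A sketch with the AWS CLI; the instance ID and time window are placeholders you would replace with your master's values:)

aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2021-02-11T03:00:00Z \
  --end-time 2021-02-11T05:00:00Z \
  --period 300 \
  --statistics Average

A balance near zero means the instance is out of CPU credits and is being throttled, which would fit the lease-renewal timeouts in your kube-scheduler log.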
Alongside that, the official numbers:
1) t2.micro specs: a t2.micro has 1 vCPU and 1 GB of memory.
2) Minimum memory and CPU (cores) generally required by Kubernetes:
Master nodes need a minimum of 2 GB of memory; worker nodes need a minimum of 1 GB.
Master nodes need at least 1.5 CPU cores; worker nodes need at least 0.7 cores.
You simply don't have enough resources. Use at least a t2.medium for the master.
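For example, either recreate the cluster with a bigger master (a sketch based on your original command):

kops create cluster --name=mycluster --zones=ap-south-1a --master-size="t2.medium" --node-size="t2.micro" --node-count="2" --cloud aws --ssh-public-key="~/.ssh/id_rsa.pub"

or resize the existing master instance group and roll the change out (assuming the default kops instance group name master-ap-south-1a; yours may differ, check with kops get ig):

kops edit ig master-ap-south-1a --name=mycluster   # set spec.machineType to t2.medium
kops update cluster --name=mycluster --yes
kops rolling-update cluster --name=mycluster --yes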