I am new to EKS. I created a new cluster with this cluster config YAML file:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: h2-dev-cluster
  region: us-west-2
nodeGroups:
  - name: h2-dev-ng-1
    instanceType: t2.small
    desiredCapacity: 2
    ssh: # use existing EC2 key
      publicKeyName: dev-eks-node
But eksctl gets stuck at
waiting for at least 1 node(s) to become ready in "h2-dev-ng-1"
and then times out.
I have checked every point in this AWS document: https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html
Everything checks out except "The ClusterName in your worker node AWS CloudFormation template", which I cannot verify because the UserData has been encrypted by CloudFormation.
I logged in to one of the nodes, ran journalctl -u kubelet, and found these errors:
Jul 03 08:22:31 ip-192-168-53-151.us-west-2.compute.internal kubelet[4541]: E0703 08:22:31.007677 4541 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta
Jul 03 08:22:31 ip-192-168-53-151.us-west-2.compute.internal kubelet[4541]: E0703 08:22:31.391913 4541 kubelet.go:2272] node "ip-192-168-53-151.us-west-2.compute.internal" not found
Jul 03 08:22:31 ip-192-168-53-151.us-west-2.compute.internal kubelet[4541]: E0703 08:22:31.434158 4541 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/kubelet.go:459: Failed to list *v1.
Jul 03 08:22:31 ip-192-168-53-151.us-west-2.compute.internal kubelet[4541]: E0703 08:22:31.492746 4541 kubelet.go:2272] node "ip-192-168-53-151.us-west-2.compute.internal" not found
Then I ran cat /var/lib/kubelet/kubeconfig and saw:
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: MASTER_ENDPOINT
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubelet
  name: kubelet
current-context: kubelet
users:
- name: kubelet
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: /usr/bin/aws-iam-authenticator
      args:
        - "token"
        - "-i"
        - "CLUSTER_NAME"
        - --region
        - "AWS_REGION"
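I noticed that the server parameter is still the placeholder MASTER_ENDPOINT, so I ran /etc/eks/bootstrap.sh h2-dev-cluster to set the cluster name. After that, the parameter changed to the following (I masked the URL):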
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://XXXXXXXX.gr7.us-west-2.eks.amazonaws.com
  name: kubernetes
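The excerpt above shows only the clusters section, so it does not reveal whether the CLUSTER_NAME and AWS_REGION placeholders were also replaced. A minimal sketch for checking that on the node, assuming the three placeholder strings are exactly the ones seen in the earlier output (the function name check_kubeconfig_placeholders is hypothetical, not part of any EKS tooling):

```shell
#!/bin/sh
# Hypothetical helper: report placeholder tokens left in a kubelet kubeconfig.
# The default path is the file inspected earlier; pass another path as $1.
check_kubeconfig_placeholders() {
    file="${1:-/var/lib/kubelet/kubeconfig}"
    found=0
    # Placeholder names are taken from the kubeconfig output shown in this post.
    for token in MASTER_ENDPOINT CLUSTER_NAME AWS_REGION; do
        if grep -q "$token" "$file"; then
            echo "placeholder still present: $token"
            found=1
        fi
    done
    # Exit status 0 means no placeholder was found, 1 means at least one remains.
    return "$found"
}
```

Calling check_kubeconfig_placeholders with no argument on the node would print one line per placeholder that is still present and return a non-zero status if any remain.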
I then ran sudo service restart kubectl, but journalctl -u kubelet still shows the same errors and the nodes still cannot join the cluster.
How can I fix this?
eksctl: 0.23.0 rc1 (also tested with 0.20.0, same error)
kubectl: 1.18.5
os: Ubuntu 18.04 (on a new EC2 instance)