I have a 3-node Elasticsearch cluster running on EKS as a StatefulSet with Persistent Volumes (Server Version: v1.13.7-eks-c57ff8).
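I upgraded the EKS cluster from 1.12 to 1.13, and the upgrade itself succeeded. However, one of the Elasticsearch nodes fails to start and is stuck in the Init state: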
NAME          READY   STATUS     RESTARTS   AGE
es-master-0   0/1     Init:0/3   0          15h
es-master-1   1/1     Running    0          44h
es-master-2   1/1     Running    0          44h
I tried deleting the es-master-0 pod, but the replacement pod got stuck in the same state.
When I inspected the pod (kubectl describe pod es-master-0), I found that it could not mount its volume:
Events:
  Type     Reason                  Age    From                                                Message
  ----     ------                  ----   ----                                                -------
  Normal   Scheduled               2m13s  default-scheduler                                   Successfully assigned kube-logging/es-master-0 to ip-10-2-18-16.us-west-2.compute.internal
  Normal   SuccessfulAttachVolume  2m10s  attachdetach-controller                             AttachVolume.Attach succeeded for volume "pvc-f2e27430-af11-11e9-b10d-02a8eba067e2"
  Warning  FailedMount             10s    kubelet, ip-10-2-18-16.us-west-2.compute.internal   Unable to mount volumes for pod "es-master-0_kube-logging(bc27e29c-b539-11e9-9958-06eeabb0603e)": timeout expired waiting for volumes to attach or mount for pod "kube-logging"/"es-master-0". list of unmounted volumes=[data]. list of unattached volumes=[data default-token-bz6w9]
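To watch whether the mount warning keeps recurring after each pod restart, the events can be filtered down to that pod and reason. This is only a minimal sketch for gathering more detail, not a fix; the function name failed_mount_events is hypothetical, and it assumes kubectl is already pointed at the cluster and that the events have not yet expired.

```shell
#!/bin/sh
# Hypothetical helper: list FailedMount events for a single pod.
# Usage: failed_mount_events <namespace> <pod-name>
failed_mount_events() {
  ns="$1"
  pod="$2"
  # Events can be filtered server-side by involved object and reason.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod,reason=FailedMount" \
    --sort-by=.metadata.creationTimestamp
}
```

For the pod in question this would be called as: failed_mount_events kube-logging es-master-0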
Output of kubectl get pv:
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                           STORAGECLASS   REASON   AGE
pvc-06cd5cfe-af12-11e9-b10d-02a8eba067e2   100Gi      RWO            Retain           Bound    kube-logging/data-es-master-1   aws-gp2                 7d19h
pvc-178b5aba-af12-11e9-b10d-02a8eba067e2   100Gi      RWO            Retain           Bound    kube-logging/data-es-master-2   aws-gp2                 7d19h
pvc-f2e27430-af11-11e9-b10d-02a8eba067e2   100Gi      RWO            Retain           Bound    kube-logging/data-es-master-0   aws-gp2                 7d19h
Output of kubectl get pvc:
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-es-master-0   Bound    pvc-f2e27430-af11-11e9-b10d-02a8eba067e2   100Gi      RWO            aws-gp2        7d19h
data-es-master-1   Bound    pvc-06cd5cfe-af12-11e9-b10d-02a8eba067e2   100Gi      RWO            aws-gp2        7d19h
data-es-master-2   Bound    pvc-178b5aba-af12-11e9-b10d-02a8eba067e2   100Gi      RWO            aws-gp2        7d19h
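Since all three claims show as Bound, one further thing that could be checked is where the underlying EBS volume is actually attached on the AWS side. The sketch below is an assumption-laden example: the helper names are hypothetical, it assumes the aws-gp2 storage class uses the in-tree AWS EBS plugin (so the volume ID lives under spec.awsElasticBlockStore.volumeID on the PV), and it assumes the aws CLI is configured for the same account and region.

```shell
#!/bin/sh
# Hypothetical helper: print the bare EBS volume ID behind a PersistentVolume.
# The in-tree plugin usually stores it as aws://<zone>/<vol-id>.
pv_ebs_volume_id() {
  raw="$(kubectl get pv "$1" -o jsonpath='{.spec.awsElasticBlockStore.volumeID}')"
  # Strip everything up to the last slash; a bare vol-... ID passes through.
  printf '%s\n' "${raw##*/}"
}

# Hypothetical helper: show which instance(s) the volume is attached to.
pv_attachments() {
  vol="$(pv_ebs_volume_id "$1")"
  aws ec2 describe-volumes --volume-ids "$vol" \
    --query 'Volumes[0].Attachments[].[InstanceId,State,Device]' \
    --output text
}
```

For the stuck pod this would be called as: pv_attachments pvc-f2e27430-af11-11e9-b10d-02a8eba067e2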
I also tried restarting the node that the pod was scheduled on.
Here is my manifest file:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-master
  namespace: kube-logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
        resources:
          limits:
            cpu: 1000m
            memory: 2.5G
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: rest
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
          - name: cluster.name
            value: prod-eks-logs
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: node.name
            value: "$(NODE_NAME).elasticsearch"
          - name: discovery.zen.ping.unicast.hosts
            value: "es-master-0.elasticsearch,es-master-1.elasticsearch,es-master-2.elasticsearch"
          - name: cluster.initial_master_nodes
            value: "es-master-0.elasticsearch,es-master-1.elasticsearch,es-master-2.elasticsearch"
          - name: discovery.zen.minimum_master_nodes
            value: "2"
          - name: ES_JAVA_OPTS
            value: "-Xmx1g -Xmx1g"
      initContainers:
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: increase-fd-ulimit
        image: busybox
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: aws-gp2
      resources:
        requests:
          storage: 100Gi
Any help on how to get this Elasticsearch node out of this state?