Readiness and liveness probes failing for Elasticsearch 6.3.0 on Kubernetes

Time: 2019-09-13 07:00:17

Tags: elasticsearch kubernetes azure-aks efk

I am trying to set up an EFK stack on Kubernetes. The Elasticsearch version in use is 6.3.2. Everything works fine until I add the probe configuration to the deployment YAML file. Then I get errors like the one below, the pod is marked Unhealthy, and it eventually gets restarted, which looks like a spurious restart.

Warning  Unhealthy  15s  kubelet, aks-agentpool-23337112-0  Liveness probe failed: Get http://10.XXX.Y.ZZZ:9200/_cluster/health: dial tcp 10.XXX.Y.ZZZ:9200: connect: connection refused

I did try telnet from another container to the Elasticsearch pod using its IP and port, and that succeeded. It is only the kubelet on the node that cannot reach the pod's IP, which is what makes the probes fail.
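
For reference, this is roughly the kind of manual check I ran (pod and namespace names are placeholders, and it assumes the source container has telnet/curl installed):

# from another pod in the cluster, the ES pod answers on its IP and port
kubectl exec -it <other-pod> -n <namespace> -- telnet 10.XXX.Y.ZZZ 9200

# the health endpoint used by the probes also responds when called directly
kubectl exec -it <other-pod> -n <namespace> -- curl -s http://10.XXX.Y.ZZZ:9200/_cluster/health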

Below is an excerpt of the pod spec from the Kubernetes StatefulSet YAML. Any help in resolving this would be much appreciated — I have spent a lot of time on it without any leads :(

PS: The stack is being set up on an AKS cluster.

      - name: es-data
        image: quay.io/pires/docker-elasticsearch-kubernetes:6.3.2
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: CLUSTER_NAME
          value: myesdb
        - name: NODE_MASTER
          value: "false"
        - name: NODE_INGEST
          value: "false"
        - name: HTTP_ENABLE
          value: "true"
        - name: NODE_DATA
          value: "true"
        - name: DISCOVERY_SERVICE
          value: "elasticsearch-discovery"
        - name: NETWORK_HOST
          value: "_eth0:ipv4_"          
        - name: ES_JAVA_OPTS
          value: -Xms512m -Xmx512m
        - name: PROCESSORS
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
        resources:
          requests:
            cpu: 0.25
          limits:
            cpu: 1
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: transport
        livenessProbe:
          httpGet:
            port: http
            path: /_cluster/health
          initialDelaySeconds: 40
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /_cluster/health
            port: http
          initialDelaySeconds: 30
          timeoutSeconds: 10

Without the probes, the pods/containers run fine. The expectation is that with the probes configured in the deployment YAML they should just work and the pods should not get restarted.

3 answers:

Answer 0: (score: 4)

First, check the logs with:

kubectl logs <pod name> -n <namespacename>
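
It also helps to look at the pod's events, which show exactly why the kubelet marked a probe as failed or restarted the container (pod and namespace names are placeholders):

kubectl describe pod <pod name> -n <namespacename>
kubectl get events -n <namespacename> --sort-by=.metadata.creationTimestamp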

You have to run an init container first and fix the volume permissions.

The whole setup also has to run as user 1000 before the Elasticsearch container starts, and the init container is what changes the data volume's ownership and permissions accordingly.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: elasticsearch
    component: elasticsearch
    release: elasticsearch
  name: elasticsearch
spec:
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: elasticsearch
      component: elasticsearch
      release: elasticsearch
  serviceName: elasticsearch
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: elasticsearch
        component: elasticsearch
        release: elasticsearch
    spec:
      containers:
      - env:
        - name: cluster.name
          value: <SET THIS>
        - name: discovery.type
          value: single-node
        - name: ES_JAVA_OPTS
          value: -Xms512m -Xmx512m
        - name: bootstrap.memory_lock
          value: "false"
        image: elasticsearch:6.5.0
        imagePullPolicy: IfNotPresent
        name: elasticsearch
        ports:
        - containerPort: 9200
          name: http
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        resources:
          limits:
            cpu: 250m
            memory: 1Gi
          requests:
            cpu: 150m
            memory: 512Mi
        securityContext:
          privileged: true
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: elasticsearch-data
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        # run all steps as one shell script: with `sh -c` only the first
        # argument after -c is executed, the rest would become positional args
        - |
          sysctl -w vm.max_map_count=262144
          chown -R 1000:1000 /usr/share/elasticsearch/data
          chmod -R 777 /usr/share/elasticsearch/data
          chgrp 1000 /usr/share/elasticsearch/data
        image: busybox:1.29.2
        imagePullPolicy: IfNotPresent
        name: set-dir-owner
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: elasticsearch-data
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 10
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: elasticsearch-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

Have a look at my YAML config above — it works as-is. This is for a single-node Elasticsearch.

Answer 1: (score: 3)

The issue is that Elasticsearch itself has a health status (red, yellow, green), and you need to take that into account in your probe configuration.

In my own ES configuration I followed what the official ES helm chart does: a readiness probe that polls the cluster health endpoint and only succeeds once the cluster has reached at least yellow status.

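A minimal sketch of that kind of probe (the script, status threshold, and timings here are illustrative and assume curl is available in the Elasticsearch image; this is not the chart's verbatim code):

        readinessProbe:
          exec:
            command:
            - sh
            - -c
            # consider the node ready only once the cluster reports yellow or green;
            # wait_for_status makes the call block (up to the timeout) instead of failing immediately
            - |
              curl -fsS "http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=5s" > /dev/null
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 10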

Answer 2: (score: 0)

The probe outlined in my answer works for 3-node discovery even with Istio in play. If the livenessProbe is wrong, k8s keeps restarting the container without ever giving it a chance to start up properly. I test liveness against the internal Elasticsearch port used for node-to-node communication; that port speaks plain TCP.

      livenessProbe:
        tcpSocket:
          port: 9300
        initialDelaySeconds: 60 # it takes time from the moment the JVM process starts until the discovery process kicks in
        timeoutSeconds: 10

        - name: discovery.zen.minimum_master_nodes
          value: "2"
        - name: discovery.zen.ping.unicast.hosts
          value: elastic
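
If you also want a readiness check in the same spirit, a plain TCP check on the HTTP port keeps it independent of cluster health (a sketch — the timings are illustrative):

      readinessProbe:
        tcpSocket:
          port: 9200
        initialDelaySeconds: 60
        timeoutSeconds: 10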