RabbitMQ nodes cannot discover each other and join the cluster

Time: 2018-05-01 14:08:38

Tags: kubernetes rabbitmq rabbitmqctl

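I am new to RabbitMQ and am trying to set up highly available queues using a StatefulSet. The tutorial I followed is here

After deploying the StatefulSet and the Service to Kubernetes, the nodes cannot discover each other in the cluster, and the pods go into the CrashLoopBackOff state. It looks like peer discovery is not working as expected and the nodes are unable to join the cluster.

My cluster nodes are rabbit@rabbitmq-0, rabbit@rabbitmq-1, and rabbit@rabbitmq-2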

$ kubectl exec -it rabbitmq-0 /bin/sh

/ # rabbitmqctl status
Status of node 'rabbit@rabbitmq-0'
Error: unable to connect to node 'rabbit@rabbitmq-0': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@rabbitmq-0']

rabbit@rabbitmq-0:
  * connected to epmd (port 4369) on rabbitmq-0
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on rabbitmq-0
  * suggestion: start the node

current node details:
- node name: 'rabbitmq-cli-22@rabbitmq-0'
- home dir: /var/lib/rabbitmq
- cookie hash: 5X3n5Gy+r4FL+M53FHwv3w==

rabbitmq.conf

[ { rabbit, [
  { loopback_users, [ ] },
  { tcp_listeners, [ 5672 ] },
  { ssl_listeners, [ ] },
  { hipe_compile, false },
  { cluster_nodes, { [ rabbit@rabbitmq-0, rabbit@rabbitmq-1, rabbit@rabbitmq-2], disc } },
  {ssl_listeners, [5671]},
  {ssl_options, [{cacertfile,"/etc/rabbitmq/ca_certificate.pem"},
    {certfile,"/etc/rabbitmq/server_certificate.pem"},
    {keyfile,"/etc/rabbitmq/server_key.pem"},
    {verify,verify_peer},
    {versions, ['tlsv1.2', 'tlsv1.1']},
    {fail_if_no_peer_cert,false}]}
] },
  { rabbitmq_management, [ { listener, [
  { port, 15672 },
  { ssl, false }
] } ] }
].

$ kubectl get statefulset rabbitmq

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: rabbitmq
  name: rabbitmq
  namespace: development
  resourceVersion: "119265565"
  selfLink: /apis/apps/v1/namespaces/development/statefulsets/rabbitmq
  uid: 10c2fabc-cbb3-11e7-8821-00505695519e
spec:
  podManagementPolicy: OrderedReady
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: rabbitmq
  serviceName: rabbitmq
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rabbitmq
    spec:
      containers:
      - env:
        - name: RABBITMQ_ERLANG_COOKIE
          valueFrom:
            secretKeyRef:
              key: rabbitmq-erlang-cookie
              name: rabbitmq-erlang-cookie
        image: rabbitmq:1.0
        imagePullPolicy: IfNotPresent
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
                  sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new;
                  cat /etc/resolv.conf.new > /etc/resolv.conf;
                  rm /etc/resolv.conf.new;
                fi; until rabbitmqctl node_health_check; do sleep 1; done; if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
                  rabbitmqctl stop_app;
                  rabbitmqctl join_cluster rabbit@rabbitmq-0;
                  rabbitmqctl start_app;
                fi; rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
        name: rabbitmq
        ports:
        - containerPort: 5672
          protocol: TCP
        - containerPort: 5671
          protocol: TCP
        - containerPort: 15672
          protocol: TCP
        - containerPort: 25672
          protocol: TCP
        - containerPort: 4369
          protocol: TCP
        resources:
          limits:
            cpu: 400m
            memory: 2Gi
          requests:
            cpu: 200m
            memory: 1Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/rabbitmq
          name: rabbitmq-persistent-data-storage
        - mountPath: /etc/rabbitmq
          name: rabbitmq-config
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 10
      volumes:
      - name: rabbitmq-config
        secret:
          defaultMode: 420
          secretName: rabbitmq-config
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: rabbitmq-persistent-data-storage
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
    status:
      phase: Pending
status:
  currentReplicas: 1
  currentRevision: rabbitmq-4234207235
  observedGeneration: 1
  replicas: 1
  updateRevision: rabbitmq-4234207235
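
A note on the postStart hook above: its sed command edits the search line of /etc/resolv.conf so that short names such as rabbitmq-0 are looked up under the rabbitmq service domain. The sketch below applies the same substitution to a sample line. The sample line is an assumption about what a typical resolv.conf looks like for a pod in the development namespace, not output taken from this cluster, and the helper name add_service_search_domain is made up for illustration. It uses `[^ ][^ ]*` in place of `[^ ]\+` only so that it also runs on sed implementations without the `\+` extension.

```shell
#!/bin/sh
# Prepend "<service>.<first search domain>" to a resolv.conf search line,
# mirroring the substitution in the postStart hook.
# (Hypothetical helper; reads the line on stdin.)
add_service_search_domain() {
  service="$1"
  sed "s/^search \([^ ][^ ]*\)/search ${service}.\1 \1/"
}

# Sample input (assumed, typical for a pod in the "development" namespace).
sample="search development.svc.cluster.local svc.cluster.local cluster.local"
printf '%s\n' "$sample" | add_service_search_domain rabbitmq
```

With a search line rewritten this way, the short name rabbitmq-0 is tried as rabbitmq-0.rabbitmq.development.svc.cluster.local, which only resolves if a headless Service named rabbitmq exists in the same namespace as the pods.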

$ kubectl get service rabbitmq

apiVersion: v1
kind: Service
metadata:
  labels:
    app: rabbitmq
  name: rabbitmq
  namespace: develop
  resourceVersion: "59968950"
  selfLink: /api/v1/namespaces/develop/services/rabbitmq
  uid: ced85a60-cbae-11e7-8821-00505695519e
spec:
  clusterIP: None
  ports:
  - name: tls-amqp
    port: 5671
    protocol: TCP
    targetPort: 5671
  - name: management
    port: 15672
    protocol: TCP
    targetPort: 15672
  selector:
    app: rabbitmq
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}    

$ kubectl describe pod rabbitmq-0

Name:           rabbitmq-0
Namespace:      development
Node:           node9/170.XX.X.Xx
Labels:         app=rabbitmq
                controller-revision-hash=rabbitmq-4234207235
Status:         Running
IP:             10.25.128.XX
Controlled By:  StatefulSet/rabbitmq
Containers:
  rabbitmq:
    Container ID:   docker://f60b06283d3974382a068ded54782b24de4b6da3203c05772a77c65d76aa2e2f
    Image:          rabbitmq:1.0
    Image ID:       rabbitmq@sha256:6245a81a1fc0fb
    Ports:          5672/TCP, 5671/TCP, 15672/TCP, 25672/TCP, 4369/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
    Ready:          False
    Restart Count:  104
    Limits:
      cpu:     400m
      memory:  2Gi
    Requests:
      cpu:     200m
      memory:  1Gi
    Environment:
      RABBITMQ_ERLANG_COOKIE:  <set to the key 'rabbitmq-erlang-cookie' in secret 'rabbitmq-erlang-cookie'>  Optional: false
    Mounts:
      /etc/rabbitmq from rabbitmq-config (rw)
      /var/lib/rabbitmq from rabbitmq-persistent-data-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lqbp6 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  rabbitmq-persistent-data-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rabbitmq-persistent-data-storage-rabbitmq-0
    ReadOnly:   false
  rabbitmq-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-config
    Optional:    false
  default-token-lqbp6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lqbp6
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     <none>
Events:          <none>

1 answer:

Answer 0 (score: 0)

This problem is caused by DNS resolution failing inside the pods. Without valid DNS records, the pods cannot reach each other.
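
One way to confirm this from inside a pod is to check whether each peer name resolves. The following is a minimal sketch, assuming `getent` is available in the container image (the image in the question is a custom one, so that may not hold). The function name check_peers and the RESOLVE_CMD variable are made up for illustration; the peer names come from the cluster_nodes setting in the question.

```shell
#!/bin/sh
# Command used to resolve a host name. Assumes getent exists in the image;
# override RESOLVE_CMD to use another tool.
RESOLVE_CMD="${RESOLVE_CMD:-getent hosts}"

# Report, for each peer name given, whether it resolves.
# Returns non-zero if at least one peer does not resolve.
check_peers() {
  failed=0
  for peer in "$@"; do
    if $RESOLVE_CMD "$peer" >/dev/null 2>&1; then
      echo "ok: $peer"
    else
      echo "unresolved: $peer"
      failed=1
    fi
  done
  return "$failed"
}
```

After loading these definitions in a shell inside rabbitmq-0, running `check_peers rabbitmq-0 rabbitmq-1 rabbitmq-2` should list which peers are missing DNS records. Any line starting with "unresolved" points at the Service or namespace setup rather than at RabbitMQ itself.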

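To fix this, try creating an additional Service, or edit the existing one, so that DNS resolution works.

An additional Service for DNS lookups can be defined as follows: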

kind: Service
apiVersion: v1
metadata:
  namespace: default
  name: rabbitmq
  labels:
    app: rabbitmq
    type: Service
spec:
  ports:
  - name: http
    protocol: TCP
    port: 15672
    targetPort: 15672
  - name: amqp
    protocol: TCP
    port: 5672
    targetPort: 5672
  selector:
    app: rabbitmq
  type: ClusterIP
  clusterIP: None

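In the Service spec, the type is ClusterIP and clusterIP is set to None, which makes it a headless Service. This should let the pods resolve each other through DNS.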
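
For reference, a headless Service gives each StatefulSet pod a DNS name of the form pod.service.namespace.svc.cluster-domain. The sketch below builds those names for the three pods in the question. The helper name pod_fqdn is made up for illustration, and cluster.local is assumed as the cluster domain (the usual default, but configurable). Note that the Service has to live in the same namespace as the pods: in the question the StatefulSet is in development while the Service listing shows develop, and the example in this answer uses default, so the namespace is worth double-checking.

```shell
#!/bin/sh
# Build the DNS name a headless Service gives to a StatefulSet pod:
#   <pod>.<service>.<namespace>.svc.<cluster-domain>
# The cluster domain defaults to cluster.local here (an assumption).
pod_fqdn() {
  pod="$1"; service="$2"; namespace="$3"; domain="${4:-cluster.local}"
  printf '%s.%s.%s.svc.%s\n' "$pod" "$service" "$namespace" "$domain"
}

# Names taken from the question: StatefulSet "rabbitmq" in "development".
for i in 0 1 2; do
  pod_fqdn "rabbitmq-$i" rabbitmq development
done
```

If these full names resolve from inside a pod but the short names do not, the problem is more likely in the search domains of /etc/resolv.conf than in the Service.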

Cheers!

Rishabh