Apache Ignite nodes deployed in the same Kubernetes namespace do not join the same cluster

Time: 2019-07-20 22:31:42

Tags: kubernetes ignite

Apache Ignite nodes deployed as Pods discover each other using TcpDiscoveryKubernetesIpFinder but cannot communicate, and therefore do not join the same cluster.

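I set up a Kubernetes deployment on Azure for an Ignite-based application using the "official" tutorial. The deployment itself succeeds, but the topology of each pod always contains only one server. When I log in to a pod directly and try to connect to another pod on port 47500, it does not work. More interestingly, port 47500 is reachable only on 127.0.0.1 inside the current container, not through the pod's external IP.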

Below are the debug messages on pod/node 1. As you can see, TcpDiscoveryKubernetesIpFinder discovers the two Ignite pods/nodes, but it cannot connect to the other Ignite node:

INFO  [org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi] (ServerService Thread Pool -- 5) Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
DEBUG [org.apache.ignite.internal.managers.communication.GridIoManager] (ServerService Thread Pool -- 5) Starting SPI: TcpCommunicationSpi [connectGate=null, connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy@48ca2359, enableForcibleNodeKill=false, enableTroubleshootingLog=false, locAddr=null, locHost=0.0.0.0/0.0.0.0, locPort=47100, locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=600000, connTimeout=5000, maxConnTimeout=600000, reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=0, slowClientQueueLimit=0, nioSrvr=GridNioServer [selectorSpins=0, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@30a29315, directMode=true], GridConnectionBytesVerifyFilter], closed=false, directBuf=true, tcpNoDelay=true, sockSndBuf=32768, sockRcvBuf=32768, writeTimeout=2000, idleTimeout=600000, skipWrite=false, skipRead=false, locAddr=0.0.0.0/0.0.0.0:47100, order=LITTLE_ENDIAN, sndQueueLimit=0, directMode=true, sslFilter=null, msgQueueLsnr=null, readerMoveCnt=0, writerMoveCnt=0, readWriteSelectorsAssign=false], shmemSrv=null, usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=2000, boundTcpPort=47100, boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch@4186e275[Count = 1], stopping=false]
DEBUG [org.apache.ignite.internal.managers.communication.GridIoManager] (ServerService Thread Pool -- 5) Starting SPI implementation: org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi
DEBUG [org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi] (ServerService Thread Pool -- 5) Using parameter [locAddr=null]
DEBUG [org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi] (ServerService Thread Pool -- 5) Using parameter [locPort=47100]
DEBUG [org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi]  Grid runnable started: tcp-disco-srvr
DEBUG [org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder] (ServerService Thread Pool -- 5) Getting Apache Ignite endpoints from: https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/default/endpoints/ignite
DEBUG [org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder] (ServerService Thread Pool -- 5) Added an address to the list: 10.244.0.93
DEBUG [org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder] (ServerService Thread Pool -- 5) Added an address to the list: 10.244.0.94
ERROR [org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi] (ServerService Thread Pool -- 5) Exception on direct send: Invalid argument (connect failed): java.net.ConnectException: Invalid argument (connect failed)
    at java.net.PlainSocketImpl.socketConnect(Native Method)

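I logged in to the pod directly and tried to probe the other nodes/pods, but neither echo > /dev/tcp/10.244.0.93/47500 nor echo > /dev/tcp/10.244.0.94/47500 works. On the other hand, echo > /dev/tcp/127.0.0.1/47500 does. This makes me think that Ignite is listening only on the local loopback address.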

Pod/node 2 shows similar logs.

Here is the Kubernetes configuration:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pgdata
  namespace: default
  annotations:
    volume.alpha.kubernetes.io/storage-class: default
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ignite
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: ignite
  namespace: default
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - endpoints
  verbs:
  - get
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: ignite
roleRef:
  kind: ClusterRole
  name: ignite
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: ignite
  namespace: default
---
apiVersion: v1
kind: Service
metadata:
  name: ignite
  namespace: default
spec:
  clusterIP: None # custom value.
  ports:
    - port: 9042 # custom value.
  selector:
    type: processing-engine-node
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: database-tenant-1
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: database-tenant-1
  template:
    metadata:
      labels:
        app: database-tenant-1
    spec:
      containers:
      - name: database-tenant-1
        image: postgres:12
        env:
        - name: "POSTGRES_USER"
          value: "admin"
        - name: "POSTGRES_PASSWORD"
          value: "admin"
        - name: "POSTGRES_DB"
          value: "tenant1"
        volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
          subPath: postgres
        ports:
        - containerPort: 5432
        readinessProbe:
          exec:
            command: ["psql", "-W", "admin", "-U", "admin", "-d", "tenant1", "-c", "SELECT 1"]
          initialDelaySeconds: 15
          timeoutSeconds: 2
        livenessProbe:
          exec:
            command: ["psql", "-W", "admin", "-U", "admin", "-d", "tenant1", "-c", "SELECT 1"]
          initialDelaySeconds: 45
          timeoutSeconds: 2
      volumes:
        - name: pgdata
          persistentVolumeClaim:
            claimName: pgdata
---
apiVersion: v1
kind: Service
metadata:
  name: database-tenant-1
  namespace: default
  labels:
    app: database-tenant-1
spec:
  type: NodePort
  ports:
   - port: 5432
  selector:
   app: database-tenant-1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: processing-engine-master
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: processing-engine-master
  template:
    metadata:
      labels:
        app: processing-engine-master
        type: processing-engine-node
    spec:
      serviceAccountName: ignite
      initContainers:
      - name: check-db-ready
        image: postgres:12
        command: ['sh', '-c', 
          'until pg_isready -h database-tenant-1 -p 5432; 
          do echo waiting for database; sleep 2; done;']
      containers:
      - name: xxxx-engine-master
        image: shostettlerprivateregistry.azurecr.io/xxx/xxx-application:4.2.5
        ports:
            - containerPort: 8081
            - containerPort: 11211 # REST port number.
            - containerPort: 47100 # communication SPI port number.
            - containerPort: 47500 # discovery SPI port number.
            - containerPort: 49112 # JMX port number.
            - containerPort: 10800 # SQL port number.
            - containerPort: 10900 # Thin clients port number.
        volumeMounts:
        - name: config-volume
          mountPath: /opt/project-postgres.yml
          subPath: project-postgres.yml
      volumes:
          - name: config-volume
            configMap:
              name: pe-config
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: processing-engine-worker
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: processing-engine-worker
  template:
    metadata:
      labels:
        app: processing-engine-worker
        type: processing-engine-node
    spec:
      serviceAccountName: ignite
      initContainers:
      - name: check-db-ready
        image: postgres:12
        command: ['sh', '-c', 
          'until pg_isready -h database-tenant-1 -p 5432; 
          do echo waiting for database; sleep 2; done;']
      containers:
      - name: xxx-engine-worker
        image: shostettlerprivateregistry.azurecr.io/xxx/xxx-worker:4.2.5
        ports:
            - containerPort: 8081
            - containerPort: 11211 # REST port number.
            - containerPort: 47100 # communication SPI port number.
            - containerPort: 47500 # discovery SPI port number.
            - containerPort: 49112 # JMX port number.
            - containerPort: 10800 # SQL port number.
            - containerPort: 10900 # Thin clients port number.

        volumeMounts:
        - name: config-volume
          mountPath: /opt/project-postgres.yml
          subPath: project-postgres.yml
      volumes:
          - name: config-volume
            configMap:
              name: pe-config

And the Ignite configuration:

<bean id="tcpDiscoveryKubernetesIpFinder" class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder"/>

<property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="localPort" value="47500" />
        <property name="localAddress" value="127.0.0.1" />
        <property name="networkTimeout" value="10000" />
        <property name="ipFinder">
            <bean id="tcpDiscoveryKubernetesIpFinder" class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder"/>
        </property>
    </bean>
</property>

I expect the pods to be able to communicate and to end up with the following topology snapshot:

[ver=1, locNode=a8e6a058, servers=2, clients=0, state=ACTIVE, CPUs=2, offheap=0.24GB, heap=1.5GB]

1 Answer:

Answer 0: (score: 1)

You have discovery configured to bind to localhost:

<property name="localAddress" value="127.0.0.1" />

This means that nodes running in different containers cannot connect to each other. Try removing this line from the configuration.
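
For illustration, the discovery section from the question might look roughly like this once that line is removed. This is only a sketch built from the configuration posted above. It assumes nothing else in the application sets a loopback local address (for example through the localHost property of IgniteConfiguration), and it has not been tested against the poster's image.

```xml
<property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="localPort" value="47500" />
        <!-- localAddress is deliberately not set here, so discovery is no
             longer pinned to 127.0.0.1 and other pods can reach port 47500. -->
        <property name="networkTimeout" value="10000" />
        <property name="ipFinder">
            <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder"/>
        </property>
    </bean>
</property>
```

After redeploying, the same check used in the question (echo > /dev/tcp/10.244.0.94/47500 from the other pod) should succeed if the bind address was the only problem.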