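JanusGraph in Kubernetes cannot connect to Cassandra running as another service

Time: 2019-03-29 07:06:32

Tags: kubernetes kubernetes-helm connection-timeout janusgraph

I am trying to run JanusGraph with Cassandra as the storage backend and Elasticsearch for indexing. Cassandra and Elasticsearch each run as a separate service in the same cluster.

Although the required ports are opened on both services, the JanusGraph pod logs show a connection timeout when connecting to Cassandra.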

23343 [main] WARN  org.apache.tinkerpop.gremlin.server.GremlinServer  - Graph [graph] configured at [conf/gremlin-server/janusgraph.properties] could not be instantiated and will not be available in Gremlin Server.  GraphFactory message: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:82)
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:70)
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:104)
    at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.lambda$new$0(DefaultGraphManager.java:57)
    at java.util.LinkedHashMap$LinkedEntrySet.forEach(LinkedHashMap.java:671)
    at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.<init>(DefaultGraphManager.java:55)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:110)
    at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:89)
    at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:110)
    at org.apache.tinkerpop.gremlin.server.GremlinServer.main(GremlinServer.java:354)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:78)
    ... 13 more
Caused by: java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager
    at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:69)
    at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)
    at org.janusgraph.diskstorage.Backend.getStorageManager(Backend.java:409)
    at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1376)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:164)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:133)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:113)
    ... 18 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:58)
    ... 24 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:619)
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager.<init>(AstyanaxStoreManager.java:314)
    ... 29 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=cassandra(SERVICE_IP):9160, latency=10001(10001), attempts=1]Timed out waiting for connection
    at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
    at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
    at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352)
    at com.netflix.astyanax.thrift.ThriftClusterImpl.executeSchemaChangeOperation(ThriftClusterImpl.java:146)
    at com.netflix.astyanax.thrift.ThriftClusterImpl.internalCreateKeyspace(ThriftClusterImpl.java:321)
    at com.netflix.astyanax.thrift.ThriftClusterImpl.addKeyspace(ThriftClusterImpl.java:294)
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:614)

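I am running the janusgraph v2 image, and the gcr.io/google-samples/cassandra:v13 image for Cassandra.

I also tried connecting to Cassandra on port 9160 from a busybox pod, but that does not work either. Interestingly, ping works against the service name (cassandra here). It is only when I telnet to port 9160 or 9042 that I get a connection refused error.

Here are the Cassandra Service and StatefulSet (STS):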

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
  - port: 9042
    name: cql
  - port: 9160
    name: thrift
  selector:
    app: cassandra
---    
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      #schedulerName: stork       #Check benefits of using STORK as scheduler.
      containers:
      - name: cassandra
        image: gcr.io/google-samples/cassandra:v13
        ports:
          - containerPort: 7000
            name: intra-node
          - containerPort: 7001
            name: tls-intra-node
          - containerPort: 7199
            name: jmx
          - containerPort: 9042
            name: cql
          - containerPort: 9160
            name: thrift
          - containerPort: 9142
            name: transportssl
        resources:
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: "500m"
            memory: 1Gi
        securityContext:
          capabilities:
            add:
              - IPC_LOCK
        lifecycle:
          preStop:
            exec:
              command: 
              - /bin/sh
              - -c
              - nodetool drain
        env:
          - name: CASSANDRA_SEEDS
            value: cassandra-0.cassandra.default.svc.cluster.local
          - name: MAX_HEAP_SIZE 
            value: 512M
          - name: HEAP_NEWSIZE
            value: 512M
          - name: CASSANDRA_CLUSTER_NAME
            value: "Cassandra"
          - name: CASSANDRA_DC
            value: "DC1"
          - name: CASSANDRA_RACK
            value: "Rack1"
          - name: CASSANDRA_AUTO_BOOTSTRAP
            value: "false"            
          - name: CASSANDRA_ENDPOINT_SNITCH
            value: GossipingPropertyFileSnitch
          - name: CASSANDRA_RPC_ADDRESS
            value: 0.0.0.0
          - name: CASSANDRA_NUM_TOKENS
            value: "32"
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /ready-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        volumeMounts:
        - name: nfs-pvc-cassandra
          mountPath: /srv/nfs/kubedata/janus
      restartPolicy: Always
      volumes:
        - name: nfs-pvc-cassandra
          persistentVolumeClaim:
            claimName: nfs-pvc-cassandra

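How can I debug this further?

3 answers:

Answer 0: (score: 0)

If JanusGraph is running on the host rather than in the cluster, you may need to port-forward the Kubernetes service port to reach it locally. Perhaps you have already done that.

Answer 1: (score: 0)

I can confirm that your StatefulSet YAML works as posted, and that the headless service creates DNS names pointing to the pod endpoints. I created a simple nginx pod and used telnet from it to check. The output is below.

Check that the endpoints and the service exist for cassandra: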

$ kubectl get svc,ep cassandra
NAME                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
service/cassandra   ClusterIP   None         <none>        9042/TCP,9160/TCP   2h

NAME                  ENDPOINTS                                                   AGE
endpoints/cassandra   10.56.0.10:9160,10.56.3.3:9160,10.56.4.2:9160 + 3 more...   2h
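
The DNS side of this check can also be scripted. The following is only a minimal sketch, assuming it runs in a pod in the same namespace whose image ships Python 3 (an assumption, the pods in this thread may not have it). The helper name resolve_ipv4 is made up for illustration. For a headless service, the name would be expected to resolve to the pod IPs rather than to a single cluster IP.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses that hostname resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr is (ip, port) for IPv4.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_ipv4("cassandra") from such a pod should list addresses matching the ENDPOINTS column above. If it raises socket.gaierror, the problem is name resolution rather than the port.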

Exec into a neighboring pod in the same namespace, then telnet to the service name and to a pod IP:

$ kubectl  exec -it nginx-79dbd67896-9dwp8 bash

root@nginx-79dbd67896-9dwp8:/# telnet cassandra 9042
Trying 10.56.3.3...
Connected to cassandra.default.svc.cluster.local.
Escape character is '^]'.

telnet> quit
Connection closed.

root@nginx-79dbd67896-9dwp8:/# telnet 10.56.0.10 9042
Trying 10.56.0.10...
Connected to 10.56.0.10.
Escape character is '^]'.

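From the output, the pods are listening on port 9042 but not on 9160. Port 9160 belongs to Cassandra's Thrift API server, which is disabled by default. For more on this, see https://github.com/docker-library/cassandra/issues/127. You will have to check how to enable the Thrift API port.

You can check the listening ports inside the cassandra container with the following commands: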

root@cassandra-0:/# apt update && apt install telnet net-tools
<output omitted>

root@cassandra-0:/# netstat -tulpen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name    
tcp        0      0 0.0.0.0:9042            0.0.0.0:*               LISTEN      1000       10163244   -                   
tcp        0      0 127.0.0.1:43669         0.0.0.0:*               LISTEN      1000       10162974   -                   
tcp        0      0 10.56.0.10:7000         0.0.0.0:*               LISTEN      1000       10163145   -                   
tcp        0      0 127.0.0.1:7199          0.0.0.0:*               LISTEN      1000       10162973   -                   
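
If telnet or netstat is not available in the pod you are testing from, the same reachability check can be done with a few lines of Python. This is a rough sketch, assuming the client pod has Python 3. The function names port_is_open and probe are illustrative, not part of any Kubernetes or Cassandra tooling.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and DNS failures alike.
        return False


def probe(host, ports):
    """Map each port to whether it currently accepts TCP connections."""
    return {port: port_is_open(host, port) for port in ports}
```

Running probe("cassandra", [9042, 9160]) from a pod in the same namespace separates the two cases: given the netstat listing above, 9042 would be expected to be reachable and 9160 not, which points at Thrift being off rather than at the Service definition.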

Hope that helps!

Answer 2: (score: 0)

Just set the JanusGraph storage.hostname to cassandra-0.cassandra.default in values.yaml so that the JanusGraph pod can talk to Cassandra.

Check the Thrift status on the Cassandra node with nodetool: nodetool statusthrift

If it is not enabled, turn it on with nodetool as well (nodetool enablethrift).
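
The two nodetool steps can be wrapped in a small script. This is only a sketch under several assumptions: it runs on a machine with kubectl access to the cluster, the pod is named cassandra-0 as in the question, and nodetool statusthrift prints either "running" or "not running". The function names are hypothetical.

```python
import subprocess


def thrift_is_running(status_output):
    """Interpret the text printed by `nodetool statusthrift`."""
    # Assumption: the command prints "running" or "not running".
    return status_output.strip().lower() == "running"


def ensure_thrift_enabled(run_nodetool):
    """Enable Thrift if it is off; return True if it had to be enabled."""
    if thrift_is_running(run_nodetool("statusthrift")):
        return False
    run_nodetool("enablethrift")
    return True


def nodetool(subcommand, pod="cassandra-0"):
    """Run a nodetool subcommand in the given pod and return its stdout."""
    result = subprocess.run(
        ["kubectl", "exec", pod, "--", "nodetool", subcommand],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling ensure_thrift_enabled(nodetool) applies the check to cassandra-0; the other replicas would need the same treatment. Note that nodetool enablethrift only changes the running node, so the setting is likely to be lost when the pod restarts unless Thrift is also enabled in the Cassandra configuration (start_rpc in cassandra.yaml).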