Handling failed seed nodes in the Astyanax Cassandra API

Time: 2014-08-19 10:53:26

Tags: java cassandra astyanax

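Maybe I have misunderstood how automatic node discovery works in the Astyanax Cassandra API, but here is my problem:

I have the following setup:

2 data centers with 2 nodes each, and a replication factor of 2.

DC1: N1 and N2; DC2: N3 and N4

The seed nodes are N1 and N3 (these are also the hosts passed to the application). Automatic discovery of the other nodes (N2 and N4) seems to work, even though they do not show up in the host pool.

If N3 fails, data is correctly written to N4, and when N3 comes back up the data is correctly synced to it. The same holds for N1 and N2.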

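The problem appears when both seed nodes (N1 and N3) are down. Data is then no longer written to N2 and N4, which is what I expected to happen; instead, an exception causes the application to fail. (When only one seed node is down, Astyanax writes exception information to the log, but that normally does not make the application fail.)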

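Obviously the seed nodes have to be online when the application starts, but I thought automatic node discovery in Astyanax would allow the seed nodes to fail afterwards, so that the replica nodes could take over (using consistency level CL_ONE).

Is there a way to avoid this failure, have I misunderstood automatic node discovery, or am I simply doing something very wrong?

Some additional information: the nodes mostly use the default settings in cassandra.yaml, and the tokens were generated with a Python script and set in that file.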

private AstyanaxContext<Cluster> connect(final String hosts) {
    AstyanaxConfigurationImpl asConfig = new AstyanaxConfigurationImpl();
    asConfig.setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_ONE);
    asConfig.setDefaultReadConsistencyLevel(ConsistencyLevel.CL_ONE);
    AstyanaxContext<Cluster> context = new AstyanaxContext.Builder()
            .forCluster("TestSuitCluster")
            .withAstyanaxConfiguration(
                    asConfig.setDiscoveryType(NodeDiscoveryType.TOKEN_AWARE)
                    .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
            .withConnectionPoolConfiguration(
                    new ConnectionPoolConfigurationImpl(
                            "CassandraConnectionPool").setSeeds(hosts)
                            .setMaxConnsPerHost(8).setMaxConns(8))
            .withConnectionPoolMonitor(new ConnectionPoolMonitor())
            .buildCluster(ThriftFamilyFactory.getInstance());
    context.start();
    return context;
}

Stack trace shown when the last seed node goes down:

com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=127.0.0.1(127.0.0.1):9160, latency=2000(2000), attempts=1]Timed out waiting for connection
    at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:218)
    at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:185)
    at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:66)
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:67)
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:256)
    at com.netflix.astyanax.thrift.ThriftClusterImpl.describeKeyspaces(ThriftClusterImpl.java:165)
    at com.netflix.astyanax.thrift.ThriftClusterImpl.describeKeyspace(ThriftClusterImpl.java:184)
    at at.dbeg.cassandra.CasandraTestSuit.deleteKeyspace(CasandraTestSuit.java:134)
    at at.dbeg.cassandra.CasandraTestSuit.runTests(CasandraTestSuit.java:189)
    at at.dbeg.cassandra.CasandraTestSuit.main(CasandraTestSuit.java:50)    
com.netflix.astyanax.connectionpool.exceptions.ConnectionAbortedException: ConnectionAbortedException: [host=127.0.0.1(127.0.0.1):9160, latency=0(0), attempts=1]org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset by peer: socket write error
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:193)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:256)
    at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.executeOperation(ThriftKeyspaceImpl.java:485)
    at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.access$000(ThriftKeyspaceImpl.java:79)
    at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$6$3.execute(ThriftKeyspaceImpl.java:355)
    at at.dbeg.cassandra.CasandraTestSuit.testWrite(CasandraTestSuit.java:269)
    at at.dbeg.cassandra.CasandraTestSuit.runTests(CasandraTestSuit.java:168)
    at at.dbeg.cassandra.CasandraTestSuit.main(CasandraTestSuit.java:50)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset by peer: socket write error
    at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
    at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
    at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
    at org.apache.cassandra.thrift.Cassandra$Client.send_insert(Cassandra.java:833)
    at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:822)
    at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$6$3$1.internalExecute(ThriftKeyspaceImpl.java:367)
    at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$6$3$1.internalExecute(ThriftKeyspaceImpl.java:358)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
    ... 10 more
Caused by: java.net.SocketException: Connection reset by peer: socket write error
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
    ... 17 more 

1 answer:

Answer 0 (score: 0)

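I think I have finally found the answer. In a Cluster context this is not possible without supplying your own HostSupplier. The simplest way to solve it is to iterate over all keyspaces in the cluster and use the logic of RingDescribeHostSupplier to find all hosts.

Once this HostSupplier is set on the AstyanaxContext and used, the expected behavior shows up.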
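
The idea above can be sketched roughly as follows. This is only a stand-alone model using plain JDK types, not the Astyanax API: the class `AggregatingHostSupplier` and the interface `RingSource` are hypothetical names, and plain strings stand in for Astyanax host objects. In real code, each `RingSource` would presumably wrap a ring-describe call for one keyspace, and the resulting supplier would be handed to the AstyanaxContext builder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical model of a host supplier that merges the ring endpoints
 * reported for several keyspaces. No Astyanax types are used here.
 */
class AggregatingHostSupplier implements Supplier<List<String>> {

    /** Hypothetical stand-in for "describe the ring of one keyspace". */
    interface RingSource {
        List<String> endpoints() throws Exception;
    }

    private final List<RingSource> sources;
    // Last host list that was successfully discovered; starts as the seed list.
    private volatile List<String> lastKnown;

    AggregatingHostSupplier(List<RingSource> sources, List<String> seeds) {
        this.sources = new ArrayList<>(sources);
        this.lastKnown = Collections.unmodifiableList(new ArrayList<>(seeds));
    }

    @Override
    public List<String> get() {
        // LinkedHashSet removes duplicates while keeping discovery order.
        Set<String> hosts = new LinkedHashSet<>();
        for (RingSource source : sources) {
            try {
                hosts.addAll(source.endpoints());
            } catch (Exception e) {
                // This keyspace could not be described (for example because
                // the contacted node is down); continue with the next one.
            }
        }
        if (hosts.isEmpty()) {
            // Nothing could be discovered: keep the previous view instead of
            // handing an empty host list to the connection pool.
            return lastKnown;
        }
        lastKnown = Collections.unmodifiableList(new ArrayList<>(hosts));
        return lastKnown;
    }
}
```

The design choice that matters here is the fallback: once N2 and N4 have been discovered, the supplier keeps returning them even if a later ring query fails, so the pool is not reduced to the (dead) seed nodes.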