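Maybe I have misunderstood how automatic node discovery works in the Astyanax Cassandra API, but here is my problem:
I have the following setup:
2 data centers with 2 nodes each and a replication factor of 2.
DC1: N1 and N2; DC2: N3 and N4
The seed nodes are N1 and N3 (these are also the hosts given to the application). Automatic discovery of the other nodes (N2 and N4) seems to work, even though they do not show up in the host pool.
If N3 fails, data is written correctly to N4, and it is synced back to N3 correctly when that node comes up again. The same holds for N1 and N2.
The problem appears when both seed nodes (N1 and N3) fail. Data is then no longer written to N2 and N4 as expected; instead an exception causes the application to fail. (When only one seed node is down, Astyanax writes exception messages to the log, but that does not usually bring the application down.)
Obviously the seed nodes have to be online when the application starts, but I assumed that automatic node discovery in Astyanax would tolerate the seed nodes failing later, so that the replica nodes could take over (using consistency level CL_ONE).
Is there a way to avoid this failure, have I misunderstood automatic node discovery, or am I doing something badly wrong?
Some additional information: the nodes mostly use the default settings from cassandra.yaml, and the tokens were generated with the Python script suggested in that file.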
    private AstyanaxContext<Cluster> connect(final String hosts) {
        // Read and write with CL_ONE so that a single live replica is enough.
        AstyanaxConfigurationImpl asConfig = new AstyanaxConfigurationImpl();
        asConfig.setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_ONE);
        asConfig.setDefaultReadConsistencyLevel(ConsistencyLevel.CL_ONE);

        AstyanaxContext<Cluster> context = new AstyanaxContext.Builder()
                .forCluster("TestSuitCluster")
                .withAstyanaxConfiguration(
                        asConfig.setDiscoveryType(NodeDiscoveryType.TOKEN_AWARE)
                                .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
                .withConnectionPoolConfiguration(
                        // hosts holds the seed nodes (N1 and N3 in this setup)
                        new ConnectionPoolConfigurationImpl("CassandraConnectionPool")
                                .setSeeds(hosts)
                                .setMaxConnsPerHost(8)
                                .setMaxConns(8))
                .withConnectionPoolMonitor(new ConnectionPoolMonitor())
                .buildCluster(ThriftFamilyFactory.getInstance());
        context.start();
        return context;
    }
Stack traces shown when the last seed node goes down:
com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=127.0.0.1(127.0.0.1):9160, latency=2000(2000), attempts=1]Timed out waiting for connection
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:218)
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:185)
at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:66)
at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:67)
at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:256)
at com.netflix.astyanax.thrift.ThriftClusterImpl.describeKeyspaces(ThriftClusterImpl.java:165)
at com.netflix.astyanax.thrift.ThriftClusterImpl.describeKeyspace(ThriftClusterImpl.java:184)
at at.dbeg.cassandra.CasandraTestSuit.deleteKeyspace(CasandraTestSuit.java:134)
at at.dbeg.cassandra.CasandraTestSuit.runTests(CasandraTestSuit.java:189)
at at.dbeg.cassandra.CasandraTestSuit.main(CasandraTestSuit.java:50)
com.netflix.astyanax.connectionpool.exceptions.ConnectionAbortedException: ConnectionAbortedException: [host=127.0.0.1(127.0.0.1):9160, latency=0(0), attempts=1]org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset by peer: socket write error
at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:193)
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:256)
at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.executeOperation(ThriftKeyspaceImpl.java:485)
at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.access$000(ThriftKeyspaceImpl.java:79)
at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$6$3.execute(ThriftKeyspaceImpl.java:355)
at at.dbeg.cassandra.CasandraTestSuit.testWrite(CasandraTestSuit.java:269)
at at.dbeg.cassandra.CasandraTestSuit.runTests(CasandraTestSuit.java:168)
at at.dbeg.cassandra.CasandraTestSuit.main(CasandraTestSuit.java:50)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset by peer: socket write error
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
at org.apache.cassandra.thrift.Cassandra$Client.send_insert(Cassandra.java:833)
at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:822)
at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$6$3$1.internalExecute(ThriftKeyspaceImpl.java:367)
at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$6$3$1.internalExecute(ThriftKeyspaceImpl.java:358)
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
... 10 more
Caused by: java.net.SocketException: Connection reset by peer: socket write error
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
... 17 more
Answer 0 (score: 0)
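I think I have finally found the answer. In a cluster context this is not possible without supplying your own HostSupplier. The simplest way to solve it is to iterate over all keyspaces in the cluster and use the logic of RingDescribeHostSupplier to collect all hosts.
Once such a HostSupplier is set on the AstyanaxContext and used for discovery, the expected behavior appears.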
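A minimal sketch of that idea follows. So that it runs without the library, Astyanax types are replaced by plain-Java stand-ins: hosts are plain strings, and each per-keyspace supplier stands in for a RingDescribeHostSupplier created for one keyspace. The class name AllKeyspacesHostSupplier and the fallback to the last known host list are my own assumptions, not part of Astyanax. In real code the supplier would return Astyanax Host objects and be registered on the context builder (I believe via withHostSupplier, but check this against your Astyanax version).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical host supplier that merges the hosts reported for every keyspace.
 * Each element of perKeyspace stands in for a ring-describe lookup on one keyspace.
 */
public class AllKeyspacesHostSupplier implements Supplier<List<String>> {
    private final List<Supplier<List<String>>> perKeyspace;
    // Last non-empty result; starts out as the configured seed list.
    private volatile List<String> lastKnown;

    public AllKeyspacesHostSupplier(List<Supplier<List<String>>> perKeyspace,
                                    List<String> seeds) {
        this.perKeyspace = new ArrayList<>(perKeyspace);
        this.lastKnown = new ArrayList<>(seeds);
    }

    @Override
    public List<String> get() {
        // LinkedHashSet removes duplicates while keeping first-seen order.
        Set<String> hosts = new LinkedHashSet<>();
        for (Supplier<List<String>> ring : perKeyspace) {
            try {
                hosts.addAll(ring.get());
            } catch (RuntimeException e) {
                // The lookup for this keyspace failed (for example, the node
                // that was asked is down); continue with the next keyspace.
            }
        }
        if (!hosts.isEmpty()) {
            lastKnown = new ArrayList<>(hosts);
        }
        // If every lookup failed, keep returning the last known hosts so the
        // pool does not shrink back to nothing.
        return new ArrayList<>(lastKnown);
    }
}
```

The point of merging across keyspaces is that the pool then knows about N2 and N4 as real hosts, so losing both seeds no longer leaves it with nothing to connect to.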