We run a Cassandra 2.1.5 installation with 3 nodes, each on a separate Debian Wheezy VM. The problem we are facing is that, after several days of trouble-free operation, inserts into a table suddenly stop working. We then see the following error message:
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency QUORUM (2 responses were required but only 1 replica responded)
at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69)
When I run nodetool status on each node, it shows all nodes with Status = Up and State = Normal.
If I run nodetool repair on a node, I see thousands of exceptions in the log, such as:
2015-06-03 16:40:58,023 ERROR [AntiEntropySessions:17] RepairSession.java:303 - [repair #858c8470-09fe-11e5-930b-d16ee278cb3a] session completed with the following error
java.io.IOException: Failed during snapshot creation.
at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.5.jar:2.1.5]
at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:146) ~[apache-cassandra-2.1.5.jar:2.1.5]
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
To get Cassandra working again, I have to restart the Cassandra daemon on every node and then run nodetool repair on each node (which works without throwing exceptions after the restart). Everything then works again for 2-3 days, until the same problem appears once more.
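For reference, the recovery cycle described above can be sketched as a small shell script. The hostnames (node1..node3) and the init service name `cassandra` are assumptions; the script runs in dry-run mode by default and only prints the commands it would execute over ssh:

```shell
# Hypothetical node list -- replace with the actual VM hostnames.
NODES="node1 node2 node3"

# Set DRY_RUN=0 to actually execute the commands instead of printing them.
DRY_RUN=1

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: restart the Cassandra daemon on every node.
for host in $NODES; do
    run ssh "$host" 'sudo service cassandra restart'
done

# Step 2: once all nodes are back up, run a repair on each node,
# which succeeds without exceptions only after the restart.
for host in $NODES; do
    run ssh "$host" 'nodetool repair'
done
```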
Is this a known issue, or what could cause this behavior? To me it looks as if the nodes can no longer communicate with each other when the error occurs, but if that were the case, why does nodetool status show UN (Up/Normal) for all nodes?