cassandra - Cassandra nodes in the same datacenter give different query results/errors

I'm having a problem with a Cassandra cluster that has multiple datacenters. Each datacenter has 3 nodes, and 2 nodes in each datacenter act as seeds:

I have a keyspace X with a replication factor of 3 in each datacenter, i.e. 3 replicas in datacenter DC1 and 3 replicas in datacenter DC2 (KEYSPACE X WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;)

Now, what I'm doing (maybe I'm missing something here) is connecting with cqlsh to each node in datacenter DC2 (node2A, node2B and node2C) and running the following:

cqlsh node2N
CONSISTENCY ALL
SELECT * FROM x.table;

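With the consistency level set to ALL, I understand that I need a response from every replica (the three nodes in DC1 and the three nodes in DC2), 6 responses in total. Instead, I get 3 different results depending on which node I connect to: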
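The arithmetic behind the "6 responses" expectation can be sketched in Python. This is a simplified model of the documented consistency-level formulas for NetworkTopologyStrategy (ALL is the sum of all replication factors, QUORUM is that sum divided by 2 plus 1 rounded down, LOCAL_QUORUM uses only the coordinator's datacenter). The function name `required_responses` is hypothetical and not part of any Cassandra driver.

```python
# Sketch: how many replica responses a coordinator waits for, per consistency
# level, for a keyspace using NetworkTopologyStrategy. Simplified model only.

def required_responses(replication, level, local_dc=None):
    """replication: dict of datacenter -> replication factor.
    level: "ONE", "LOCAL_QUORUM", "QUORUM" or "ALL".
    local_dc: the coordinator's datacenter (used by LOCAL_QUORUM).
    """
    total = sum(replication.values())
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        # Only replicas in the coordinator's own datacenter count.
        return replication[local_dc] // 2 + 1
    if level == "QUORUM":
        # Quorum over the sum of replication factors of all datacenters.
        return total // 2 + 1
    if level == "ALL":
        return total
    raise ValueError(f"unsupported level: {level}")

keyspace_x = {"DC1": 3, "DC2": 3}
for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "ALL"):
    print(level, required_responses(keyspace_x, level, local_dc="DC2"))
```

With keyspace X this prints 1, 2, 4 and 6, which matches the `'required_replicas': 6` in the ALL error below.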

node2A: the query fails with Cannot achieve consistency level ALL info: {'required_replicas': 6, 'alive_replicas': 5, 'consistency': ALL}
node2B: the query succeeds and returns the table data
node2C: the query takes 1-2 minutes and then returns Coordinator node timed out waiting for replica nodes' responses. Operation timed out - received only 5 responses. info: {'received_responses': 5, 'required_responses': 6, 'consistency': ALL}

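I'm running these queries in cqlsh because one of our applications behaves strangely when querying Cassandra (reporting errors such as not enough replicas for QUORUM), and I suspect there may be a communication problem between the nodes: gossip is either telling different nodes different things, or something similar. Basic connectivity works from every node to every other node (we can use cqlsh, ssh and everything else).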
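The two failures above are different kinds of error. node2A's "Cannot achieve consistency level" is raised by the coordinator before it contacts any replica, because it already believes only 5 of the 6 replicas are alive; node2C's message is a timeout, where the coordinator believed enough replicas were alive but only 5 answered in time. The sketch below models only the first check, under simplifying assumptions (it ignores speculative retry, hints and similar mechanisms); `can_attempt` and `required` are hypothetical helper names.

```python
# Sketch: given a coordinator's view of how many replicas per datacenter are
# alive, decide whether a request at a consistency level can be attempted at
# all (the "Cannot achieve consistency level" / unavailable case).

REPLICATION = {"DC1": 3, "DC2": 3}  # keyspace X

def required(level, local_dc):
    total = sum(REPLICATION.values())
    if level == "LOCAL_QUORUM":
        return REPLICATION[local_dc] // 2 + 1
    if level == "QUORUM":
        return total // 2 + 1
    if level == "ALL":
        return total
    raise ValueError(f"unsupported level: {level}")

def can_attempt(level, alive, local_dc):
    """alive: dict of datacenter -> replicas the coordinator sees as up."""
    if level == "LOCAL_QUORUM":
        seen = alive.get(local_dc, 0)  # only local replicas count
    else:
        seen = sum(alive.values())
    return seen >= required(level, local_dc)

# Assumed view for node2A: 5 of 6 replicas alive, the missing one in DC2
# (consistent with node2A's nodetool status showing node2C as ?N).
view = {"DC1": 3, "DC2": 2}
for level in ("LOCAL_QUORUM", "QUORUM", "ALL"):
    print(level, can_attempt(level, view, local_dc="DC2"))
```

Under this model a coordinator that sees 5 replicas alive can still attempt QUORUM (4 needed), so an application-side "not enough replicas for QUORUM" would suggest that its coordinator saw at most 3 replicas as alive at that moment.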

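Is my theory correct that we have some kind of inconsistency in the configuration? If so, how can I debug these failures? Is there a way to find out which node is down or not responding, so that I can look more closely at its communication? I tried "tracing on", but it only works for successful queries, so I only get traces on node2B (by the way, the behavior on a given node is not always the same; it seems random).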

If not, is my cqlsh test valid? Or am I missing an important piece of the Cassandra puzzle here?

Thanks a lot in advance, I'm going crazy here...

EDIT: As requested, here is the output of nodetool describecluster. I ran it on all 3 nodes in DC2:

node2A:

Cluster Information:
    Name: Cassandra Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        19ada8a5-4688-3fa8-9479-e612388f67ee: [node2A, node2B, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]

node2B:

Cluster Information:
    Name: Cassandra Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        19ada8a5-4688-3fa8-9479-e612388f67ee: [node2A, node2B, node2C, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]
        UNREACHABLE: [couple of IPs from other datacenter/keyspaces]

node2C:

Cluster Information:
    Name: Cassandra Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        19ada8a5-4688-3fa8-9479-e612388f67ee: [node2B, node2C, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]
        UNREACHABLE: [node2A and other IPs]

Notably, node2C is missing from node2A's output, all 3 DC2 nodes appear in node2B's output, and node2C lists node2A as UNREACHABLE...

This looks very wrong to me.
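One way to make the disagreement explicit is to put each node's describecluster peer list side by side and compute who is missing from whose view. The sketch below hard-codes the named nodes from the three outputs above (the "other IPs" from other datacenters are left out); `missing_peers` and `one_way_links` are hypothetical helper names written for illustration.

```python
# Sketch: compare the peer lists printed by `nodetool describecluster`
# on each DC2 node (only the named nodes are included).

VIEWS = {
    "node2A": {"node2A", "node2B", "node1A", "node1B", "node1C"},
    "node2B": {"node2A", "node2B", "node2C", "node1A", "node1B", "node1C"},
    "node2C": {"node2B", "node2C", "node1A", "node1B", "node1C"},
}

def missing_peers(views):
    """For each observer, the nodes listed by some observer but not by it."""
    everyone = set().union(*views.values())
    return {observer: sorted(everyone - seen) for observer, seen in views.items()}

def one_way_links(views):
    """Pairs (a, b) where a lists b, but b (which also reported) does not list a."""
    pairs = []
    for a, seen in views.items():
        for b in seen:
            if b != a and b in views and a not in views[b]:
                pairs.append((a, b))
    return sorted(pairs)

for observer, missing in missing_peers(VIEWS).items():
    print(observer, "does not list:", missing)
print("one-way links:", one_way_links(VIEWS))
```

On this data node2A is missing only node2C, node2C is missing only node2A, node2B is missing nobody, and there are no one-way links. A symmetric gap confined to the node2A/node2C pair while node2B sees both would be consistent with (though it does not prove) a problem on the path between those two nodes specifically.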

I just ran "nodetool status keyspaceX", and here are the results:

node2A:

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load      Tokens  Owns (effective)  Host ID  Rack
UN  node2A   67,78 MB  256     100,0%            -        RAC1
UN  node2B   67,18 MB  256     100,0%            -        RAC1
?N  node2C   67,11 MB  256     100,0%            -        RAC1

node2B:

node2C:

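Now, why doesn't node2A know the status of node2C (it is shown as ?, and node2C does not appear under Schema versions in describecluster)? And why does node2C, whose describecluster output reports node2A as UNREACHABLE, show node2A as up in nodetool status?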
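To see which peers each node disagrees about, one option is to collect `nodetool status keyspaceX` from every node and filter out the rows that are not UN (Up/Normal). The sketch below is a plain-text parser written for illustration (`not_up_normal` is a hypothetical name); it assumes the column layout of node2A's output above, with the two-character status/state code as the first column of each node row.

```python
# Sketch: flag every node that a `nodetool status` run does not report as
# UN (Up/Normal). The sample text is node2A's output from the question.

STATUS_NODE2A = """\
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load      Tokens  Owns (effective)  Host ID  Rack
UN  node2A   67,78 MB  256     100,0%            -        RAC1
UN  node2B   67,18 MB  256     100,0%            -        RAC1
?N  node2C   67,11 MB  256     100,0%            -        RAC1
"""

def not_up_normal(status_text):
    """Return {address: code} for node rows whose code is not 'UN'."""
    flagged = {}
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        code = parts[0]
        # Node rows start with a two-character code: status (U, D or ?)
        # followed by state (N, L, J or M). Header lines do not match.
        if len(code) == 2 and code[0] in "UD?" and code[1] in "NLJM":
            if code != "UN":
                flagged[parts[1]] = code
    return flagged

print(not_up_normal(STATUS_NODE2A))
```

Run against node2A's output this flags only node2C with `?N`. Feeding in the output from node2B and node2C as well, and comparing the flagged sets, would show which node pairs see each other differently.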
