
What exactly does Cassandra nodetool repair do?

Time: 2015-09-01 20:35:46

Tags: cassandra

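From http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html I know that

The nodetool repair command repairs inconsistencies across all of the replicas for a given range of data.

But how does it fix the inconsistencies? The docs say it uses Merkle trees, but Merkle trees are not a way to fix 'broken' data.
How can data become 'broken'? Are there common cases other than hard drive failure?

A side question: it is compaction that evicts tombstones, right? So the requirement to run nodetool repair more often than gc_grace seconds is only there to make sure that all data has propagated to the appropriate replicas? Shouldn't that be the usual case anyway?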

1 answer:

Answer 0: (score: 3)

Data can become inconsistent whenever a write to a replica is not completed, for whatever reason. This can happen if a node is down, if the node is up but its network connection is down, if a queue fills up and the write is dropped, if a disk fails, and so on.

When inconsistent data is detected by comparing the Merkle trees, the mismatched sections of data are repaired by streaming them from the nodes with the newer data. Streaming is a basic mechanism in Cassandra and is also used for bootstrapping empty nodes into the cluster.
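To make the compare-then-stream idea concrete, here is a minimal Python sketch. It is not Cassandra's implementation: the names `leaf_hashes`, `merkle_root`, and `repair_pair`, the MD5-based bucketing, and the fixed four leaf ranges are all assumptions made for illustration, and "streaming" is modelled as copying the cell with the newest timestamp to both replicas.

```python
import hashlib


def bucket_of(key, num_leaves):
    """Toy partitioner (an assumption): map a key to one of num_leaves ranges."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_leaves


def leaf_hashes(rows, num_leaves=4):
    """Hash the rows of one replica, one hash per leaf range.

    rows maps key -> (value, write_timestamp).
    """
    buckets = [[] for _ in range(num_leaves)]
    for key in sorted(rows):  # sorted so both replicas hash in the same order
        buckets[bucket_of(key, num_leaves)].append((key, rows[key]))
    return [hashlib.sha256(repr(b).encode()).hexdigest() for b in buckets]


def merkle_root(hashes):
    """Fold leaf hashes pairwise up to a single root (leaf count must be a power of two)."""
    level = list(hashes)
    while len(level) > 1:
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def repair_pair(a, b, num_leaves=4):
    """Compare two replicas and reconcile only the ranges whose hashes differ.

    Returns the list of mismatched leaf indexes (empty if already in sync).
    """
    leaves_a = leaf_hashes(a, num_leaves)
    leaves_b = leaf_hashes(b, num_leaves)
    if merkle_root(leaves_a) == merkle_root(leaves_b):
        return []  # roots match, nothing to stream
    mismatched = [i for i in range(num_leaves) if leaves_a[i] != leaves_b[i]]
    for key in set(a) | set(b):
        if bucket_of(key, num_leaves) in mismatched:
            # "Stream" the range: the cell with the newest timestamp wins.
            newest = max(
                (r[key] for r in (a, b) if key in r), key=lambda cell: cell[1]
            )
            a[key] = b[key] = newest
    return mismatched


replica_a = {"k1": ("v1", 1), "k2": ("old", 1)}
replica_b = {"k1": ("v1", 1), "k2": ("new", 2), "k3": ("x", 3)}
repair_pair(replica_a, replica_b)
```

The point of the tree is that only hashes are exchanged during comparison, and only the ranges that disagree are transferred afterwards.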

The reason you need to run repair within gc_grace_seconds is so that tombstones will be synced to all nodes. If a node is missing a tombstone, then it won't drop that data during compaction. The nodes that have the tombstone will drop the data during compaction, and then when they later run repair, the deleted data can be resurrected from the node that was missing the tombstone.
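The resurrection scenario can be walked through with a small Python simulation. This is a toy model under stated assumptions, not Cassandra code: `compact`, `sync`, and `scenario` are hypothetical helpers, a tombstone is represented as a `None` value with a timestamp, and the gc grace period is 10 time units.

```python
TOMBSTONE = None  # assumption: a deleted cell is stored as (None, timestamp)


def compact(replica, now, gc_grace):
    """Purge tombstones older than gc_grace from one replica."""
    expired = [
        key
        for key, (value, ts) in replica.items()
        if value is TOMBSTONE and now - ts > gc_grace
    ]
    for key in expired:
        del replica[key]


def sync(a, b):
    """Toy repair: for every key, both replicas end up with the newest cell."""
    for key in set(a) | set(b):
        newest = max((r[key] for r in (a, b) if key in r), key=lambda cell: cell[1])
        a[key] = b[key] = newest


def scenario(repair_within_gc_grace):
    gc_grace = 10
    node_a = {"user:1": ("alice", 1)}
    node_b = {"user:1": ("alice", 1)}

    # t=5: a delete reaches node_a only; node_b misses the tombstone.
    node_a["user:1"] = (TOMBSTONE, 5)

    if repair_within_gc_grace:
        sync(node_a, node_b)  # t=8: tombstone is copied to node_b in time

    # t=20: gc_grace has passed, both nodes compact.
    compact(node_a, now=20, gc_grace=gc_grace)
    compact(node_b, now=20, gc_grace=gc_grace)

    # t=25: a late repair runs.
    sync(node_a, node_b)
    return node_a.get("user:1")


print(scenario(repair_within_gc_grace=True))   # None: the row stays deleted
print(scenario(repair_within_gc_grace=False))  # ('alice', 1): the row is back
```

In the second run, node_a has already purged its tombstone, so nothing remains to overrule the stale copy on node_b and the late repair copies the deleted row back.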