How to debug why hints are not processed after all nodes are up again

Time: 2017-02-04 18:32:32

Tags: cassandra cassandra-2.1

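Today I did some extended maintenance on node d1r1n3 of a 14-node DSC 2.1.15 cluster, but finished within the cluster's max hint window.

After bringing the node back up, the hints on most other nodes disappeared again within a few minutes. The exceptions were two nodes (d1r1n4 and d1r1n7), where only some of the hints went away.

After d1r1n7 had shown 1 active HintedHandoff task for a few hours, I restarted it, and soon afterwards d1r1n4 emptied its hints table.

How can I see which node the hints stored on d1r1n7 are destined for? And, possibly, how can I get those hints processed?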

Cluster TP statistics

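Update: I later found that d1r1n7's hints had vanished at a time corresponding to the end of the max hint window, counted from when node d1r1n3 was taken offline for maintenance. This leaves us unsure whether that is okay. Were the hints processed properly, or did they simply expire once the max hint window ended? If the latter, would we need to run a repair on node d1r1n3 after its maintenance (which takes quite some time and IO ... :/)? What if we now read at [LOCAL] QUORUM instead of ONE as we currently do? With a single DC and RF = 3, could this trigger read-path repairs on an as-needed basis and perhaps spare us a full repair?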

Cluster TOP stats after end of maxhint window

Answer: It turned out that hinted_handoff_throttle_in_kb on these two nodes was at the default of 1024, while the rest of the cluster was at 65536 :)
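
To get a rough feel for why a throttle of 1024 versus 65536 matters on a 14-node cluster, here is a minimal sketch. It assumes the behavior described in the cassandra.yaml comments for this setting: the value is a per-delivery-thread maximum that is reduced in proportion to the number of nodes in the cluster (divided by node count minus one). The function name `per_thread_throttle_kb` is hypothetical and used only for illustration.

```python
def per_thread_throttle_kb(throttle_in_kb, node_count):
    """Approximate effective hint delivery rate (KB/s) per delivery thread.

    Assumption: the configured hinted_handoff_throttle_in_kb is divided by
    (number of nodes - 1), as described in the cassandra.yaml comments.
    """
    if node_count < 2:
        raise ValueError("need at least two nodes for hint delivery")
    return throttle_in_kb // (node_count - 1)


# The two slow nodes (default setting) versus the rest of the cluster.
for setting in (1024, 65536):
    rate = per_thread_throttle_kb(setting, 14)
    print(f"hinted_handoff_throttle_in_kb={setting}: ~{rate} KB/s per thread")
```

Under that assumption, the two nodes at the default would replay hints at well under 100 KB/s per thread, which is consistent with a hint delivery task staying active for hours.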

1 Answer:

Answer 0 (score: 1)

In Cassandra 2.1.15, hints are stored in the system.hints table

cqlsh> describe table system.hints;
CREATE TABLE system.hints (
    target_id uuid,
    hint_id timeuuid,
    message_version int,
    mutation blob,
    PRIMARY KEY (target_id, hint_id, message_version)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (hint_id ASC, message_version ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = 'hints awaiting delivery'
    AND compaction = {'enabled': 'false', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 0
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 3600000
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

target_id corresponds to the node's host ID.

For example, in my sample 2-node cluster with RF = 2

I ran the following while node2 was down

nodetool status
Datacenter: datacenter1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  71.47 KB   256     100.0%            d00c4b10-2997-4411-9fc9-f6d9f6077916  rack1
DN  127.0.0.2  75.4 KB    256     100.0%            1ca6779d-fb41-4a26-8fa8-89c6b51d0bfa  rack1

As can be seen, system.hints.target_id matches the Host ID shown in nodetool status (1ca6779d-fb41-4a26-8fa8-89c6b51d0bfa)
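
The lookup from a target_id to a node can be sketched as a small script that parses `nodetool status` output and builds a host ID to address map. This is only an illustration: the function name `parse_nodetool_status` is hypothetical, and the target_id is assumed to have been read from system.hints beforehand (for instance with something like `SELECT DISTINCT target_id FROM system.hints;`, which is not shown in the original answer).

```python
import re

UUID = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
# A node line starts with a two-letter code: Up/Down, then Normal/Leaving/Joining/Moving.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)\s.*?(" + UUID + r")")


def parse_nodetool_status(text):
    """Return {host_id: (address, state)} for each node line in the output."""
    nodes = {}
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            state, address, host_id = match.groups()
            nodes[host_id.lower()] = (address, state)
    return nodes


# Sample output copied from the nodetool status listing above.
STATUS = """\
Datacenter: datacenter1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  71.47 KB   256     100.0%            d00c4b10-2997-4411-9fc9-f6d9f6077916  rack1
DN  127.0.0.2  75.4 KB    256     100.0%            1ca6779d-fb41-4a26-8fa8-89c6b51d0bfa  rack1
"""

nodes = parse_nodetool_status(STATUS)
target_id = "1ca6779d-fb41-4a26-8fa8-89c6b51d0bfa"  # assumed to come from system.hints
address, state = nodes[target_id]
print(f"hints for {target_id} are destined for {address} (state {state})")
```

On a real cluster the `STATUS` string would be replaced by the captured output of `nodetool status`.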