I have a cluster of four nodes, each holding about 70 GB of data. Whenever I add a new node to the cluster, it keeps warning me about tombstones like this:
WARN 09:38:03 Read 2578 live and 1114 tombstoned cells in xxxtable (see tombstone_warn_threshold).
10000 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
localDeletion=2147483647, ranges=[FAE69193423616A400258D99B9C0CCCFEC4A9547C1A1FC17BF569D2405705B8E:_-FAE69193423616A400258D99B9C0CCCFEC4A9547C1A1FC17BF569D2405705B8E:!,
deletedAt=1456243983944000, localDeletion=1456243983][FAE69193423616A40EC252766DDF513FBCA55ECDFAF452052E6C95D4BD641201:_-FAE69193423616A40EC252766DDF513FBCA55ECDFAF452052E6C95D4BD641201:!,
deletedAt=1460026357100000, localDeletion=1460026357][FAE69193423616A41BED8E613CD24BF3583FB6C6ABBA13F19C3E2D1824D01EF6:_-FAE69193423616A41BED8E613CD24BF3583FB6C6ABBA13F19C3E2D1824D01EF6:!, deletedAt=1458176745950000, localDeletion=1458176745][FAE69193423616A41BED8E613CD24BF3B06C1306E35B0ACA719D800D254E5930:_-FAE69193423616A41BED8E613CD24BF3B06C1306E35B0ACA719D800D254E5930:!, deletedAt=1458176745556000, localDeletion=1458176745][FAE69193423616A41BED8E613CD24BF3BA2AE7FC8340F96CC440BDDFFBCBE7D0:_-FAE69193423616A41BED8E613CD24BF3BA2AE7FC8340F96CC440BDDFFBCBE7D0:!,
deletedAt=1458176745740000, localDeletion=1458176745][FAE69193423616A41BED8E613CD24BF3E5A681C7ECC09A93429CEE59A76DA131:_-FAE69193423616A41BED8E613CD24BF3E5A681C7ECC09A93429CEE59A76DA131:!,
deletedAt=1458792793219000, localDeletion=
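As I understand it (this is my reading of the log, not an official statement), the warning fires because a single read scanned more tombstoned cells than the `tombstone_warn_threshold` setting in cassandra.yaml, which defaults to 1000. The numbers in the warning line above illustrate this:

```python
# Sketch of why the warning above fires, using the counts from the log line.
# tombstone_warn_threshold is the cassandra.yaml default; the rest of the
# values are copied from the WARN message.
live_cells = 2578
tombstoned_cells = 1114
tombstone_warn_threshold = 1000  # default in cassandra.yaml

# Cassandra logs the tombstone warning when a read scans more tombstoned
# cells than the warn threshold (and aborts the read entirely once it
# passes tombstone_failure_threshold).
print(tombstoned_cells > tombstone_warn_threshold)  # → True
```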
In the end the new node takes a very long time to start, and then it throws:
java.lang.OutOfMemoryError: Java heap space
Here is the error log:
INFO 20:39:20 ConcurrentMarkSweep GC in 5859ms. CMS Old Gen: 6491794984 -> 6492437040; Par Eden Space: 1398145024 -> 1397906216; Par Survivor Space: 349072992 -> 336156096
INFO 20:39:20 Enqueuing flush of refresh_token: 693 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 Pool Name                  Active   Pending   Completed   Blocked   All Time Blocked
INFO 20:39:20 Enqueuing flush of log_user_track: 7047 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 CounterMutationStage            0         0           0         0                  0
INFO 20:39:20 Enqueuing flush of userinbox: 42819 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 Enqueuing flush of messages: 7954 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 ReadStage                       0         0           0         0                  0
INFO 20:39:20 RequestResponseStage            0         0           6         0                  0
INFO 20:39:20 Enqueuing flush of sstable_activity: 6567 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 ReadRepairStage                 0         0           0         0                  0
INFO 20:39:20 Enqueuing flush of convmsgs: 2132 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 MutationStage                   0         0       72300         0                  0
INFO 20:39:20 Enqueuing flush of sstable_activity: 1791 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 GossipStage                     0         0       23655         0                  0
INFO 20:39:20 Enqueuing flush of log_user_track: 1165 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 AntiEntropyStage                0         0           0         0                  0
INFO 20:39:20 Enqueuing flush of sstable_activity: 2388 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 CacheCleanupExecutor            0         0           0         0                  0
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid17155.hprof ...
When I run nodetool tpstats, I see that MemtableFlushWriter and MemtablePostFlush both have a large number of pending tasks:
Pool Name                  Active   Pending   Completed   Blocked   All time blocked
CounterMutationStage            0         0           0         0                  0
ReadStage                       0         0           0         0                  0
RequestResponseStage            0         0           8         0                  0
MutationStage                   0         0     1382245         0                  0
ReadRepairStage                 0         0           0         0                  0
GossipStage                     0         0       23553         0                  0
CacheCleanupExecutor            0         0           0         0                  0
AntiEntropyStage                0         0           0         0                  0
MigrationStage                  0         0           0         0                  0
ValidationExecutor              0         0           0         0                  0
CommitLogArchiver               0         0           0         0                  0
MiscStage                       0         0           0         0                  0
MemtableFlushWriter             4      7459         220         0                  0
MemtableReclaimMemory           0         0         231         0                  0
PendingRangeCalculator          0         0           3         0                  0
MemtablePostFlush               1      7464         331         0                  0
CompactionExecutor              3         3         269         0                  0
InternalResponseStage           0         0           0         0                  0
HintedHandoff                   0         0           4         0                  0
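To spot the backlog quickly in output like this, I use a small helper of my own (the pool rows below are copied from the tpstats output above; the threshold of 100 pending tasks is an arbitrary choice of mine, not a Cassandra setting):

```python
# Sketch: scan nodetool tpstats-style rows and flag pools whose Pending
# column has a large backlog. Column order matches tpstats:
# Pool Name, Active, Pending, Completed, Blocked, All time blocked.
TPSTATS = """\
MutationStage          0     0  1382245  0  0
MemtableFlushWriter    4  7459      220  0  0
MemtablePostFlush      1  7464      331  0  0
CompactionExecutor     3     3      269  0  0
"""

def backlogged_pools(text, pending_threshold=100):
    """Return (pool, pending) pairs where Pending exceeds the threshold."""
    flagged = []
    for line in text.strip().splitlines():
        parts = line.split()
        pool, pending = parts[0], int(parts[2])  # parts[2] is the Pending column
        if pending > pending_threshold:
            flagged.append((pool, pending))
    return flagged

print(backlogged_pools(TPSTATS))
# → [('MemtableFlushWriter', 7459), ('MemtablePostFlush', 7464)]
```

The two flushing stages are exactly the ones stuck in my cluster, which is why I suspect the flush pipeline rather than the read path.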