I have a cluster of four nodes, each holding about 70 GB of data. Whenever I add a new node to the cluster, it keeps warning me about tombstones like this:
WARN 09:38:03 Read 2578 live and 1114 tombstoned cells in xxxtable (see tombstone_warn_threshold).
10000 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
localDeletion=2147483647, ranges=[FAE69193423616A400258D99B9C0CCCFEC4A9547C1A1FC17BF569D2405705B8E:_-FAE69193423616A400258D99B9C0CCCFEC4A9547C1A1FC17BF569D2405705B8E:!,
deletedAt=1456243983944000, localDeletion=1456243983][FAE69193423616A40EC252766DDF513FBCA55ECDFAF452052E6C95D4BD641201:_-FAE69193423616A40EC252766DDF513FBCA55ECDFAF452052E6C95D4BD641201:!,
deletedAt=1460026357100000, localDeletion=1460026357][FAE69193423616A41BED8E613CD24BF3583FB6C6ABBA13F19C3E2D1824D01EF6:_-FAE69193423616A41BED8E613CD24BF3583FB6C6ABBA13F19C3E2D1824D01EF6:!, deletedAt=1458176745950000, localDeletion=1458176745][FAE69193423616A41BED8E613CD24BF3B06C1306E35B0ACA719D800D254E5930:_-FAE69193423616A41BED8E613CD24BF3B06C1306E35B0ACA719D800D254E5930:!, deletedAt=1458176745556000, localDeletion=1458176745][FAE69193423616A41BED8E613CD24BF3BA2AE7FC8340F96CC440BDDFFBCBE7D0:_-FAE69193423616A41BED8E613CD24BF3BA2AE7FC8340F96CC440BDDFFBCBE7D0:!,
deletedAt=1458176745740000, localDeletion=1458176745][FAE69193423616A41BED8E613CD24BF3E5A681C7ECC09A93429CEE59A76DA131:_-FAE69193423616A41BED8E613CD24BF3E5A681C7ECC09A93429CEE59A76DA131:!,
deletedAt=1458792793219000, localDeletion=
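As I understand it (this is my reading of the log, not an official statement), the warning fires because a single read scanned more tombstoned cells than the `tombstone_warn_threshold` setting in cassandra.yaml, which defaults to 1000. The numbers in the warning line above illustrate this:

```python
# Sketch of why the warning above fires, using the counts from the log line.
# tombstone_warn_threshold is the cassandra.yaml default; the rest of the
# values are copied from the WARN message.
live_cells = 2578
tombstoned_cells = 1114
tombstone_warn_threshold = 1000  # default in cassandra.yaml

# Cassandra logs the tombstone warning when a read scans more tombstoned
# cells than the warn threshold (and aborts the read entirely once it
# passes tombstone_failure_threshold).
print(tombstoned_cells > tombstone_warn_threshold)  # → True
```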
In the end the new node takes a very long time to start, and then it throws:
java.lang.OutOfMemoryError: Java heap space
Here is the error log:
INFO 20:39:20 ConcurrentMarkSweep GC in 5859ms. CMS Old Gen: 6491794984 -> 6492437040; Par Eden Space: 1398145024 -> 1397906216; Par Survivor Space: 349072992 -> 336156096
INFO 20:39:20 Enqueuing flush of refresh_token: 693 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 Pool Name                  Active   Pending   Completed   Blocked   All Time Blocked
INFO 20:39:20 Enqueuing flush of log_user_track: 7047 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 CounterMutationStage            0         0           0         0                  0
INFO 20:39:20 Enqueuing flush of userinbox: 42819 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 Enqueuing flush of messages: 7954 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 ReadStage                       0         0           0         0                  0
INFO 20:39:20 RequestResponseStage            0         0           6         0                  0
INFO 20:39:20 Enqueuing flush of sstable_activity: 6567 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 ReadRepairStage                 0         0           0         0                  0
INFO 20:39:20 Enqueuing flush of convmsgs: 2132 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 MutationStage                   0         0       72300         0                  0
INFO 20:39:20 Enqueuing flush of sstable_activity: 1791 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 GossipStage                     0         0       23655         0                  0
INFO 20:39:20 Enqueuing flush of log_user_track: 1165 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 AntiEntropyStage                0         0           0         0                  0
INFO 20:39:20 Enqueuing flush of sstable_activity: 2388 (0%) on-heap, 0 (0%) off-heap
INFO 20:39:20 CacheCleanupExecutor            0         0           0         0                  0
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid17155.hprof ...
When I run nodetool tpstats, I see that MemtableFlushWriter and MemtablePostFlush both have a large number of pending tasks:
Pool Name                  Active   Pending   Completed   Blocked   All time blocked
CounterMutationStage            0         0           0         0                  0
ReadStage                       0         0           0         0                  0
RequestResponseStage            0         0           8         0                  0
MutationStage                   0         0     1382245         0                  0
ReadRepairStage                 0         0           0         0                  0
GossipStage                     0         0       23553         0                  0
CacheCleanupExecutor            0         0           0         0                  0
AntiEntropyStage                0         0           0         0                  0
MigrationStage                  0         0           0         0                  0
ValidationExecutor              0         0           0         0                  0
CommitLogArchiver               0         0           0         0                  0
MiscStage                       0         0           0         0                  0
MemtableFlushWriter             4      7459         220         0                  0
MemtableReclaimMemory           0         0         231         0                  0
PendingRangeCalculator          0         0           3         0                  0
MemtablePostFlush               1      7464         331         0                  0
CompactionExecutor              3         3         269         0                  0
InternalResponseStage           0         0           0         0                  0
HintedHandoff                   0         0           4         0                  0
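To spot the backlog quickly in output like this, I use a small helper of my own (the pool rows below are copied from the tpstats output above; the threshold of 100 pending tasks is an arbitrary choice of mine, not a Cassandra setting):

```python
# Sketch: scan nodetool tpstats-style rows and flag pools whose Pending
# column has a large backlog. Column order matches tpstats:
# Pool Name, Active, Pending, Completed, Blocked, All time blocked.
TPSTATS = """\
MutationStage          0     0  1382245  0  0
MemtableFlushWriter    4  7459      220  0  0
MemtablePostFlush      1  7464      331  0  0
CompactionExecutor     3     3      269  0  0
"""

def backlogged_pools(text, pending_threshold=100):
    """Return (pool, pending) pairs where Pending exceeds the threshold."""
    flagged = []
    for line in text.strip().splitlines():
        parts = line.split()
        pool, pending = parts[0], int(parts[2])  # parts[2] is the Pending column
        if pending > pending_threshold:
            flagged.append((pool, pending))
    return flagged

print(backlogged_pools(TPSTATS))
# → [('MemtableFlushWriter', 7459), ('MemtablePostFlush', 7464)]
```

The two flushing stages are exactly the ones stuck in my cluster, which is why I suspect the flush pipeline rather than the read path.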