Question

Our production Cassandra cluster keeps logging WARN and ERROR messages like these:

WARN [ReadStage:290753] 2016-04-22 17:00:06,461 SliceQueryFilter.java (line 231) Read 101 live and 33528 tombstone cells in keyspace.tablespace.Events_event_type_idx (see tombstone_warn_threshold). 100 columns was requested, slices=[5347432d45504a2d3535373639333936:2016/04/22 16\:46\:24.186-COMMANDE-ORDER-201655769396001-]

ERROR [ReadStage:290744] 2016-04-22 17:00:07,556 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in crm.Events.Events_event_type_idx; query aborted (see tombstone_failure_threshold)

ERROR [ReadStage:290729] 2016-04-22 17:00:18,708 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in crm.Events.Events_event_type_idx; query aborted (see tombstone_failure_threshold)

ERROR [ReadStage:290729] 2016-04-22 17:00:18,709 CassandraDaemon.java (line 258) Exception in thread Thread[ReadStage:290729,5,main]
java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2016)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:208)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
.
.
.

ERROR [ReadStage:290751] 2016-04-22 17:00:30,771 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in crm.Events.Events_event_type_idx; query aborted (see tombstone_failure_threshold)

The setup is: Cassandra 2.0.15, 4 nodes, replication factor 3. The data in this tablespace has no TTL, and gc_grace is set to 0.

We currently run a weekly 'maintenance' job consisting of:

#!/bin/bash

logfile="/var/log/cassandra/maintenance.log"
echo "----------------------------------------" >> $logfile
echo "$(date) Cassandra cluster maintenance started." >> $logfile
echo "----------------------------------------" >> $logfile

nodetool -h localhost setcompactionthroughput 999
echo "$(date)  Cassandra scrub started." >> $logfile
nodetool -h localhost scrub
echo "$(date)  Cassandra scrub completed." >> $logfile
echo "$(date)  Cassandra repair started." >> $logfile
nodetool -h localhost repair --partitioner-range
echo "$(date)  Cassandra repair completed." >> $logfile
echo "$(date)  Cassandra compaction started." >> $logfile
nodetool -h localhost compact
echo "$(date)  Cassandra compaction completed." >> $logfile
echo "$(date)  Cassandra cleanup started." >> $logfile
nodetool -h localhost cleanup
echo "$(date)  Cassandra cleanup completed." >> $logfile

nodetool -h localhost setcompactionthroughput 16

dt=$SECONDS
ds=$((dt % 60))
dm=$(((dt / 60) % 60))
dh=$((dt / 3600))
printf 'Total Run Time : %d:%02d:%02d\n' "$dh" "$dm" "$ds" >> "$logfile"

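This 'maintenance' does not resolve the issue. We also tried running specific operations against that particular tablespace, but they did not help much.

We tried setting gc_grace to a higher value and then resetting it in the maintenance script, but we got the same result. I know this is not a bug but a safeguard to keep Cassandra performing well, but we are stuck at this point.

Our next step would be to dump the entire tablespace, drop it, and recreate it, but that seems rather drastic for a cluster in production.

Does anyone know what could be going wrong with the tombstone cleanup?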

Thanks,

Regards

Answer 1

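First, your maintenance script is a bit odd. You typically do not want to run a full nodetool compact on a regular schedule. Cassandra's compaction strategies are quite smart and will do the right thing automatically.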
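For illustration only, here is a rough sketch of the question's weekly job with the nodetool compact step dropped and the remaining steps left as they were. The run_step and maintenance functions and the DRY_RUN variable are hypothetical helpers of mine, not part of nodetool. With DRY_RUN=1 (the default here) nothing is executed; the script only prints what it would run.

```shell
#!/bin/bash
# Sketch: the question's weekly job without the major compaction step.
# Assumption: the log path is the one used in the question's script.
logfile="${LOGFILE:-/var/log/cassandra/maintenance.log}"

run_step() {
    # $1: label for the log; remaining arguments: the command to run.
    local label="$1"; shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show the command instead of executing it.
        echo "would run ($label): $*"
    else
        echo "$(date)  Cassandra $label started." >> "$logfile"
        "$@"
        echo "$(date)  Cassandra $label completed." >> "$logfile"
    fi
}

maintenance() {
    # Same steps and order as the question, minus "nodetool compact".
    run_step scrub   nodetool -h localhost scrub
    run_step repair  nodetool -h localhost repair --partitioner-range
    run_step cleanup nodetool -h localhost cleanup
}

maintenance
```

Running it with DRY_RUN=0 would execute the three nodetool commands and write to the log file, so try the dry run first.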

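That said, your tombstone exception is on crm.Events.Events_event_type_idx, which looks like a secondary index on Events (event_type). As you insert, change, and delete event data, that index accumulates a large number of tombstones. This is a less common (but not unexpected) edge case for secondary indexes, arising when the distribution of data in the index differs from the distribution in the table: secondary indexes in Cassandra work well with medium cardinality, but you have a great many rows of one particular event type.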
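To confirm that the aborted reads really are concentrated on that one index, you could tally the ERROR lines per table or index. This is a minimal sketch that assumes the log lines have the same shape as the ones quoted in the question; the function name count_tombstone_aborts is hypothetical, and the log file location depends on your install.

```shell
#!/bin/bash
# Sketch: count "query aborted" tombstone errors per table or index.
# Assumption: lines look like
#   ... Scanned over 100000 tombstones in <name>; query aborted (see tombstone_failure_threshold)

count_tombstone_aborts() {
    # $1: path to a Cassandra log file.
    # sed keeps only the <name> part of matching lines; uniq -c counts repeats.
    sed -n 's/.*Scanned over [0-9]* tombstones in \([^;]*\); query aborted.*/\1/p' "$1" \
        | sort | uniq -c | sort -rn
}

# Example call (the path is an assumption, adjust it to your node):
# count_tombstone_aborts /var/log/cassandra/system.log
```

Each output line is a count followed by the name, with the most frequent offender first. WARN lines about tombstone_warn_threshold are ignored by this pattern.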

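The first step in trying to fix this would be to run nodetool rebuild_index for that keyspace/table and hope it clears out some of the tombstones; I suspect it will. The next step is to remodel your data so that this problem does not recur.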
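A rough sketch of that first step, using the names from the error line (crm.Events.Events_event_type_idx). I am assuming that Cassandra 2.0 wants the index argument in Table.IndexName form, so check the nodetool help output on your node before running it. The DRY_RUN variable is a hypothetical guard of mine; by default the script only prints the command.

```shell
#!/bin/bash
# Sketch: rebuild the secondary index named in the error log on the local node.
keyspace="crm"
table="Events"
index="Events_event_type_idx"

# Assumption: 2.0-era nodetool expects the index as <table>.<index>.
cmd=(nodetool -h localhost rebuild_index "$keyspace" "$table" "$table.$index")

if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: print the command so it can be reviewed first.
    echo "${cmd[*]}"
else
    "${cmd[@]}"
fi
```

Secondary indexes are local to each node, so the rebuild would have to be repeated on every node in the cluster.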

Answer 2

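It has been a long time, but... After running into the same problem on different clusters, I found there can be several sources for this kind of issue.

It could be because:
- You are deleting/modifying rows at a higher rate than your actual gc_grace and repair combination can handle.
- Compaction cannot complete, so your tombstones are never purged.

This error generally occurs when the data model is not well structured; most of the time, people are using Cassandra the same way they would use a typical relational database.

As @JeffJirsa said earlier, there is no need to run manual compactions, because doing so stops the compactions Cassandra performs automatically. To get them happening again, you have to restart the node.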

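The solution here was to lower gc_grace and run repairs more frequently (repairs should run more often than the gc_grace value to avoid resilience issues). Unfortunately, this is not the best solution, which would be to completely restructure the data.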
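To make the gc_grace versus repair relationship concrete, here is a small sketch that checks whether a repair schedule fits inside a given gc_grace_seconds and prints the matching CQL statement. The function names and the 3-day/daily values are hypothetical, and I am assuming the table name is case-sensitive (hence the double quotes). Nothing is sent to the cluster; the statement would still have to be run through cqlsh.

```shell
#!/bin/bash
# Sketch: sanity-check a repair schedule against gc_grace_seconds.

repair_fits_gc_grace() {
    # $1: gc_grace_seconds, $2: seconds between completed repairs.
    # Succeeds only when repairs complete more often than gc_grace elapses.
    [ "$2" -lt "$1" ]
}

alter_gc_grace_cql() {
    # $1: keyspace, $2: table, $3: gc_grace_seconds. Prints a CQL statement only.
    printf 'ALTER TABLE %s."%s" WITH gc_grace_seconds = %d;\n' "$1" "$2" "$3"
}

day=86400
# "gc_grace_seconds repair_interval" pairs:
# the question's setup (0, weekly), then a hypothetical tighter one (3 days, daily).
for pair in "0 $((7 * day))" "$((3 * day)) $day"; do
    read -r gc interval <<< "$pair"
    if repair_fits_gc_grace "$gc" "$interval"; then
        echo "gc_grace_seconds=$gc, repair every $interval s: ok"
        alter_gc_grace_cql crm Events "$gc"
    else
        echo "gc_grace_seconds=$gc, repair every $interval s: repairs too infrequent"
    fi
done
```

With these values the question's combination (gc_grace 0 with a weekly repair) is reported as too infrequent, while 3 days of gc_grace with a daily repair passes.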

Cassandra tombstones cannot be cleared
