Cassandra shard repeatedly becomes corrupted: IndexOutOfBoundsException

Time: 2019-01-18 00:29:17

Tags: cassandra indexoutofboundsexception sharding cassandra-3.0 repair

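We run a set of Cassandra 3.11 nodes holding many different tables, some of which contain large amounts of text (news articles, web posts, and so on). For some reason, these article tables frequently become corrupted and refuse to repair. Oddly, it is not every table on our 3-node cluster that fails repair: only one shard of one article table on one node is corrupted. Running the usual repair commands generally does not work, so we have had to replace the table on the affected node with a copy from another node that is not corrupted.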

An example stack trace:

WARN  [ReadStage-2] 2017-10-05 17:05:11,853 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-2,5,main]: {}
java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
        at java.util.ArrayList.rangeCheck(ArrayList.java:653) ~[na:1.8.0_141]
        at java.util.ArrayList.get(ArrayList.java:429) ~[na:1.8.0_141]
        at org.apache.cassandra.db.marshal.TupleType.compareCustom(TupleType.java:114) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.marshal.AbstractType.compare(AbstractType.java:160) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.config.ColumnDefinition$1.compare(ColumnDefinition.java:200) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.config.ColumnDefinition$1.compare(ColumnDefinition.java:186) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.Cell.lambda$static$0(Cell.java:52) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$Candidate.compareTo(MergeIterator.java:384) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.replaceAndSink(MergeIterator.java:263) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:765) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:695) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.Row$Merger.merge(Row.java:672) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:554) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:518) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:500) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:360) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:136) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:92) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:315) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:145) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:138) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:134) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:333) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:50) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_141]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) [apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.11.0.jar:3.11.0]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_141]

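I have also seen it complain about Index: 6, Size: 6 and other values. Although system.log lists this as a warning, when I run a repair it fails with this error: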

ERROR [ValidationExecutor:30] 2019-01-17 21:05:12,583 Validator.java:268 - Failed creating a merkle tree for [repair #7f300fa0-1a9b-11e9-a3f3-27b394804448 on core_main_application/article_by_date_posted..., /172.31.0.77 (see log for details)

Unfortunately, there are no "details" to be found in the log. However, running a trace during the repair produces:

[2019-01-17 21:05:12,574] Partition index with 4 entries found for sstable 9123
[2019-01-17 21:05:12,576] /172.31.2.189: Partition index with 2 entries found for sstable 8990
[2019-01-17 21:05:12,576] /172.31.2.189: Partition index with 3 entries found for sstable 8968
[2019-01-17 21:05:12,578] /172.31.2.189: Partition index with 4 entries found for sstable 8985
[2019-01-17 21:05:12,578] /172.31.2.189: Partition index with 0 entries found for sstable 8995
[2019-01-17 21:05:12,587] REPAIR_MESSAGE message received from /172.31.2.189
[2019-01-17 21:05:12,589] /172.31.2.189: Sending REPAIR_MESSAGE message to /172.31.0.77
[2019-01-17 21:05:12,591] Received merkle tree for article_by_date_posted from /172.31.2.189
[2019-01-17 21:05:12,600] Requesting merkle trees for person_by_user_by_date_modified (to [/172.31.2.189, /172.31.10.37, /172.31.0.77])
[2019-01-17 21:05:12,600] Sending REPAIR_MESSAGE message to /172.31.0.77
[2019-01-17 21:05:12,600] REPAIR_MESSAGE message received from /172.31.0.77
[2019-01-17 21:05:12,600] Parsing UPDATE system_distributed.repair_history SET status = 'FAILED', finished_at = toTimestamp(now()), exception_message=?, exception_stacktrace=? WHERE keyspace_name = 'core_main_application' AND columnfamily_name = 'article_by_date_posted' AND id = 7f300fa0-1a9b-11e9-a3f3-27b394804448
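
For anyone trying to see which Index/Size variants and which tables are involved, the log lines quoted above could be tallied with something like the following. This is only a rough sketch: the two regular expressions are derived from the exception line and the Validator error shown in this post, and the names `summarize` and `SAMPLE` are made up for illustration. It assumes nothing about Cassandra beyond those two line formats.

```python
import re
from collections import Counter

# Pattern for the exception line seen in the stack trace in this post.
IOOBE = re.compile(r"IndexOutOfBoundsException: Index: (\d+), Size: (\d+)")
# Pattern for the Validator error in this post; captures keyspace and table.
MERKLE_FAIL = re.compile(r"Failed creating a merkle tree for \[repair #\S+ on (\w+)/(\w+)")


def summarize(lines):
    """Count (index, size) pairs and failed-validation (keyspace, table) pairs."""
    index_sizes = Counter()
    failed_tables = Counter()
    for line in lines:
        m = IOOBE.search(line)
        if m:
            index_sizes[(int(m.group(1)), int(m.group(2)))] += 1
        m = MERKLE_FAIL.search(line)
        if m:
            failed_tables[(m.group(1), m.group(2))] += 1
    return index_sizes, failed_tables


# Hypothetical sample input: two lines copied from the logs quoted in this post.
SAMPLE = [
    "java.lang.IndexOutOfBoundsException: Index: 3, Size: 3",
    "ERROR [ValidationExecutor:30] 2019-01-17 21:05:12,583 Validator.java:268 - "
    "Failed creating a merkle tree for [repair #7f300fa0-1a9b-11e9-a3f3-27b394804448 "
    "on core_main_application/article_by_date_posted..., /172.31.0.77 (see log for details)",
]

if __name__ == "__main__":
    sizes, tables = summarize(SAMPLE)
    for pair, count in sizes.items():
        print("Index/Size", pair, "seen", count, "time(s)")
    for table, count in tables.items():
        print("Merkle tree failure on", "/".join(table), "seen", count, "time(s)")
```

In practice the list would be replaced by the lines of system.log read from disk; the path to that file depends on the installation, so it is left out here.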

Has anyone run into this problem before?

0 answers:

No answers