Frequent Spikes in Cassandra write latency

Time: 2018-09-18 20:09:17

Tags: cassandra production-environment latency cassandra-2.1

In our production cluster, the cluster write latency frequently spikes from 7 ms to 4 s. Because of this, clients face a lot of read and write timeouts. This repeats every few hours.

Observations:

- Cluster write latency (99th percentile): 4 s
- Local write latency (99th percentile): 10 ms
- Read & write consistency: LOCAL_ONE
- Total nodes: 7

I enabled tracing with nodetool settraceprobability for a few minutes and observed that most of the time is spent in internode communication:

 session_id                           | event_id                             | activity                                                                                                                    | source        | source_elapsed | thread
--------------------------------------+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+---------------+----------------+------------------------------------------
 4267dca2-bb79-11e8-aeca-439c84a4762c | 429c3314-bb79-11e8-aeca-439c84a4762c | Parsing  SELECT * FROM table1 WHERE uaid = '506a5f3b' AND messageid >= '01;'  | cassandranode3 |              7 |                     SharedPool-Worker-47
 4267dca2-bb79-11e8-aeca-439c84a4762c | 429c5a20-bb79-11e8-aeca-439c84a4762c |                                                                                                         Preparing statement | Cassandranode3 |             47 |                     SharedPool-Worker-47
 4267dca2-bb79-11e8-aeca-439c84a4762c | 429c5a21-bb79-11e8-aeca-439c84a4762c |                                                                                            reading data from /Cassandranode1 | Cassandranode3 |            121 |                     SharedPool-Worker-47
 4267dca2-bb79-11e8-aeca-439c84a4762c | 42a38610-bb79-11e8-aeca-439c84a4762c |                                                                       REQUEST_RESPONSE message received from /cassandranode1 | cassandranode3 |          40614 | MessagingService-Incoming-/Cassandranode1
 4267dca2-bb79-11e8-aeca-439c84a4762c | 42a38611-bb79-11e8-aeca-439c84a4762c |                                                                                     Processing response from /Cassandranode1 | Cassandranode3 |          40626 |                      SharedPool-Worker-5
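
For reference, this is roughly how the tracing was enabled and the trace rows above were pulled; the probability value below is only illustrative, not the exact value that was used:

# enable probabilistic tracing on each node (0.001 = trace roughly 0.1% of requests)
nodetool settraceprobability 0.001

-- then, from cqlsh, look at the collected trace events
SELECT session_id, activity, source, source_elapsed FROM system_traces.events LIMIT 100;

# turn tracing back off afterwards, since it adds overhead
nodetool settraceprobability 0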

I checked the connectivity between the Cassandra nodes and did not see any issues. The Cassandra logs are flooded with read timeout exceptions, as this is a pretty busy cluster with 30k reads/sec and 10k writes/sec.

Warning in the system.log:

WARN  [SharedPool-Worker-28] 2018-09-19 01:39:16,999 SliceQueryFilter.java:320 - Read 122 live and 266 tombstone cells in system.schema_columns for key: system (see tombstone_warn_threshold). 2147483593 columns were requested, slices=[-]
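
The tombstone_warn_threshold mentioned in that warning is a cassandra.yaml setting; the values below are the stock defaults, shown only for context:

# cassandra.yaml (defaults, not a tuning recommendation)
tombstone_warn_threshold: 1000        # warn when a single query scans this many tombstones
tombstone_failure_threshold: 100000   # abort the query beyond this many tombstones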

During the spike the cluster just stalls, and even simple commands such as "use system_traces" fail.

cassandra@cqlsh:system_traces> select * from sessions ;
Warning: schema version mismatch detected, which might be caused by DOWN nodes; if this is not the case, check the schema versions of your nodes in system.local and system.peers.
Schema metadata was not refreshed. See log for details.

I validated the schema versions on all nodes and they are the same, but it looks like during the issue Cassandra is not even able to read its own metadata.
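
This is roughly how the schema versions were compared across the nodes (as the warning above suggests, via system.local and system.peers):

# all nodes should report the same schema version
nodetool describecluster

-- or directly from cqlsh on each node
SELECT schema_version FROM system.local;
SELECT peer, schema_version FROM system.peers;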

Has anyone faced similar issues? Any suggestions?

1 Answer:

Answer 0 (score: 2)

(From the data in the comments above) Long GC pauses will definitely cause this. Add -XX:+DisableExplicitGC: the full GCs you were getting come from calls to System.gc(), most likely from the RMI distributed GC (DGC), which runs periodically whether it is needed or not. With a larger heap those collections are very expensive, and it is safe to disable them.
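
If GC logging is enabled, explicit collections show up with a System.gc() cause, so a quick way to confirm this is happening (the log path below is just an example) is:

# look for full GCs triggered by explicit System.gc() calls (e.g. the periodic RMI DGC)
grep "Full GC (System.gc())" /var/log/cassandra/gc.log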

Check your GC log header to make sure a minimum heap size has not been set. I would also recommend setting -XX:G1ReservePercent=20.
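
On Cassandra 2.1 these JVM flags would normally be added in conf/cassandra-env.sh (jvm.options only exists from 3.0 onward); a minimal sketch:

# conf/cassandra-env.sh
# ignore explicit System.gc() calls (e.g. from RMI DGC) so they cannot trigger full GCs
JVM_OPTS="$JVM_OPTS -XX:+DisableExplicitGC"
# reserve extra G1 headroom to reduce the chance of evacuation failures and full GCs
JVM_OPTS="$JVM_OPTS -XX:G1ReservePercent=20"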