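We are seeing frequent slow queries from our Cassandra cluster (version 2.1.12). The first example below comes from a trace that took nearly 150 ms, whereas the 50th percentile is around 5 ms. The data all of these queries run against is mostly unchanging, and the request volume is relatively low. The table being queried is defined as follows, and two traces from the slow query log come after it: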
CREATE TABLE "XXXXXXXXXXXXXXXXXXXXXXX".XXXXXXXXXXXXXXXXXXXXXXX (
    field1 text,
    field2 text,
    field3 boolean,
    last_modified bigint,
    PRIMARY KEY (field1, field2)
) WITH CLUSTERING ORDER BY (field2 ASC)
    AND bloom_filter_fp_chance = 0.1
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
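Seemingly random operations, such as "Seeking to partition beginning in data file" and "Enqueuing response to /0.0.0.1", appear to take more than 100 ms before the next event is logged. Has anyone seen traces like this before? What else can I do to investigate?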
Trace:
Host (queried):XXXX.XXXX.com/0.0.0.1:9042
Host (tried): XXXX.XXXX.com/0.0.0.1:9042
Trace id: ab846ae0-6b95-11e6-ad5d-7d0826b4ac3d
timestamp | source |source_elapsed| Description
-------------------------+----------+--------------+--------------
2016-08-26T14:02:05.326Z | /0.0.0.1 | 33 | reading data from /0.0.0.2
2016-08-26T14:02:05.327Z | /0.0.0.1 | 73 | Sending READ message to /0.0.0.2
2016-08-26T14:02:05.331Z | /0.0.0.2 | 9 | READ message received from /0.0.0.1
2016-08-26T14:02:05.422Z | /0.0.0.2 | 233 | Executing single-partition query on XXXXXXXXXXXXXXXXXXXXXXX
2016-08-26T14:02:05.427Z | /0.0.0.2 | 239 | Acquiring sstable references
2016-08-26T14:02:05.429Z | /0.0.0.2 | 248 | Merging memtable tombstones
2016-08-26T14:02:05.430Z | /0.0.0.2 | 260 | Key cache hit for sstable 459
2016-08-26T14:02:05.431Z | /0.0.0.2 | 261 | Seeking to partition beginning in data file
2016-08-26T14:02:05.445Z | /0.0.0.2 | 102921 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
2016-08-26T14:02:05.446Z | /0.0.0.2 | 102923 | Merging data from memtables and 1 sstables
2016-08-26T14:02:05.447Z | /0.0.0.2 | 107862 | Read 501 live and 0 tombstone cells
2016-08-26T14:02:05.447Z | /0.0.0.2 | 107902 | Enqueuing response to /0.0.0.1
2016-08-26T14:02:05.466Z | /0.0.0.2 | 134315 | Sending REQUEST_RESPONSE message to /0.0.0.1
2016-08-26T14:02:05.473Z | /0.0.0.1 | 147284 | REQUEST_RESPONSE message received from /0.0.0.2
2016-08-26T14:02:05.474Z | /0.0.0.1 | 147302 | Processing response from /0.0.0.2
Trace:
Host (queried):xxxxxx.xxxx.com/0.0.0.1:9042
Host (tried): xxxxxxx.xxxx.com/0.0.0.1:9042
Trace id: 9930d290-6b9d-11e6-ae36-ddb3c9ba6cb9
timestamp | source |source_elapsed| Description
-------------------------+----------+--------------+--------------
2016-08-26T14:58:50.553Z | /0.0.0.2 | 8 | READ message received from /0.0.0.1
2016-08-26T14:58:50.556Z | /0.0.0.1 | 39 | reading data from /0.0.0.2
2016-08-26T14:58:50.557Z | /0.0.0.1 | 89 | Sending READ message to /0.0.0.2
2016-08-26T14:58:50.563Z | /0.0.0.1 | 10186 | speculating read retry on XXXXXX.XXXXX.com/0.0.0.1
2016-08-26T14:58:50.565Z | /0.0.0.1 | 10246 | Sending READ message to XXXXXX.XXXX.com/0.0.0.1
2016-08-26T14:58:50.570Z | /0.0.0.1 | 10322 | READ message received from /0.0.0.1
2016-08-26T14:58:50.593Z | /0.0.0.2 | 39803 | Executing single-partition query on XXXXXXXXXXXXXXXXXXXXXXX
2016-08-26T14:58:50.600Z | /0.0.0.2 | 39894 | Acquiring sstable references
2016-08-26T14:58:50.600Z | /0.0.0.2 | 39898 | Merging memtable tombstones
2016-08-26T14:58:50.601Z | /0.0.0.2 | 39908 | Key cache hit for sstable 495
2016-08-26T14:58:50.601Z | /0.0.0.2 | 39909 | Seeking to partition beginning in data file
2016-08-26T14:58:50.602Z | /0.0.0.2 | 42002 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
2016-08-26T14:58:50.603Z | /0.0.0.2 | 42004 | Merging data from memtables and 1 sstables
2016-08-26T14:58:50.604Z | /0.0.0.2 | 49181 | Read 501 live and 0 tombstone cells
2016-08-26T14:58:50.604Z | /0.0.0.2 | 49207 | Enqueuing response to /0.0.0.1
2016-08-26T14:58:50.646Z | /0.0.0.1 | 92529 | Executing single-partition query on XXXXXXXXXXXXXXXXXXXXXXX
2016-08-26T14:58:50.648Z | /0.0.0.1 | 92642 | Acquiring sstable references
2016-08-26T14:58:50.649Z | /0.0.0.1 | 92646 | Merging memtable tombstones
2016-08-26T14:58:50.650Z | /0.0.0.1 | 92656 | Key cache hit for sstable 489
2016-08-26T14:58:50.651Z | /0.0.0.1 | 92657 | Seeking to partition beginning in data file
2016-08-26T14:58:50.651Z | /0.0.0.1 | 92845 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
2016-08-26T14:58:50.652Z | /0.0.0.1 | 92847 | Merging data from memtables and 1 sstables
2016-08-26T14:58:50.652Z | /0.0.0.1 | 98457 | Read 501 live and 0 tombstone cells
2016-08-26T14:58:50.653Z | /0.0.0.1 | 98490 | Enqueuing response to /0.0.0.1
2016-08-26T14:58:50.762Z | /0.0.0.1 | 209166 | Sending REQUEST_RESPONSE message to XXXX.XXXXX.com/0.0.0.1
2016-08-26T14:58:50.791Z | /0.0.0.1 | 234723 | REQUEST_RESPONSE message received from /0.0.0.1
2016-08-26T14:58:50.794Z | /0.0.0.1 | 234778 | Processing response from /0.0.0.1
2016-08-26T14:58:50.917Z | /0.0.0.2 | 363508 | Sending REQUEST_RESPONSE message to /0.0.0.1
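To make the stalls easier to spot than by reading the tables by eye, here is a minimal sketch that ranks the jumps in source_elapsed between consecutive events. It assumes the trace has been pasted as text in the pipe-separated layout shown above and that source_elapsed is in microseconds (which matches the roughly 150 ms total of the first trace). The function name `largest_gaps` and its parameters are made up for illustration. Gaps are computed per source because each node reports its own elapsed time; without the `source` filter, the coordinator's wait for the replica's response will naturally show up as the largest gap.

```python
def largest_gaps(trace_text, source=None, top=3):
    """Return the largest jumps in source_elapsed between consecutive events.

    Each result is (gap_us, source, previous_description, next_description).
    Gaps are computed per source, because source_elapsed is measured
    separately on each node.
    """
    last = {}   # source -> (elapsed_us, description) of its previous event
    gaps = []
    for line in trace_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4 or not parts[2].isdigit():
            continue  # header, separator, or a non-table line
        _, src, elapsed, desc = parts
        if source is not None and src != source:
            continue
        elapsed = int(elapsed)
        if src in last:
            prev_elapsed, prev_desc = last[src]
            gaps.append((elapsed - prev_elapsed, src, prev_desc, desc))
        last[src] = (elapsed, desc)
    return sorted(gaps, reverse=True)[:top]


# Rows copied from the first trace above (replica /0.0.0.2 only).
SAMPLE = """\
2016-08-26T14:02:05.430Z | /0.0.0.2 | 260 | Key cache hit for sstable 459
2016-08-26T14:02:05.431Z | /0.0.0.2 | 261 | Seeking to partition beginning in data file
2016-08-26T14:02:05.445Z | /0.0.0.2 | 102921 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
2016-08-26T14:02:05.446Z | /0.0.0.2 | 102923 | Merging data from memtables and 1 sstables
2016-08-26T14:02:05.447Z | /0.0.0.2 | 107862 | Read 501 live and 0 tombstone cells
2016-08-26T14:02:05.447Z | /0.0.0.2 | 107902 | Enqueuing response to /0.0.0.1
2016-08-26T14:02:05.466Z | /0.0.0.2 | 134315 | Sending REQUEST_RESPONSE message to /0.0.0.1
"""

for gap_us, src, before, after in largest_gaps(SAMPLE):
    print(f"{gap_us:>8} us on {src}: after '{before}'")
```

On these replica rows, the 102660 us jump after "Seeking to partition beginning in data file" ranks first and the 26413 us jump after "Enqueuing response to /0.0.0.1" ranks second.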
This is a fairly large cluster (15 nodes), but disk latency on the nodes looks like this:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 2.24 129.03 174.26 4.75 15355.69 1069.99 91.76 0.14 0.77 0.47 8.43
sdb 2.39 128.91 172.31 4.72 15189.18 1068.77 91.84 0.14 0.77 0.47 8.38
sde 2.06 126.19 172.95 4.60 15199.89 1046.10 91.50 0.14 0.77 0.47 8.41
sda 6.60 124.75 3.75 14.99 268.23 1100.15 72.99 0.02 0.86 0.52 0.98
sdd 2.27 124.49 170.66 4.54 15029.25 1032.00 91.67 0.13 0.76 0.47 8.32
dm-0 0.00 0.00 8.10 4.31 64.80 34.46 8.00 0.03 2.52 0.22 0.27
dm-1 0.00 0.00 2.26 132.38 203.39 1059.02 9.38 0.06 0.46 0.05 0.70
dm-2 0.00 0.00 175.19 130.76 15199.88 1046.10 53.10 0.11 0.34 0.28 8.42
dm-3 0.00 0.00 173.13 129.00 15029.24 1032.00 53.16 0.10 0.33 0.28 8.32
dm-4 0.00 0.00 176.70 133.75 15355.69 1069.99 52.91 0.11 0.34 0.27 8.44
dm-5 0.00 0.00 174.90 133.60 15189.17 1068.77 52.70 0.11 0.34 0.27 8.38
dm-6 0.00 0.00 0.00 0.83 0.01 6.66 8.00 0.00 0.36 0.21 0.02
Update: the GC logs show a lot of pauses of 10 ms or more (about 40 per minute):
2016-08-29T08:40:28.194-0500: 3267605.189: Total time for which application threads were stopped: 0.0225500 seconds, Stopping threads took: 0.0001925 seconds
2016-08-29T08:40:28.224-0500: 3267605.218: Total time for which application threads were stopped: 0.0282851 seconds, Stopping threads took: 0.0002292 seconds
2016-08-29T08:40:30.313-0500: 3267607.308: Total time for which application threads were stopped: 0.0228544 seconds, Stopping threads took: 0.0003854 seconds
Sorting the list of pauses from the past 10 hours shows some considerably larger ones:
2016-08-29T00:10:58.821-0500: 3237035.816: Total time for which application threads were stopped: 0.3012112 seconds, Stopping threads took: 0.0002029 seconds
2016-08-29T06:45:12.657-0500: 3260689.652: Total time for which application threads were stopped: 0.3115931 seconds, Stopping threads took: 0.0002821 seconds
2016-08-29T00:14:05.523-0500: 3237222.518: Total time for which application threads were stopped: 0.3314997 seconds, Stopping threads took: 0.0002298 seconds
2016-08-29T08:30:36.023-0500: 3267013.017: Total time for which application threads were stopped: 1.3173462 seconds, Stopping threads took: 0.0002041 seconds
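For reference, a sorted list like the one above can be produced with a few lines of Python. This is only a sketch, assuming the log lines have exactly the "Total time for which application threads were stopped" format shown here; the names `parse_pauses` and `pauses_over` and the 0.1 s threshold are arbitrary choices for illustration.

```python
import re

# Matches the safepoint-pause lines shown above and captures the wall-clock
# timestamp and the total stopped time in seconds.
PAUSE_RE = re.compile(
    r"^(?P<ts>\S+): [\d.]+: Total time for which application threads "
    r"were stopped: (?P<pause>[\d.]+) seconds"
)


def parse_pauses(lines):
    """Return (pause_seconds, timestamp) for every matching log line."""
    pauses = []
    for line in lines:
        m = PAUSE_RE.match(line.strip())
        if m:
            pauses.append((float(m.group("pause")), m.group("ts")))
    return pauses


def pauses_over(lines, threshold_s):
    """Return pauses of at least threshold_s seconds, smallest first."""
    return sorted(p for p in parse_pauses(lines) if p[0] >= threshold_s)


# Three lines copied from the GC log excerpts above.
GC_SAMPLE = """\
2016-08-29T08:40:28.194-0500: 3267605.189: Total time for which application threads were stopped: 0.0225500 seconds, Stopping threads took: 0.0001925 seconds
2016-08-29T00:10:58.821-0500: 3237035.816: Total time for which application threads were stopped: 0.3012112 seconds, Stopping threads took: 0.0002029 seconds
2016-08-29T08:30:36.023-0500: 3267013.017: Total time for which application threads were stopped: 1.3173462 seconds, Stopping threads took: 0.0002041 seconds
"""

for pause_s, ts in pauses_over(GC_SAMPLE.splitlines(), 0.1):
    print(f"{ts}  {pause_s * 1000:.1f} ms")
```

Comparing the timestamps of the longer pauses with the timestamps of slow traces from the same period would show whether the two line up.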