Question

I'm seeing some slow queries that I don't quite understand.

The table looks like this:

CREATE TABLE tbl (
    key text,
    time timestamp,
    id uuid,
    data int,
    PRIMARY KEY (key, time, id)
) WITH CLUSTERING ORDER BY (time ASC, id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

The trace looks like:

 activity                                                                                                                                                       | timestamp                  | source       | source_elapsed
----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------
                                                                                                                                             Execute CQL3 query | 2016-09-28 16:33:51.821000 | <same ip> |              0
                                              Parsing SELECT * FROM tbl WHERE key = '3-069' AND time <= <30 minutes in the past> LIMIT 1; [SharedPool-Worker-4] | 2016-09-28 16:33:51.821000 | <same ip> |             79
                                                                                                                      Preparing statement [SharedPool-Worker-4] | 2016-09-28 16:33:51.821000 | <same ip> |            186
                                                                                                  Executing single-partition query on tbl [SharedPool-Worker-5] | 2016-09-28 16:33:51.822000 | <same ip> |            661
                                                                                                             Acquiring sstable references [SharedPool-Worker-5] | 2016-09-28 16:33:51.822000 | <same ip> |            704
                                                                                                              Merging memtable tombstones [SharedPool-Worker-5] | 2016-09-28 16:33:51.822000 | <same ip> |            717
                                                                                                           Key cache hit for sstable 2873 [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |            750
                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |            759
                                                                                                           Key cache hit for sstable 2872 [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |            887
                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |            895
                                                                                                           Key cache hit for sstable 2867 [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |            992
                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |            999
                                                                                                           Key cache hit for sstable 2854 [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |           1115
                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |           1132
                                                                                                           Key cache hit for sstable 2841 [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |           1243
                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |           1252
                                                                                                           Key cache hit for sstable 2828 [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |           1340
                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-5] | 2016-09-28 16:33:51.822001 | <same ip> |           1348
                                                                                                           Key cache hit for sstable 2771 [SharedPool-Worker-5] | 2016-09-28 16:33:51.822002 | <same ip> |           1463
                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-5] | 2016-09-28 16:33:51.822002 | <same ip> |           1470
                                                                                                           Key cache hit for sstable 2562 [SharedPool-Worker-5] | 2016-09-28 16:33:51.822002 | <same ip> |           1577
                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-5] | 2016-09-28 16:33:51.822002 | <same ip> |           1585
                                                                Skipped 0/8 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-5] | 2016-09-28 16:33:51.823000 | <same ip> |           1705
                                                                                               Merging data from memtables and 8 sstables [SharedPool-Worker-5] | 2016-09-28 16:33:51.823000 | <same ip> |           1715
                                                                                                        Read 2 live and 0 tombstone cells [SharedPool-Worker-5] | 2016-09-28 16:33:55.652000 | <same ip> |         831025
                                                                                                                                               Request complete | 2016-09-28 16:33:55.717105 | <same ip> |         896105

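There are only 11 cells for this primary key, and this query returned one cell. Can anyone explain why reading a small number of cells, with no tombstones, is so slow? Are there other metrics I should be looking at? CPU and disk utilization on the machine look fine, and GC times are fairly stable and low.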

Answer 1

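Another metric worth watching is the table histograms, which tell you how many SSTables each read hits. Too many SSTables per read will slow things down, which is what the other answers are getting at.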

You can get that data through nodetool tablehistograms.

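This gives you a breakdown of the number of SSTables hit for different requests. If you are using OpsCenter, you can view this data there as well. The command to look for is nodetool tablehistograms keyspace tbl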
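
As a rough sketch of that invocation: the wrapper below only checks that nodetool is available and then calls it. The function name sstables_per_read and the keyspace name my_keyspace are placeholders for illustration, since the question does not name the keyspace. On older releases such as 2.1 the same report is, as far as I know, called cfhistograms rather than tablehistograms.

```shell
# Show per-read SSTable counts for one table.
# "my_keyspace" below is a placeholder; the question does not name the keyspace.
sstables_per_read() {
    keyspace="$1"
    table="$2"
    if ! command -v nodetool >/dev/null 2>&1; then
        echo "nodetool not found on PATH" >&2
        return 127
    fi
    nodetool tablehistograms "$keyspace" "$table"
}

sstables_per_read my_keyspace tbl || echo "run this on a Cassandra node"
```

In the output, the SSTables column is the one to read: it lists, per percentile, how many SSTables a read had to touch.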

Answer 2

I reproduced your scenario with the following script (using CCM):

ccm create cas-1 --vnodes -n 1 -v 2.1.15
ccm start
echo "create keyspace test WITH REPLICATION={ 'class' : 'SimpleStrategy', 'replication_factor' : 1} ;" | ccm node1 cqlsh
echo "create table test.tbl (key text,time timestamp,id uuid,data int,PRIMARY KEY (key, time, id));" | ccm node1 cqlsh
echo "insert into test.tbl (key,time,id,data) values ('1','2000-1-1',now(),1);" | ccm node1 cqlsh
ccm node1 nodetool flush
echo "insert into test.tbl (key,time,id,data) values ('1','2000-2-1',now(),1);" | ccm node1 cqlsh
ccm node1 nodetool flush 
echo "insert into test.tbl (key,time,id,data) values ('1','2000-3-1',now(),1);" | ccm node1 cqlsh
ccm node1 nodetool flush 

echo "tracing on;  select * from test.tbl where key='1' and time <= '2000-3-1' limit 1;" | ccm node1 cqlsh
echo "tracing on;  select * from test.tbl where key='1' and time <= '2000-3-1' limit 1;" | ccm node1 cqlsh

It produces a similar trace.

Now Tracing is enabled

 key | time                     | id                                   | data
-----+--------------------------+--------------------------------------+------
   1 | 1999-12-31 22:00:00+0000 | ac0791c0-85b9-11e6-9005-51c5fe8b2280 |    1

(1 rows)

Tracing session: ae3ebd10-85b9-11e6-9005-51c5fe8b2280

 activity                                                                                           | timestamp                  | source    | source_elapsed
----------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                                 Execute CQL3 query | 2016-09-28 23:25:22.018000 | 127.0.0.1 |              0
 Parsing select * from test.tbl where key='1' and time <= '2000-3-1' limit 1; [SharedPool-Worker-3] | 2016-09-28 23:25:22.019000 | 127.0.0.1 |            570
                                                          Preparing statement [SharedPool-Worker-3] | 2016-09-28 23:25:22.020000 | 127.0.0.1 |           1055
                                      Executing single-partition query on tbl [SharedPool-Worker-1] | 2016-09-28 23:25:22.022000 | 127.0.0.1 |           4065
                                                 Acquiring sstable references [SharedPool-Worker-1] | 2016-09-28 23:25:22.023000 | 127.0.0.1 |           4091
                                                  Merging memtable tombstones [SharedPool-Worker-1] | 2016-09-28 23:25:22.024000 | 127.0.0.1 |           4132
                           Partition index with 0 entries found for sstable 3 [SharedPool-Worker-1] | 2016-09-28 23:25:22.024000 | 127.0.0.1 |           4388
                                  Seeking to partition beginning in data file [SharedPool-Worker-1] | 2016-09-28 23:25:22.024000 | 127.0.0.1 |           4398
                           Partition index with 0 entries found for sstable 2 [SharedPool-Worker-1] | 2016-09-28 23:25:22.025000 | 127.0.0.1 |           4761
                                  Seeking to partition beginning in data file [SharedPool-Worker-1] | 2016-09-28 23:25:22.025000 | 127.0.0.1 |           4770
                           Partition index with 0 entries found for sstable 1 [SharedPool-Worker-1] | 2016-09-28 23:25:22.025000 | 127.0.0.1 |           4991
                                  Seeking to partition beginning in data file [SharedPool-Worker-1] | 2016-09-28 23:25:22.025000 | 127.0.0.1 |           5000
    Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2016-09-28 23:25:22.026000 | 127.0.0.1 |           5148
                                   Merging data from memtables and 3 sstables [SharedPool-Worker-1] | 2016-09-28 23:25:22.026000 | 127.0.0.1 |           5159
                                            Read 2 live and 0 tombstone cells [SharedPool-Worker-1] | 2016-09-28 23:25:22.027000 | 127.0.0.1 |           5365
                                                                                   Request complete | 2016-09-28 23:25:22.023661 | 127.0.0.1 |           5661

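The key (partition key) you are querying ('1' in my sample, '3-069' in your case) has data in multiple sstable files (above, I forced this by flushing the sstables after each insert).

Since the query filters by clustering key and uses "limit 1", all the sstables holding data for the partition key need to be searched. Once all the rows are retrieved, they are sorted and the first result is returned.

If you remove the "limit 1" you should get multiple results.

In my sample: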

echo "select * from test.tbl where key='1' and time <= '2000-3-1';" | ccm node1 cqlsh
Now Tracing is enabled

 key | time                     | id                                   | data
-----+--------------------------+--------------------------------------+------
   1 | 1999-12-31 22:00:00+0000 | ac0791c0-85b9-11e6-9005-51c5fe8b2280 |    1
   1 | 2000-01-31 22:00:00+0000 | acd60550-85b9-11e6-9005-51c5fe8b2280 |    1
   1 | 2000-02-29 22:00:00+0000 | ad8a3a20-85b9-11e6-9005-51c5fe8b2280 |    1

(3 rows)
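
One way to check how many sstables a traced read had to consult is to count the "Seeking to partition beginning in data file" lines in a saved trace. The following is a minimal sketch, assuming the trace has been saved as plain text; the function name count_sstables_touched is made up for illustration, and the inlined text is a shortened excerpt of the reproduction trace above with the timestamp columns dropped.

```shell
# Count how many sstables a traced read had to seek into.
# Each "Seeking to partition beginning in data file" line is one sstable.
count_sstables_touched() {
    grep -c 'Seeking to partition beginning in data file'
}

# Excerpt of the reproduction trace (timestamp columns omitted for brevity).
trace='Partition index with 0 entries found for sstable 3
Seeking to partition beginning in data file
Partition index with 0 entries found for sstable 2
Seeking to partition beginning in data file
Partition index with 0 entries found for sstable 1
Seeking to partition beginning in data file
Merging data from memtables and 3 sstables'

printf '%s\n' "$trace" | count_sstables_touched
```

The count should agree with the number reported on the "Merging data from memtables and N sstables" line of the same trace.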

Answer 3

It looks like your partition is split across 8 different SSTables:

Merging data from memtables and 8 sstables [SharedPool-Worker-5] | 2016-09-28 16:33:51.823000 | <same ip> |           1715

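If you are using slow spinning disks (e.g. 7200 RPM), this translates directly into multiple disk seeks: your query is bounded by the number of IOPS your disk subsystem can deliver.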
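
To see where the time actually went, it can help to look at the jumps in source_elapsed between consecutive trace steps. The sketch below is only an illustration: the function name slowest_step is invented, and the input is the tail of the trace from the question, copied with the column padding removed.

```shell
# Print the trace step that follows the largest jump in source_elapsed (microseconds).
slowest_step() {
    awk -F'|' '
        NF >= 4 {
            elapsed = $NF + 0
            if (seen) {
                delta = elapsed - prev
                if (delta > max) {
                    max = delta
                    step = $1
                    gsub(/^ +| +$/, "", step)   # trim padding around the activity text
                }
            }
            prev = elapsed
            seen = 1
        }
        END { printf "%d us spent reaching: %s\n", max, step }
    '
}

slowest_step <<'EOF'
Seeking to partition beginning in data file [SharedPool-Worker-5] | 2016-09-28 16:33:51.822002 | <same ip> | 1585
Skipped 0/8 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-5] | 2016-09-28 16:33:51.823000 | <same ip> | 1705
Merging data from memtables and 8 sstables [SharedPool-Worker-5] | 2016-09-28 16:33:51.823000 | <same ip> | 1715
Read 2 live and 0 tombstone cells [SharedPool-Worker-5] | 2016-09-28 16:33:55.652000 | <same ip> | 831025
Request complete | 2016-09-28 16:33:55.717105 | <same ip> | 896105
EOF
```

For this excerpt the largest jump is 831025 - 1715 = 829310 microseconds, between "Merging data from memtables and 8 sstables" and "Read 2 live and 0 tombstone cells". That is consistent with the time being spent reading from the SSTables, although the trace alone does not prove that disk seeks are the cause.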

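To mitigate this, you can try to merge all the SSTables into one bigger SSTable, which would indeed incur only one seek. Depending on your compaction strategy settings (check the documentation for fine-tuning STCS), you can issue a major compaction with nodetool compact. Check this question: Does nodetool compact move everything into one SSTable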

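If you don't want to run a compaction (it may take a long time, or may not work at all), you can try moving the CF to an SSD with a symlink (stop the node, copy the data, symlink the directory, start the node). SSDs deliver far more IOPS than spinning disks, so you should see the effect immediately.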
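
The copy-and-symlink steps can be sketched as follows. This is a dry run on temporary stand-in directories so that it is safe to execute anywhere; every path and file name here is hypothetical, and on a real node they would be the table's data directory and a mount point on the SSD. Stopping and starting the node is left as comments because the command depends on how Cassandra is installed.

```shell
# Stand-in directories so the sketch can run anywhere (all paths are hypothetical).
work=$(mktemp -d)
hdd_dir="$work/hdd/keyspace/tbl"     # stands in for the current data directory
ssd_dir="$work/ssd/keyspace/tbl"     # stands in for the new location on the SSD
mkdir -p "$hdd_dir" "$work/ssd/keyspace"
echo "sstable bytes" > "$hdd_dir/example-Data.db"

# 1. Stop the node first (command depends on the installation).
# 2. Copy the data to the SSD, preserving permissions and timestamps.
cp -a "$hdd_dir" "$ssd_dir"
# 3. Move the original aside and point the old path at the SSD copy.
mv "$hdd_dir" "$hdd_dir.bak"
ln -s "$ssd_dir" "$hdd_dir"
# 4. Start the node again; it now reads the table through the symlink.

ls -l "$work/hdd/keyspace"
```

Keeping the original directory as a .bak copy until the node has been verified makes it easy to roll back by removing the symlink and renaming the directory.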

HTH。

Diagnosing slow Cassandra queries
