Question

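I'm seeing poor read performance with Cassandra 2.1.5. I'm new to it, so any advice on how to debug this would be appreciated. Here is what my table looks like: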

Keyspace: nt_live_october
    Read Count: 6
    Read Latency: 20837.149166666666 ms.
    Write Count: 39799
    Write Latency: 0.45696595391844014 ms.
    Pending Flushes: 0
            Table: nt
            SSTable count: 12
            Space used (live): 15903191275
            Space used (total): 15971044770
            Space used by snapshots (total): 0
            Off heap memory used (total): 14468424
            SSTable Compression Ratio: 0.1308103413354315
            Number of keys (estimate): 740
            Memtable cell count: 43483
            Memtable data size: 9272510
            Memtable off heap memory used: 0
            Memtable switch count: 17
            Local read count: 6
            Local read latency: 20837.150 ms
            Local write count: 39801
            Local write latency: 0.457 ms
            Pending flushes: 0
            Bloom filter false positives: 0
            Bloom filter false ratio: 0.00000
            Bloom filter space used: 4832
            Bloom filter off heap memory used: 4736
            Index summary off heap memory used: 576
            Compression metadata off heap memory used: 14463112
            Compacted partition minimum bytes: 6867
            Compacted partition maximum bytes: 30753941057
            Compacted partition mean bytes: 44147544
            Average live cells per slice (last five minutes): 0.0
            Maximum live cells per slice (last five minutes): 0.0
            Average tombstones per slice (last five minutes): 0.0
            Maximum tombstones per slice (last five minutes): 0.0

I issue the following query through cqlsh:

cassandra@cqlsh> TRACING ON;
Tracing is already enabled. Use TRACING OFF to disable.
cassandra@cqlsh> CONSISTENCY;
Current consistency level is ONE.
cassandra@cqlsh> select * from nt_live_october.nt where group_id='254358' and epoch >=1444313898 and epoch<=1444348800 LIMIT 1;
OperationTimedOut: errors={}, last_host=XXX.203
Statement trace did not complete within 10 seconds

Here is what system_traces.events shows:

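xxx.xxx.xxx.203 | 1281 | Parsing select * from nt_live_october.nt where group_id='254358'\nand epoch>=1443916800 and epoch<=1444348800\nLIMIT 30;
xxx.xxx.xxx.203 | 2604 | Preparing statement
xxx.xxx.xxx.203 | 8454 | Executing single-partition query on users
xxx.xxx.xxx.203 | 8474 | Acquiring sstable references
xxx.xxx.xxx.203 | 8547 | Merging memtable tombstones
xxx.xxx.xxx.203 | 8675 | Key cache hit for sstable 1
xxx.xxx.xxx.203 | 8685 | Seeking to partition beginning in data file
xxx.xxx.xxx.203 | 9040 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
xxx.xxx.xxx.203 | 9056 | Merging data from memtables and 1 sstables
xxx.xxx.xxx.203 | 9120 | Read 1 live and 0 tombstone cells
xxx.xxx.xxx.203 | 9854 | Read-repair DC_LOCAL
xxx.xxx.xxx.203 | 10033 | Executing single-partition query on users
xxx.xxx.xxx.203 | 10046 | Acquiring sstable references
xxx.xxx.xxx.203 | 10105 | Merging memtable tombstones
xxx.xxx.xxx.203 | 10189 | Key cache hit for sstable 1
xxx.xxx.xxx.203 | 10198 | Seeking to partition beginning in data file
xxx.xxx.xxx.203 | 10248 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
xxx.xxx.xxx.203 | 10261 | Merging data from memtables and 1 sstables
xxx.xxx.xxx.203 | 10296 | Read 1 live and 0 tombstone cells
xxx.xxx.xxx.203 | 12511 | Executing single-partition query on nt
xxx.xxx.xxx.203 | 12525 | Acquiring sstable references
xxx.xxx.xxx.203 | 12587 | Merging memtable tombstones
xxx.xxx.xxx.203 | 18067 | speculating read retry on /xxx.xxx.xxx.205
xxx.xxx.xxx.203 | 18577 | Sending READ message to xxx.xxx.xxx.205/xxx.xxx.xxx.205
xxx.xxx.xxx.203 | 25534 | Partition index with 6093 entries found for sstable 8885
xxx.xxx.xxx.203 | 25571 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 34989 | Partition index with 5327 entries found for sstable 8524
xxx.xxx.xxx.203 | 35022 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 36322 | Partition index with 333 entries found for sstable 8477
xxx.xxx.xxx.203 | 36336 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 714242 | Partition index with 2948251 entries found for sstable 8541
xxx.xxx.xxx.203 | 714279 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 715717 | Partition index with 501 entries found for sstable 8217
xxx.xxx.xxx.203 | 715745 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 716232 | Partition index with 252 entries found for sstable 8888
xxx.xxx.xxx.203 | 716245 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 87 | READ message received from /xxx.xxx.xxx.203
xxx.xxx.xxx.205 | 50427 | Executing single-partition query on nt
xxx.xxx.xxx.205 | 50535 | Acquiring sstable references
xxx.xxx.xxx.205 | 50628 | Merging memtable tombstones
xxx.xxx.xxx.205 | 170441 | Partition index with 35650 entries found for sstable 6332
xxx.xxx.xxx.203 | 30718026 | Partition index with 199905 entries found for sstable 5958
xxx.xxx.xxx.203 | 30718077 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 170499 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 248898 | Partition index with 30958 entries found for sstable 6797
xxx.xxx.xxx.205 | 248962 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 67814573 | Read timeout: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
xxx.xxx.xxx.203 | 67814675 | Timed out; received 0 of 1 responses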
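
One way to read a trace like this is to look for the largest jumps in source_elapsed (microseconds since the request reached that node) between consecutive events on the same node. The following is only a rough sketch: it assumes the events have been pasted as "node | source_elapsed | activity" text, and the function name largest_gaps and the SAMPLE excerpt are illustrative, not part of any Cassandra tooling.

```python
from collections import defaultdict


def largest_gaps(trace_text, top=3):
    """Return the `top` largest per-node jumps as (gap_us, node, activity)."""
    events = defaultdict(list)  # node -> [(elapsed_us, activity), ...]
    for line in trace_text.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        node, elapsed, activity = parts
        events[node].append((int(elapsed), activity))

    gaps = []
    for node, rows in events.items():
        rows.sort(key=lambda row: row[0])  # order each node's events by elapsed time
        for (prev_t, _), (t, activity) in zip(rows, rows[1:]):
            # The gap is attributed to the event that ends it.
            gaps.append((t - prev_t, node, activity))
    return sorted(gaps, reverse=True)[:top]


# Excerpt of the trace above, used as sample input.
SAMPLE = """\
xxx.xxx.xxx.203 | 716245 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 170441 | Partition index with 35650 entries found for sstable 6332
xxx.xxx.xxx.203 | 30718026 | Partition index with 199905 entries found for sstable 5958
xxx.xxx.xxx.203 | 30718077 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 248898 | Partition index with 30958 entries found for sstable 6797
xxx.xxx.xxx.203 | 67814573 | Read timeout
"""

if __name__ == "__main__":
    for gap_us, node, activity in largest_gaps(SAMPLE):
        print(f"{gap_us / 1e6:8.2f} s  {node}  {activity}")
```

On this excerpt the two largest gaps are both on .203: roughly 30 seconds before the partition index for sstable 5958 is reported, and roughly 37 more seconds before the read timeout.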

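I have 4 nodes with a replication factor of 3 (one node is very lightly loaded, but it is not .203). The data I'm trying to read isn't much: even if LIMIT 1 isn't pushed down to the remote nodes, the low end of the interval should be only about 3 hours ago (I have no epochs beyond the current time).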

Any hints on how to fix this, or on what might be going wrong? My cassandra version is 2.1.9, running with mostly default settings.

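The table schema is as follows (for privacy reasons I can't post the whole schema, but the primary key is shown, which I hope is the part that matters):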

PRIMARY KEY (group_id, epoch, group_name, auto_generated_uuid_field)
) WITH CLUSTERING ORDER BY (epoch ASC, group_name ASC, auto_generated_uuid_field ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 7776000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

___________ EDIT _____________ In reply to the questions below:

Status output:

--  Address         Load       Tokens  Owns    Host ID                               Rack
DN  xxx.xxx.xxx.204  15.8 GB    1       ?       32ed196b-f6eb-4e93-b759  r1
UN  xxx.xxx.xxx.205  20.38 GB   1       ?       446d71aa-e9cd-4ca9-a6ac  r1
UN  xxx.xxx.xxx.202  1.48 GB    1       ?       2a6670b2-63f2-43be-b672  r1
UN  xxx.xxx.xxx.203  15.72 GB   1       ?       dd26dfee-82da-454b-8db2  r1

system.log is a bit trickier, since I have a lot of logging... One suspicious thing I see is

 WARN [CompactionExecutor:6] 2015-10-08 19:44:16,595 SSTableWriter.java (line 240) Compacting large partition nt_live_october/nt:254358 (230692316 bytes)

but shortly after that warning I see

 INFO [CompactionExecutor:6] 2015-10-08 19:44:16,642 CompactionTask.java (line 274) Compacted 4 sstables to [/cassandra/data_dir_d/nt_live_october/nt-72813b106b9111e58f1ea1f0942ab78d/nt_live_october-nt-ka-9024,].  35,733,701 bytes to 30,186,394 (~84% of original) in 34,907ms = 0.824705MB/s.  21 total partitions merged to 18.  Partition merge counts were {1:17, 4:1, }

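I see a lot of pairs like this in the log... but no ERROR-level messages. Compaction seems to go through fine. Granted, this is the largest column family, but all the messages are at INFO level....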

Answer 1

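First, the DN status of node 204 means it is Down. Retrieve its system.log and look for:

Exceptions and ERROR-level log entries
Abnormal GC activity (collections taking longer than 200 ms)
StatusLogger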
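
The three checks above could be scripted roughly as follows. This is a sketch under assumptions: it takes GC lines to be those mentioning "GCInspector" with a duration written as "<n> ms" or "<n>ms", and the names classify and scan are made up for illustration. Adjust the patterns to whatever your system.log actually contains.

```python
import re

GC_THRESHOLD_MS = 200  # threshold taken from the checklist above

# Assumption: GC lines mention "GCInspector" and carry a duration
# such as "312 ms" or "312ms"; the exact wording varies by version.
GC_DURATION = re.compile(r"(\d+)\s*ms\b")


def classify(line):
    """Return a tag for a suspicious log line, or None if it looks routine."""
    if line.lstrip().startswith("ERROR") or "Exception" in line:
        return "error"
    if "GCInspector" in line:
        match = GC_DURATION.search(line)
        if match and int(match.group(1)) > GC_THRESHOLD_MS:
            return "long-gc"
        return None
    if "StatusLogger" in line:
        return "status-logger"
    return None


def scan(lines):
    """Return (tag, line) pairs for every line that classify() flags."""
    flagged = []
    for line in lines:
        tag = classify(line)
        if tag is not None:
            flagged.append((tag, line.rstrip()))
    return flagged
```

To use it, open the log retrieved from node 204 and pass the file object to scan(), for example `with open(path) as f: hits = scan(f)`, where path is wherever that node writes its system.log.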

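Second, data is poorly distributed across the cluster. Node 202 has a load of only 1.48 GB. I suspect you have some very large partitions replicated on the other nodes. What is the replication factor? What is the schema of your keyspace? You can answer these questions with the cqlsh command: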

DESCRIBE KEYSPACE nt_live_october;
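
The large-partition suspicion can also be checked against the table statistics already posted in the question. As a small sketch (the helper names partition_sizes and human are made up for illustration), this pulls the "Compacted partition ... bytes" lines out of that output and prints them in readable units:

```python
import re

# The three lines are copied from the table statistics in the question.
CFSTATS = """\
Compacted partition minimum bytes: 6867
Compacted partition maximum bytes: 30753941057
Compacted partition mean bytes: 44147544
"""


def partition_sizes(cfstats_text):
    """Map 'minimum' / 'maximum' / 'mean' to the reported size in bytes."""
    pattern = re.compile(r"Compacted partition (minimum|maximum|mean) bytes:\s*(\d+)")
    return {kind: int(value) for kind, value in pattern.findall(cfstats_text)}


def human(n_bytes):
    """Format a byte count using binary units, capped at GiB."""
    size = float(n_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.2f} {unit}"
        size /= 1024


if __name__ == "__main__":
    for kind, size in partition_sizes(CFSTATS).items():
        print(f"{kind:8s} {human(size)}")
```

With the figures from the question this reports a maximum compacted partition of about 28.64 GiB against a mean of about 42.10 MiB, which is consistent with a few very large partitions dominating the table.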
