I'm getting poor performance with Cassandra 2.1.5. I'm new to it, so any advice on how to debug this is appreciated. This is what my table looks like:
Keyspace: nt_live_october
Read Count: 6
Read Latency: 20837.149166666666 ms.
Write Count: 39799
Write Latency: 0.45696595391844014 ms.
Pending Flushes: 0
Table: nt
SSTable count: 12
Space used (live): 15903191275
Space used (total): 15971044770
Space used by snapshots (total): 0
Off heap memory used (total): 14468424
SSTable Compression Ratio: 0.1308103413354315
Number of keys (estimate): 740
Memtable cell count: 43483
Memtable data size: 9272510
Memtable off heap memory used: 0
Memtable switch count: 17
Local read count: 6
Local read latency: 20837.150 ms
Local write count: 39801
Local write latency: 0.457 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 4832
Bloom filter off heap memory used: 4736
Index summary off heap memory used: 576
Compression metadata off heap memory used: 14463112
Compacted partition minimum bytes: 6867
Compacted partition maximum bytes: 30753941057
Compacted partition mean bytes: 44147544
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0
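The partition-size lines are the interesting part of this output: only about 740 keys, but a maximum compacted partition estimate of roughly 30 GB against a mean of about 44 MB, and a local read latency of about 20 seconds. A minimal sketch, assuming the output is plain `Key: value` lines as pasted above, that pulls those figures out (the helper names are hypothetical):

```python
import re

def parse_cfstats(text):
    """Collect 'Key: value' lines from nodetool cfstats output into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats

def leading_number(value):
    """Return the first number in a value such as '20837.150 ms'."""
    match = re.match(r"\d+(?:\.\d+)?", value)
    if match is None:
        raise ValueError(f"no number in {value!r}")
    return float(match.group())

def partition_report(stats):
    """Summarise partition-size and read-latency figures for one table."""
    max_bytes = leading_number(stats["Compacted partition maximum bytes"])
    mean_bytes = leading_number(stats["Compacted partition mean bytes"])
    return {
        "keys": int(leading_number(stats["Number of keys (estimate)"])),
        "max_partition_gib": max_bytes / 1024 ** 3,
        "mean_partition_mib": mean_bytes / 1024 ** 2,
        "max_to_mean": max_bytes / mean_bytes,
        "local_read_latency_ms": leading_number(stats["Local read latency"]),
    }

# A subset of the cfstats output above.
SAMPLE = """\
Number of keys (estimate): 740
Local read latency: 20837.150 ms
Compacted partition maximum bytes: 30753941057
Compacted partition mean bytes: 44147544
"""

report = partition_report(parse_cfstats(SAMPLE))
for name, value in report.items():
    print(f"{name}: {value:,.2f}")
```

A largest partition hundreds of times bigger than the mean suggests one or a few partition keys hold most of the table's data, which is worth checking before tuning anything else.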
I issue the following query through cqlsh:
cassandra@cqlsh> TRACING ON;
Tracing is already enabled. Use TRACING OFF to disable.
cassandra@cqlsh> CONSISTENCY;
Current consistency level is ONE.
cassandra@cqlsh> select * from nt_live_october.nt where group_id='254358' and epoch >=1444313898 and epoch<=1444348800 LIMIT 1;
OperationTimedOut: errors={}, last_host=XXX.203
Statement trace did not complete within 10 seconds
This is what system_traces.events shows:
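xxx.xxx.xxx.203 | 1281 | Parsing select * from nt_live_october.nt where group_id='254358'\nand epoch >= 1443916800 and epoch <= 1444348800\nLIMIT 30;
xxx.xxx.xxx.203 | 2604 | Preparing statement
xxx.xxx.xxx.203 | 8454 | Executing single-partition query on users
xxx.xxx.xxx.203 | 8474 | Acquiring sstable references
xxx.xxx.xxx.203 | 8547 | Merging memtable tombstones
xxx.xxx.xxx.203 | 8675 | Key cache hit for sstable 1
xxx.xxx.xxx.203 | 8685 | Seeking to partition beginning in data file
xxx.xxx.xxx.203 | 9040 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
xxx.xxx.xxx.203 | 9056 | Merging data from memtables and 1 sstables
xxx.xxx.xxx.203 | 9120 | Read 1 live and 0 tombstone cells
xxx.xxx.xxx.203 | 9854 | Read-repair DC_LOCAL
xxx.xxx.xxx.203 | 10033 | Executing single-partition query on users
xxx.xxx.xxx.203 | 10046 | Acquiring sstable references
xxx.xxx.xxx.203 | 10105 | Merging memtable tombstones
xxx.xxx.xxx.203 | 10189 | Key cache hit for sstable 1
xxx.xxx.xxx.203 | 10198 | Seeking to partition beginning in data file
xxx.xxx.xxx.203 | 10248 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
xxx.xxx.xxx.203 | 10261 | Merging data from memtables and 1 sstables
xxx.xxx.xxx.203 | 10296 | Read 1 live and 0 tombstone cells
xxx.xxx.xxx.203 | 12511 | Executing single-partition query on nt
xxx.xxx.xxx.203 | 12525 | Acquiring sstable references
xxx.xxx.xxx.203 | 12587 | Merging memtable tombstones
xxx.xxx.xxx.203 | 18067 | Speculating read retry on /xxx.xxx.xxx.205
xxx.xxx.xxx.203 | 18577 | Sending READ message to xxx.xxx.xxx.205/xxx.xxx.xxx.205
xxx.xxx.xxx.203 | 25534 | Partition index with 6093 entries found for sstable 8885
xxx.xxx.xxx.203 | 25571 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 34989 | Partition index with 5327 entries found for sstable 8524
xxx.xxx.xxx.203 | 35022 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 36322 | Partition index with 333 entries found for sstable 8477
xxx.xxx.xxx.203 | 36336 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 714242 | Partition index with 2948251 entries found for sstable 8541
xxx.xxx.xxx.203 | 714279 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 715717 | Partition index with 501 entries found for sstable 8217
xxx.xxx.xxx.203 | 715745 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 716232 | Partition index with 252 entries found for sstable 8888
xxx.xxx.xxx.203 | 716245 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 87 | READ message received from /xxx.xxx.xxx.203
xxx.xxx.xxx.205 | 50427 | Executing single-partition query on nt
xxx.xxx.xxx.205 | 50535 | Acquiring sstable references
xxx.xxx.xxx.205 | 50628 | Merging memtable tombstones
xxx.xxx.xxx.205 | 170441 | Partition index with 35650 entries found for sstable 6332
xxx.xxx.xxx.203 | 30718026 | Partition index with 199905 entries found for sstable 5958
xxx.xxx.xxx.203 | 30718077 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 170499 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 248898 | Partition index with 30958 entries found for sstable 6797
xxx.xxx.xxx.205 | 248962 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 67814573 | Read timeout: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
xxx.xxx.xxx.203 | 67814675 | Timed out; received 0 of 1 responses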
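The middle column of system_traces.events is source_elapsed, in microseconds since the request started on that node, so the slow steps show up as large jumps between consecutive events from the same source. A minimal sketch, assuming rows in the `source | elapsed | activity` shape shown above (the function name is hypothetical), that ranks those jumps:

```python
from collections import defaultdict

def largest_gaps(trace_lines, top=3):
    """Return the `top` largest gaps (microseconds) between consecutive events
    from the same source node. Each gap is reported with the activity that
    ended it, i.e. the event that was logged after the wait."""
    per_source = defaultdict(list)
    for line in trace_lines:
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # skip headers or malformed rows
        per_source[parts[0]].append((int(parts[1]), parts[2]))

    gaps = []
    for source, events in per_source.items():
        events.sort()  # order by elapsed time within each node
        for (t0, _), (t1, activity) in zip(events, events[1:]):
            gaps.append((t1 - t0, source, activity))
    gaps.sort(reverse=True)
    return gaps[:top]

# A subset of the .203 events from the trace above.
TRACE = """\
xxx.xxx.xxx.203 | 36322 | Partition index with 333 entries found for sstable 8477
xxx.xxx.xxx.203 | 36336 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 714242 | Partition index with 2948251 entries found for sstable 8541
xxx.xxx.xxx.203 | 716245 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 30718026 | Partition index with 199905 entries found for sstable 5958
xxx.xxx.xxx.203 | 67814573 | Read timeout: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
""".splitlines()

for gap, source, activity in largest_gaps(TRACE):
    print(f"{gap / 1e6:8.3f} s before [{source}] {activity}")
```

On these rows the biggest waits on .203 come right before the partition index of sstable 8541 (about 2.9 million entries, roughly 0.68 s) and of sstable 5958 (roughly 30 s), which suggests the time is going into reading the index of one very large partition rather than into the slice itself.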
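I have 4 nodes with a replication factor of 3 (one node is very lightly loaded, but it is not .203). The data I'm trying to read is not large: even if LIMIT 1 is not pushed down to the remote node, the lower end of the interval should be only about 3 hours ago (I have no epochs later than the current time).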
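To put the epoch bounds in calendar terms, here is a small sketch, assuming the epoch column holds Unix seconds and reading the bounds in UTC (the function name is hypothetical). Note that the query typed into cqlsh and the one captured in the trace (LIMIT 30) use different lower bounds:

```python
from datetime import datetime, timezone

def describe_window(lo, hi):
    """Return the UTC start, UTC end and length in hours of an epoch-seconds range."""
    start = datetime.fromtimestamp(lo, tz=timezone.utc)
    end = datetime.fromtimestamp(hi, tz=timezone.utc)
    return start, end, (hi - lo) / 3600

for label, lo, hi in [
    ("cqlsh query", 1444313898, 1444348800),
    ("traced query", 1443916800, 1444348800),
]:
    start, end, hours = describe_window(lo, hi)
    print(f"{label}: {start.isoformat()} .. {end.isoformat()} ({hours:.1f} h)")
```

The cqlsh window is about 9.7 hours and the traced one is 5 days; either way, both slices fall inside the single partition for group_id '254358', so the cost depends on that partition's size more than on the width of the window.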
Any tips on how to troubleshoot this, or on what might be wrong? My Cassandra version is 2.1.9, mostly with default settings.
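The table schema is as follows (I can't post the whole schema for privacy reasons, but this shows the key columns, which I expect matter most):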
PRIMARY KEY (group_id, epoch, group_name, auto_generated_uuid_field)
) WITH CLUSTERING ORDER BY (epoch ASC, group_name ASC, auto_generated_uuid_field ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 7776000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
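With PRIMARY KEY (group_id, epoch, group_name, auto_generated_uuid_field), only group_id is the partition key; epoch and the other two columns are clustering columns that order rows inside that partition. A minimal illustration with hypothetical rows (the group names, UUID strings and the second group_id are made up) of how every row for one group lands in a single partition:

```python
from collections import Counter

DEFAULT_TTL_S = 7776000  # default_time_to_live from the schema above

def partition_key(row):
    """PRIMARY KEY (group_id, epoch, group_name, auto_generated_uuid_field):
    the first component alone is the partition key; the rest are clustering columns."""
    group_id, _epoch, _group_name, _uuid = row
    return group_id

def rows_per_partition(rows):
    """Count how many rows map to each partition."""
    return Counter(partition_key(row) for row in rows)

# Hypothetical data: one row per hour for three days for group '254358',
# plus a single row for a made-up second group.
start = 1444000000
rows = [("254358", start + h * 3600, "example-group", f"uuid-{h}") for h in range(72)]
rows.append(("999999", start, "other-group", "uuid-x"))

print(rows_per_partition(rows))        # all '254358' rows share one partition
print(DEFAULT_TTL_S / 86400, "days")   # default TTL expressed in days
```

However the epoch range in the WHERE clause is chosen, it only narrows the slice within a partition; the partition for a busy group keeps growing with every write, and by default its rows only start expiring once they are 90 days old.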
___________ EDIT _____________ Answers to the questions below:
Status output:
-- Address Load Tokens Owns Host ID Rack
DN xxx.xxx.xxx.204 15.8 GB 1 ? 32ed196b-f6eb-4e93-b759 r1
UN xxx.xxx.xxx.205 20.38 GB 1 ? 446d71aa-e9cd-4ca9-a6ac r1
UN xxx.xxx.xxx.202 1.48 GB 1 ? 2a6670b2-63f2-43be-b672 r1
UN xxx.xxx.xxx.203 15.72 GB 1 ? dd26dfee-82da-454b-8db2 r1
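The two things to read off this status output are the state column (U/D for up/down, then N/L/J/M for normal/leaving/joining/moving) and the load spread. A minimal sketch, assuming the column layout pasted above and 1024-based size units (the regex and helper names are hypothetical), that flags down nodes and the most lightly loaded one:

```python
import re

# State (e.g. UN/DN), address, then the load value and its unit.
STATUS_LINE = re.compile(
    r"^(?P<state>[UD][NLJM])\s+(?P<address>\S+)\s+(?P<load>[\d.]+)\s+(?P<unit>[KMGT]?B)\b"
)
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}

def parse_status(text):
    """Return (state, address, load_bytes) for each node line of nodetool status."""
    nodes = []
    for line in text.splitlines():
        m = STATUS_LINE.match(line.strip())
        if m:  # header and blank lines do not match
            load = float(m["load"]) * UNIT_BYTES[m["unit"]]
            nodes.append((m["state"], m["address"], load))
    return nodes

STATUS = """\
-- Address Load Tokens Owns Host ID Rack
DN xxx.xxx.xxx.204 15.8 GB 1 ? 32ed196b-f6eb-4e93-b759 r1
UN xxx.xxx.xxx.205 20.38 GB 1 ? 446d71aa-e9cd-4ca9-a6ac r1
UN xxx.xxx.xxx.202 1.48 GB 1 ? 2a6670b2-63f2-43be-b672 r1
UN xxx.xxx.xxx.203 15.72 GB 1 ? dd26dfee-82da-454b-8db2 r1
"""

nodes = parse_status(STATUS)
down = [addr for state, addr, _ in nodes if state.startswith("D")]
loads = {addr: load for _, addr, load in nodes}
lightest = min(loads, key=loads.get)
print("down:", down)
print("lightest:", lightest, f"(max/min load = {max(loads.values()) / loads[lightest]:.1f}x)")
```

Here it reports .204 as down and .202 as carrying roughly a fourteenth of the load of the busiest node.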
The system.log is hard to go through because I have a lot of logging... One suspicious thing I see is:
WARN [CompactionExecutor:6] 2015-10-08 19:44:16,595 SSTableWriter.java (line 240) Compacting large partition nt_live_october/nt:254358 (230692316 bytes)
But shortly after that warning I see:
INFO [CompactionExecutor:6] 2015-10-08 19:44:16,642 CompactionTask.java (line 274) Compacted 4 sstables to [/cassandra/data_dir_d/nt_live_october/nt-72813b106b9111e58f1ea1f0942ab78d/nt_live_october-nt-ka-9024,]. 35,733,701 bytes to 30,186,394 (~84% of original) in 34,907ms = 0.824705MB/s. 21 total partitions merged to 18. Partition merge counts were {1:17, 4:1, }
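I see many pairs like this in the log... but no ERROR-level messages. Compaction seems to be fine. It does say this is the largest column family, but all the messages are INFO level....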
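Since the log is noisy, it can help to pull the numbers out of these WARN/INFO pairs mechanically. A minimal sketch, assuming the message formats exactly as quoted above (the regexes and helper name are hypothetical); computing the rate as output bytes / 2^20 / seconds reproduces the logged 0.824705MB/s:

```python
import re

LARGE = re.compile(
    r"Compacting large partition (?P<ks>[^/]+)/(?P<table>[^:]+):(?P<key>\S+) \((?P<bytes>\d+) bytes\)"
)
COMPACTED = re.compile(
    r"(?P<before>[\d,]+) bytes to (?P<after>[\d,]+) \(~(?P<pct>\d+)% of original\) in (?P<ms>[\d,]+)ms"
)

def to_int(s):
    """Parse an integer that may contain thousands separators."""
    return int(s.replace(",", ""))

warn = ("WARN [CompactionExecutor:6] 2015-10-08 19:44:16,595 SSTableWriter.java (line 240) "
        "Compacting large partition nt_live_october/nt:254358 (230692316 bytes)")
info = ("INFO [CompactionExecutor:6] 2015-10-08 19:44:16,642 CompactionTask.java (line 274) "
        "Compacted 4 sstables to [/cassandra/data_dir_d/nt_live_october/"
        "nt-72813b106b9111e58f1ea1f0942ab78d/nt_live_october-nt-ka-9024,]. "
        "35,733,701 bytes to 30,186,394 (~84% of original) in 34,907ms = 0.824705MB/s.")

m = LARGE.search(warn)
size_mib = to_int(m["bytes"]) / 1024 ** 2

c = COMPACTED.search(info)
before, after, ms = to_int(c["before"]), to_int(c["after"]), to_int(c["ms"])
ratio = after / before
mib_per_s = after / 1024 ** 2 / (ms / 1000)

print(f"large partition {m['ks']}/{m['table']}:{m['key']} = {size_mib:.0f} MiB")
print(f"compaction kept {ratio:.0%} of input at {mib_per_s:.6f} MiB/s")
```

The WARN line names partition 254358, which is the same group_id the slow query reads, so it is worth scanning the whole log for how often, and at what sizes, that key shows up.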
Answer (score: 2):
First, the DN status of node 204 means it is down. Retrieve its system.log and look for:
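Second, the data is poorly distributed across the cluster: 202 has a load of only 1.48 GB. I suspect you have some very large partitions replicated on the other nodes. What is the replication factor? What is your keyspace's schema? You can answer these questions with the cqlsh command: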
DESCRIBE KEYSPACE nt_live_october;
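To see how a very large partition could leave one node far lighter even with RF 3 on 4 nodes, here is a minimal sketch assuming SimpleStrategy-style placement (a single-DC NetworkTopologyStrategy with every node in rack r1, as in the status output, should place replicas the same way) and one token per node (Tokens = 1). The node names, ring order and per-range sizes are hypothetical:

```python
def replica_nodes(ring, owner_index, rf):
    """Nodes storing the range whose primary owner is ring[owner_index]:
    the owner plus the next rf - 1 nodes clockwise around the ring."""
    return [ring[(owner_index + k) % len(ring)] for k in range(rf)]

def node_loads(ring, range_sizes, rf):
    """Sum the data size of every range each node holds a replica of."""
    loads = {node: 0.0 for node in ring}
    for i, owner in enumerate(ring):
        for node in replica_nodes(ring, i, rf):
            loads[node] += range_sizes[owner]
    return loads

# Hypothetical ring order and per-range sizes in GB: the range owned by n2
# contains one huge partition, the others are small.
ring = ["n1", "n2", "n3", "n4"]
range_sizes = {"n1": 0.5, "n2": 15.0, "n3": 0.5, "n4": 0.5}

for node, load in node_loads(ring, range_sizes, rf=3).items():
    print(node, load)
```

With RF 3 on 4 nodes each node replicates three of the four ranges and misses exactly one; if the missing range is the one holding the huge partition, that node ends up with a small fraction of the others' load, which is the pattern node 202 shows.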