Cassandra - enabled the row cache, then got lots of "GC pause more than 200ms"

Time: 2017-11-23 09:27:22

Tags: performance cassandra

My table has about 1 million records and is read-heavy (more than 95% of operations are reads). The table schema:

CREATE TABLE ams_table (
    projectid text,
    tagk text,
    tagv text,
    metricid bigint,
    PRIMARY KEY (projectid, tagk, tagv, metricid)
) WITH CLUSTERING ORDER BY (tagk ASC, tagv ASC, metricid ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

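This table is read very frequently: it accounts for about half of all requests and about 70% of total request time, which causes high CPU usage. So I thought it was a good candidate for the row cache.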

Each record has 4 fields and is smaller than 100 bytes, so I estimated that the whole table would take no more than 100 MB of memory.
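The estimate above can be checked with a quick calculation. This is only a rough sketch: it uses the figures from the question (about 1 million rows, at most 100 bytes each), assumes the "MB" in row_cache_size_in_mb means MiB, and counts raw payload only. It ignores any per-entry or per-partition overhead the cache may add, so the real footprint could be larger.

```python
# Back-of-envelope size estimate using the figures from the question.
ROWS = 1_000_000        # "about 1 million records"
BYTES_PER_ROW = 100     # upper bound per record stated in the question
CACHE_MIB = 128         # row_cache_size_in_mb from cassandra.yaml (assumed MiB)


def estimate_mib(rows: int, bytes_per_row: int) -> float:
    """Return the raw payload size in MiB, with no cache overhead included."""
    return rows * bytes_per_row / (1024 * 1024)


raw_mib = estimate_mib(ROWS, BYTES_PER_ROW)
print(f"raw payload: {raw_mib:.1f} MiB")
print(f"fits in a {CACHE_MIB} MiB cache: {raw_mib <= CACHE_MIB}")
```

On raw payload alone the table fits, which matches the reasoning in the question. Whether it still fits once overhead is included is not something this calculation can show.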

So I altered the table, setting rows_per_partition to 'ALL' to cache all records:

ALTER TABLE ams.tbl_tags_with_metricid WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL' } ;

Then I edited cassandra.yaml and set row_cache_size_in_mb to 128:

row_cache_size_in_mb: 128

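According to nodetool info, the hit rate was close to 0.945, and up to that point everything looked fine. But soon afterwards I saw CPU usage climb even higher. Checking system.log, I found a large number of GC pauses longer than 200 ms, like this: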

WARN  [Native-Transport-Requests-10] 2017-11-22 08:55:39,793 SelectStatement.java:377 - Aggregation query used without partition key
INFO  [Service Thread] 2017-11-22 08:55:40,553 GCInspector.java:284 - ParNew GC in 204ms.  CMS Old Gen: 11955784424 -> 12032443672; Par Eden Space: 671088640 -> 0; Par Survivor Space: 77459760 -> 83886080
INFO  [Service Thread] 2017-11-22 08:55:43,551 GCInspector.java:284 - ParNew GC in 213ms.  CMS Old Gen: 12264670920 -> 12341517200; Par Eden Space: 671088640 -> 0; 
INFO  [Service Thread] 2017-11-22 08:55:45,673 GCInspector.java:284 - ParNew GC in 221ms.  CMS Old Gen: 12354581144 -> 12432569616; Par Eden Space: 671088640 -> 0; 
INFO  [Service Thread] 2017-11-22 08:55:46,186 GCInspector.java:284 - ParNew GC in 217ms.  CMS Old Gen: 12248728592 -> 12343417080; Par Eden Space: 671088640 -> 0; 
INFO  [Service Thread] 2017-11-22 08:55:47,799 GCInspector.java:284 - ParNew GC in 354ms.  CMS Old Gen: 11967866640 -> 12058730544; Par Eden Space: 671088640 -> 0; 
INFO  [Service Thread] 2017-11-22 08:55:48,242 GCInspector.java:284 - ParNew GC in 204ms.  CMS Old Gen: 11940028704 -> 11987653704; Par Eden Space: 671088640 -> 0; 
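To make the excerpt easier to read, the sketch below extracts the pause time and the old-generation growth from GCInspector lines. It is an illustration only: the regular expression is written against the line format shown above (it is not an official parser), and the helper name parse_gc is hypothetical. Two lines from the excerpt are embedded as sample input.

```python
import re

# Two GCInspector lines copied from the excerpt above.
LOG = """\
INFO  [Service Thread] 2017-11-22 08:55:40,553 GCInspector.java:284 - ParNew GC in 204ms.  CMS Old Gen: 11955784424 -> 12032443672; Par Eden Space: 671088640 -> 0; Par Survivor Space: 77459760 -> 83886080
INFO  [Service Thread] 2017-11-22 08:55:47,799 GCInspector.java:284 - ParNew GC in 354ms.  CMS Old Gen: 11967866640 -> 12058730544; Par Eden Space: 671088640 -> 0;
"""

# Matches the collector name, pause in ms, and old-gen bytes before/after.
PATTERN = re.compile(r"(\w+) GC in (\d+)ms\.\s+CMS Old Gen: (\d+) -> (\d+)")


def parse_gc(text):
    """Return (collector, pause_ms, old_gen_growth_bytes) for each GC line."""
    results = []
    for match in PATTERN.finditer(text):
        collector = match.group(1)
        pause_ms = int(match.group(2))
        growth = int(match.group(4)) - int(match.group(3))
        results.append((collector, pause_ms, growth))
    return results


pauses = parse_gc(LOG)
for collector, pause_ms, growth in pauses:
    print(f"{collector}: {pause_ms} ms pause, old gen grew {growth / 2**20:.1f} MiB")
```

In these lines the old generation is already around 12 GB and grows by tens of megabytes on each young-generation collection, which may indicate that many objects are being promoted rather than dying young. That is an interpretation of the numbers, not a confirmed diagnosis.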

Cassandra 3.9, deployed in a Kubernetes container (8 CPU cores, 32 GB of memory), with about 10 GB of memory available.

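I tried altering the table to set 'rows_per_partition' to '3000'. The "GC pause more than 200ms" messages disappeared, but the hit rate keeps dropping and CPU usage is the same as it was before.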

0 answers:

No answers