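Suppose that, after a long time, STCS has produced one very large SSTable, and we then receive a read request for a partition key that exists only in that large SSTable (i.e. it is unique across all SSTables of the table). Does read latency increase because we are dealing with a large SSTable, or is read latency unaffected by the size of the partition index?
On the other hand, I think that having just one large SSTable, located with the help of the partition summary and then the partition index with its pointers, is still better than seeking through many smaller SSTables.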
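Answer 0 (score: 2)
First, a Cassandra process has a single partition key cache instance, which is shared by all SSTables and all tables. Its size limit is set in cassandra.yaml: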
# Default value is empty to make it "auto" (min(5% of Heap (in MB), 100MB)).
# Set to 0 to disable key cache.
key_cache_size_in_mb:
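To make the "auto" default concrete, here is a minimal Python sketch of the formula quoted in the comment above, min(5% of heap (in MB), 100MB). The function name and the integer rounding are assumptions made for illustration; this is not Cassandra's own code.

```python
def auto_key_cache_size_mb(heap_mb: int) -> int:
    """Illustrative only: min(5% of heap in MB, 100 MB), rounded down (assumed)."""
    return min(heap_mb * 5 // 100, 100)


# The cap of 100 MB is reached once the heap is 2000 MB or larger.
print(auto_key_cache_size_mb(1024))  # 51
print(auto_key_cache_size_mb(2048))  # 100
print(auto_key_cache_size_mb(8192))  # 100
```

Because the cache is shared process-wide, this budget does not grow with the size of any single SSTable.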
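As for the index summary, which is binary-searched to find the nearest partition index offset to start scanning from: normally every 128th partition key is sampled, but for an SSTable with a very large number of partition keys the sampling interval can grow in order to save memory (see min_index_interval and max_index_interval in the table definition below).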
CREATE TABLE music.example (
id int PRIMARY KEY
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
...
AND max_index_interval = 2048
AND min_index_interval = 128
...;
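The effect of the sampling interval can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Cassandra's implementation: keys are plain sorted integers rather than tokens, the "partition index" is just a sorted list, and build_summary and locate are hypothetical helper names.

```python
from bisect import bisect_right


def build_summary(sorted_keys, interval=128):
    """Sample every `interval`-th key and remember its position in the full index."""
    positions = list(range(0, len(sorted_keys), interval))
    sampled_keys = [sorted_keys[p] for p in positions]
    return sampled_keys, positions


def locate(sorted_keys, summary, key, interval=128):
    """Binary-search the summary, then scan at most `interval` index entries.

    Returns (position or None, number of index entries scanned).
    """
    sampled_keys, positions = summary
    slot = bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None, 0  # key sorts before the first sampled key
    start = positions[slot]
    scanned = 0
    for pos in range(start, min(start + interval, len(sorted_keys))):
        scanned += 1
        if sorted_keys[pos] == key:
            return pos, scanned
    return None, scanned


keys = list(range(100_000))
for interval in (128, 2048):
    summary = build_summary(keys, interval)
    pos, scanned = locate(keys, summary, 99_999, interval)
    print(interval, len(summary[0]), pos, scanned)
```

In this model the summary lookup is logarithmic in the number of sampled keys, and the scan that follows is bounded by the sampling interval rather than by the total number of partitions. A larger interval shrinks the summary in memory at the cost of a longer scan through the index.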
The memory available to index summaries can be configured in cassandra.yaml:
# A fixed memory pool size in MB for SSTable index summaries. If left
# empty, this will default to 5% of the heap size. If the memory usage of
# all index summaries exceeds this limit, SSTables with low read rates will
# shrink their index summaries in order to meet this limit. However, this
# is a best-effort process. In extreme conditions Cassandra may need to use
# more than this amount of memory.
index_summary_capacity_in_mb:
# How frequently index summaries should be resampled. This is done
# periodically to redistribute memory from the fixed-size pool to sstables
# proportional to their recent read rates. Setting to -1 will disable this
# process, leaving existing index summaries at their current sampling level.
index_summary_resize_interval_in_minutes: 60
See CASSANDRA-6379.
So, to answer your question about read performance with a large SSTable: