Question

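Suppose that after a long period of time STCS has produced one very large SSTable, and we then receive a read request for a partition key that exists only in that large SSTable (i.e. the key appears in no other SSTable of the table). Will read latency increase because we are dealing with a large SSTable, or is read latency unaffected by the size of the partition index?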

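On the other hand, I think that, thanks to the partition summary and then the partition index with its pointers, having just one large SSTable is still better than searching through several smaller SSTables.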

Answer 1

First, a Cassandra process has a single partition key cache instance, shared by all SSTables and all tables. Its size limit is defined in cassandra.yaml:

# Default value is empty to make it "auto" (min(5% of Heap (in MB), 100MB)). 
# Set to 0 to disable key cache.
key_cache_size_in_mb:
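To make the sizing rule in the comment above concrete, here is a minimal Python sketch of the default key cache size, min(5% of heap in MB, 100 MB), together with a toy cache shared by every SSTable of every table. The names default_key_cache_mb and SharedKeyCache, and keying entries by (sstable_id, partition_key), are hypothetical illustrations rather than Cassandra's actual classes; the real cache is bounded by memory, while this sketch simply counts entries.

```python
from collections import OrderedDict


def default_key_cache_mb(heap_mb: int) -> int:
    """Default when key_cache_size_in_mb is empty: min(5% of heap in MB, 100 MB)."""
    return min(heap_mb * 5 // 100, 100)


class SharedKeyCache:
    """Toy LRU cache shared by all SSTables of all tables (hypothetical sketch).

    Maps (sstable_id, partition_key) -> position of the partition in the data file.
    Capacity is counted in entries here for simplicity, not in megabytes.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, sstable_id, key):
        pos = self._entries.get((sstable_id, key))
        if pos is not None:
            # Mark the entry as most recently used.
            self._entries.move_to_end((sstable_id, key))
        return pos

    def put(self, sstable_id, key, position):
        self._entries[(sstable_id, key)] = position
        self._entries.move_to_end((sstable_id, key))
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry, whichever SSTable it belongs to.
            self._entries.popitem(last=False)
```

For example, an 8192 MB heap gives default_key_cache_mb(8192) == 100 (the 100 MB cap wins), while a 1024 MB heap gives 51. Because one cache serves every SSTable, a hit returns the partition position directly, no matter how large the SSTable is.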

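As for the index summary, which is binary-searched to find the closest partition index offset from which to start scanning: normally one partition key out of every 128 is sampled, but for SSTables with a large number of partition keys the sampling interval can be increased to save memory. The bounds are the table options min_index_interval and max_index_interval, as in this table definition: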

CREATE TABLE music.example (
    id int PRIMARY KEY
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    ...
    AND max_index_interval = 2048
    AND min_index_interval = 128
    ...;
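The index summary lookup described above can be sketched in Python, assuming the partition index is a sorted list of (key, data_offset) entries and the summary keeps every interval-th index entry. build_summary and lookup are hypothetical helpers, not Cassandra code, and real partition keys are ordered by token rather than compared as plain integers. The point is that after the in-memory binary search, up to interval index entries may have to be scanned, so a coarser interval means more work per lookup.

```python
import bisect


def build_summary(index, interval):
    """Sample every `interval`-th entry of the sorted partition index.

    Returns (sampled_keys, index_positions): the sampled keys and the
    position of each one in the full index.
    """
    positions = list(range(0, len(index), interval))
    return [index[p][0] for p in positions], positions


def lookup(index, summary, key):
    """Return (data_offset or None, number of index entries scanned)."""
    sampled_keys, positions = summary
    # Binary search the in-memory summary for the last sampled key <= key.
    i = bisect.bisect_right(sampled_keys, key) - 1
    if i < 0:
        return None, 0  # key sorts before the first partition in this SSTable
    start = positions[i]
    end = positions[i + 1] if i + 1 < len(positions) else len(index)
    scanned = 0
    # Scan the (on-disk) index from that point: at most `interval` entries.
    for k, offset in index[start:end]:
        scanned += 1
        if k == key:
            return offset, scanned
        if k > key:
            break
    return None, scanned
```

On a toy index built as [(k, k * 100) for k in range(0, 10000, 2)], looking up key 4094 scans 128 index entries with interval 128, but 2048 entries with interval 2048.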

The total memory used by index summaries can be configured in cassandra.yaml:

# A fixed memory pool size in MB for SSTable index summaries. If left
# empty, this will default to 5% of the heap size. If the memory usage of
# all index summaries exceeds this limit, SSTables with low read rates will
# shrink their index summaries in order to meet this limit.  However, this
# is a best-effort process. In extreme conditions Cassandra may need to use
# more than this amount of memory.
index_summary_capacity_in_mb:

# How frequently index summaries should be resampled.  This is done
# periodically to redistribute memory from the fixed-size pool to sstables
# proportional to their recent read rates.  Setting to -1 will disable this
# process, leaving existing index summaries at their current sampling level.
index_summary_resize_interval_in_minutes: 60
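The resampling described in these comments can be illustrated with a deliberately simplified Python model: a fixed budget, expressed here in summary entries instead of megabytes, is split between SSTables in proportion to their recent read rates, and each resulting index interval is clamped to [min_index_interval, max_index_interval]. The function redistribute and its inputs are hypothetical; Cassandra's real redistribution works with discrete sampling levels and differs in detail.

```python
import math


def redistribute(sstables, budget_entries, min_interval=128, max_interval=2048):
    """Simplified sketch: split a fixed summary budget by recent read rate.

    `sstables` maps name -> (partition_count, reads_per_sec). Each SSTable
    gets a share of `budget_entries` proportional to its read rate; the
    interval needed to fit that share is clamped to
    [min_interval, max_interval]. Clamping at max_interval can push the
    total over budget, which is why the yaml calls this best-effort.
    """
    total_reads = sum(rate for _, rate in sstables.values()) or 1.0
    intervals = {}
    for name, (partitions, rate) in sstables.items():
        share = budget_entries * rate / total_reads
        wanted = math.ceil(partitions / share) if share > 0 else max_interval
        intervals[name] = max(min_interval, min(max_interval, wanted))
    return intervals
```

With a budget of 20,000 entries, a 10-million-partition SSTable receiving 1% of the reads ends up at the coarsest interval (2048), while a hot 100,000-partition SSTable keeps 128. With a budget of 100,000 entries and half of the reads, the same large SSTable would get an interval of 200.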

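See CASSANDRA-6379.

So, to answer your question, a read from the large SSTable:

- is no slower if you happen to get a hit in the partition key cache;
- is somewhat slower because the index interval of the large SSTable will be increased (e.g. when the SSTable holds a lot of distinct partition keys; this is not necessarily related to its absolute size);
- is somewhat slower if the large SSTable is not read often, because its index summary may be downsampled; see CASSANDRA-5519.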
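As a rough worked estimate of the cost difference in the list above, assume a key-cache miss for a key present in the SSTable, keys chosen uniformly, and a summary interval I. The target then sits at position j (1 to I) within its sampled block with equal probability, and the scan reads j index entries, so the expected number of index entries read after the in-memory summary search is (ignoring a shorter final block):

$$
E[\text{entries scanned}] = \frac{1}{I}\sum_{j=1}^{I} j = \frac{I+1}{2}
$$

That is about 64.5 entries for I = 128 and about 1024.5 for I = 2048, while a key cache hit skips this scan entirely.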
