Question

I was reading https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutReads.html and got confused when I reached the partition key cache.

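My understanding is that with the partition key cache, you read the index entry for a partition key and then use it to look up the on-disk location in the compression offset map, so the partition summary step can be skipped.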

Once an SSTable passes the bloom filter check, it may contain the partition key. For example: sstable1 contains pk1 and pk2; keycache.pk1 = index0, keycache.pk2 = index0; compression_offset_map.index0 = location0

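My questions are:

Why can't the key cache store the location directly? Then you wouldn't need two hash table lookups.
Since every partition key in the same SSTable has the same index, why not use a set for the lookup instead of a hash table?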

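I may have misunderstood this and given a wrong example.

I also don't understand how the partition summary works.

Could someone give me a concrete example?

Thanks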

Answer 1

1. The partition summary relies on the fact that partition keys are always stored in sorted order (within an SSTable, partitions are ordered by token).
2. The partition summary is an off-heap, in-memory structure built from a sample of the partition keys in the partition index.
3. The partition index file is stored on disk and holds every partition key along with the offset it maps to (the partition's position in the data file).

Suppose the partition index file contains 50 partition keys, Key01 through Key50.
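To make the setup concrete, here is a toy Python model of such an index. It is an illustration under stated assumptions, not Cassandra's on-disk format: keys are plain strings compared lexically (Cassandra actually orders partitions by token), and the data-file offsets are invented by assuming every partition takes 100 bytes. The hypothetical `scan_index` helper shows the cost of finding a key with no summary at all, reading entries from the start.

```python
# Toy model of a partition index (NOT Cassandra's real file format).
# Each entry is (partition_key, data_file_offset), sorted by key.
# Assumptions for illustration: keys sort as strings, each partition is 100 bytes.

def build_partition_index(num_keys, partition_size=100):
    """Return entries Key01..KeyNN with made-up, evenly spaced data-file offsets."""
    return [(f"Key{i:02d}", (i - 1) * partition_size) for i in range(1, num_keys + 1)]


def scan_index(index, wanted):
    """Scan from the first entry (no summary).
    Returns (data_file_offset or None, number of entries read)."""
    for reads, (key, offset) in enumerate(index, start=1):
        if key == wanted:
            return offset, reads
        if key > wanted:  # entries are sorted, so the key cannot appear later
            return None, reads
    return None, len(index)


index = build_partition_index(50)
print(scan_index(index, "Key33"))  # (3200, 33): 33 entries read to find Key33
```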

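As mentioned, the partition summary is built by sampling. If we configure it to sample one out of every 10 partition keys, the in-memory summary holds five partition keys along with each one's on-disk location inside the partition index.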

For example, the partition summary would look roughly like this:

Partition Key 01 -> location of PK01 in the partition index file
Partition Key 11 -> location of PK11 in the partition index file
Partition Key 21 -> location of PK21 in the partition index file
Partition Key 31 -> location of PK31 in the partition index file
Partition Key 41 -> location of PK41 in the partition index file
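The sampling step itself is just "keep every Nth index entry". A minimal Python sketch of it, assuming string keys and letting an entry's list position stand in for its byte location in the partition index file, with the sampling interval of 10 from the example. It produces the same five entries as the list above.

```python
# Sample every 10th key of a 50-key partition index (toy model).
# The list position of an entry stands in for its byte location in the index file.
keys = [f"Key{i:02d}" for i in range(1, 51)]  # Key01 .. Key50, already sorted
SAMPLING_INTERVAL = 10

summary = [(keys[pos], pos) for pos in range(0, len(keys), SAMPLING_INTERVAL)]
print(summary)
# [('Key01', 0), ('Key11', 10), ('Key21', 20), ('Key31', 30), ('Key41', 40)]
```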

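Now suppose a query asks for partition key 33. Cassandra searches the in-memory partition summary, which is fast, and learns which range contains partition key 33: the one between partition key 31 and partition key 41.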

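Using that information from the summary, it goes straight to the location of PK31 in the index file and then does a short scan of the entries from partition key 31 up to partition key 41.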
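The two-step lookup can be sketched in Python, using `bisect` for the in-memory step. This is a toy under stated assumptions, not Cassandra's code: keys are strings rather than tokens, offsets are invented (100 bytes per partition), list positions replace byte positions, and `lookup` is a hypothetical helper. It binary-searches the summary for the last sampled key that is less than or equal to the wanted key, then scans only the index entries up to the next sampled key.

```python
import bisect

# Toy partition index: (key, made-up data-file offset), sorted by key.
index = [(f"Key{i:02d}", (i - 1) * 100) for i in range(1, 51)]
# Toy summary: every 10th key together with its position in the index.
summary = [(index[pos][0], pos) for pos in range(0, len(index), 10)]


def lookup(summary, index, wanted):
    """Return (data_file_offset or None, number of index entries read)."""
    summary_keys = [key for key, _ in summary]
    # Last sampled key <= wanted; its position is where the index scan starts.
    i = bisect.bisect_right(summary_keys, wanted) - 1
    if i < 0:
        return None, 0  # sorts before the first sampled key, so not in this index
    start = summary[i][1]
    # The next sampled key (or the end of the index) bounds the scan.
    end = summary[i + 1][1] if i + 1 < len(summary) else len(index)
    reads = 0
    for key, offset in index[start:end]:
        reads += 1
        if key == wanted:
            return offset, reads
    return None, reads


print(lookup(summary, index, "Key33"))  # (3200, 3): only Key31, Key32, Key33 are read
```

Scanning the index from the beginning would read 33 entries to find Key33; with the summary, this reads 3, and never more than 10 per lookup with a sampling interval of 10.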

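In short, the partition summary lets us quickly pinpoint the region of the partition index that holds the key, instead of searching through every partition key in the partition index file.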

Hope this helps.

(Figure: Cassandra read flow example)
