Cassandra - Number of disk seeks in a read request

Time: 2017-03-20 17:24:39

Tags: cassandra

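I am trying to understand the maximum number of disk seeks a read operation needs in Cassandra. I have looked at several online articles, including this one: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutReads.html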

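As I understand it, the worst case requires two disk seeks: one to read the partition index, and another to read the actual data from the compressed partition. The position of the data within the compressed partition is obtained from the compression offset map, which is stored in memory. Am I on the right track here? Are there cases where reading the data needs more than two disk seeks?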
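To illustrate why locating the data inside a compressed SSTable costs no extra seek, here is a minimal sketch of a compression offset map lookup. It assumes fixed-size uncompressed chunks (64 KiB here, chosen only for illustration) and a hypothetical `CompressionOffsetMap` class; it is a simplified model, not Cassandra's actual implementation. Because the whole map lives in memory, turning a partition's uncompressed position into the compressed chunk to read is arithmetic plus one list lookup; only the final chunk read touches the disk.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CompressionOffsetMap:
    """Hypothetical, simplified in-memory compression offset map.

    The uncompressed data is split into fixed-size chunks; offsets[i] is the
    position in the compressed data file where chunk i begins.
    """
    chunk_length: int   # uncompressed bytes per chunk (assumed fixed)
    offsets: List[int]  # compressed start offset of each chunk

    def locate(self, uncompressed_pos: int) -> Tuple[int, int, int]:
        """Return (chunk index, compressed start offset, offset inside the chunk).

        No disk access happens here: the map is entirely in memory.
        """
        if uncompressed_pos < 0:
            raise ValueError("position must be non-negative")
        chunk = uncompressed_pos // self.chunk_length
        if chunk >= len(self.offsets):
            raise ValueError("position is past the end of the data")
        return chunk, self.offsets[chunk], uncompressed_pos % self.chunk_length


# Example with made-up offsets: 64 KiB chunks that compressed to different sizes.
offset_map = CompressionOffsetMap(chunk_length=65536, offsets=[0, 20000, 41000])
print(offset_map.locate(70000))  # (1, 20000, 4464): seek to 20000, decompress, skip 4464 bytes
```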

1 answer:

Answer 0 (score: 0)

I am posting here the answer I received on a Cassandra user community thread, in case anyone else needs it:

You're right: one seek on a hit in the partition key cache, and two if not.
That's the theory, but there are two things to mention:

First, you need two seeks per SSTable, not per entire read. So if your data is spread over multiple SSTables on disk, you obviously need more than two seeks. Think of frequently updated partition keys: combined with memory pressure (which forces frequent memtable flushes), you can easily end up with many SSTables, although they will be compacted at some point in the future.
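The per-SSTable arithmetic above can be sketched as a small estimate. The function name `seeks_for_read` and its input are hypothetical: it takes one boolean per SSTable that actually holds data for the partition, indicating whether that SSTable's entry was in the partition key cache. SSTables ruled out by their in-memory bloom filter are assumed to cost no seeks, and bloom filter false positives and disk fragmentation are ignored.

```python
from typing import Sequence


def seeks_for_read(key_cache_hits: Sequence[bool]) -> int:
    """Estimate disk seeks for reading one partition (simplified model).

    key_cache_hits: one entry per SSTable containing the partition.
      True  -> key cache hit: 1 seek (data only)
      False -> key cache miss: 2 seeks (partition index + data)
    """
    return sum(1 if hit else 2 for hit in key_cache_hits)


# A frequently updated partition spread over 5 SSTables, all key cache misses:
print(seeks_for_read([False] * 5))            # 10
# The same partition in 3 SSTables, one of them a key cache hit:
print(seeks_for_read([True, False, False]))   # 5
```

Under this model the "two seeks" from the question is the special case of a single SSTable with a cold key cache; compaction lowers the total by reducing the number of SSTables a partition is spread over.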

Second, fragmentation on disk can cause additional seeks even during sequential reads.

Note: Each SSTable has its own partition index.
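A minimal sketch of one SSTable's read path, counting only disk seeks, shows where the "one seek on a cache hit, two on a miss" rule comes from. The `SSTableModel` class is hypothetical and heavily simplified: the on-disk partition index is modeled as a dict, and the key cache is kept per SSTable (Cassandra's real key cache is a shared structure, so treat this as an assumption). Each SSTable object has its own partition index, matching the note above.

```python
from typing import Dict


class SSTableModel:
    """Hypothetical model of a single SSTable's read path, counting disk seeks."""

    def __init__(self, partition_index: Dict[str, int]):
        # Stands in for the on-disk partition index: key -> data file position.
        self._partition_index = partition_index
        # Stands in for the in-memory partition key cache, filled after a miss.
        self._key_cache: Dict[str, int] = {}
        self.seeks = 0

    def read(self, key: str) -> int:
        if key in self._key_cache:
            # Key cache hit: the position is already known, no index seek.
            position = self._key_cache[key]
        else:
            # Key cache miss: one seek into the on-disk partition index.
            self.seeks += 1
            position = self._partition_index[key]
            self._key_cache[key] = position
        # The in-memory compression offset map resolves the chunk; reading the
        # compressed data itself costs one more seek.
        self.seeks += 1
        return position


sstable = SSTableModel({"user:42": 1024})
sstable.read("user:42")
print(sstable.seeks)  # 2: cold key cache (index seek + data seek)
sstable.read("user:42")
print(sstable.seeks)  # 3: the second read hit the key cache, adding only 1 seek
```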