Recently I've been studying common indexing structures in databases, such as B+-trees and LSM. I have a solid handle on how point reads/writes/deletes/compaction would work in an LSM.
For example (in RocksDB/LevelDB), on a point read we would first check the in-memory index (memtable), followed by some number of SST files, from most to least recent. On each level of the LSM where files don't overlap, we would use binary search to find the one SST file whose key range could contain the key. For a given SST file, we can use a bloom filter to quickly check whether the key might exist in it, and skip the file entirely if it definitely doesn't.
What I don't see is how a range read specifically works. Does the LSM have to open an iterator on every level (including the memtable) and iterate in lockstep across all of them to return a final sorted result? Is it implemented as just a series of point queries (almost definitely not)? Are all potential keys pulled first and then sorted afterwards? Would appreciate any insight someone has here.
I haven't been able to find much documentation on the subject.
Answer 0 (score: 1)
RocksDB has a variety of iterator implementations, such as the memtable iterator, the file iterator, the merging iterator, and so on.
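During a range read, the iterator first seeks to the start of the range with a `Seek()` call, much like a point lookup (using binary search within the SSTs). The read is served by a series of child iterators: one for each memtable, one for each Level-0 file (because SSTs in L0 can overlap), and one for each subsequent level. A merging iterator collects keys from each of these iterators and returns the data in sorted order until the end of the range is reached.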
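To make the merging step concrete, here is a minimal Python sketch of the idea, not RocksDB's actual code. Every name in it (`SortedRunIterator`, `range_scan`, `TOMBSTONE`) is made up for illustration. It assumes each sorted run (a memtable, an L0 file, or a whole level) is an in-memory list of `(key, value)` pairs, that runs are passed newest first, and that "newest run wins" is a fair stand-in for the sequence-number and snapshot checks a real engine would do.

```python
import heapq
from bisect import bisect_left

TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"


class SortedRunIterator:
    """Iterator over one sorted run (memtable, L0 file, or whole level).

    `entries` is a list of (key, value) pairs sorted by key, keys unique.
    """

    def __init__(self, entries):
        self._entries = entries
        self._keys = [k for k, _ in entries]
        self._pos = 0

    def seek(self, target):
        # Binary search to the first key >= target, like a point lookup.
        self._pos = bisect_left(self._keys, target)

    def valid(self):
        return self._pos < len(self._entries)

    def entry(self):
        return self._entries[self._pos]

    def next(self):
        self._pos += 1


def range_scan(runs, start, end):
    """Yield live (key, value) pairs with start <= key < end, in key order.

    `runs` must be ordered newest first; on duplicate keys the newest wins.
    """
    children = [SortedRunIterator(r) for r in runs]
    heap = []  # holds (key, age); a smaller age means a newer run
    for age, child in enumerate(children):
        child.seek(start)
        if child.valid():
            heapq.heappush(heap, (child.entry()[0], age))

    last_key = object()  # sentinel that equals no real key
    while heap:
        key, age = heapq.heappop(heap)
        if key >= end:
            break  # smallest remaining key is past the range
        child = children[age]
        _, value = child.entry()
        child.next()
        if child.valid():
            heapq.heappush(heap, (child.entry()[0], age))
        if key == last_key:
            continue  # older version of a key we already handled
        last_key = key
        if value is not TOMBSTONE:
            yield key, value


memtable = [("b", "b2"), ("d", TOMBSTONE)]
l0_file = [("a", "a1"), ("b", "b1"), ("e", "e1")]
level1 = [("c", "c0"), ("d", "d0"), ("f", "f0")]
print(list(range_scan([memtable, l0_file, level1], "b", "f")))
# prints [('b', 'b2'), ('c', 'c0'), ('e', 'e1')]
```

The heap is what keeps the child iterators moving "in lockstep": only the child that currently holds the smallest key advances, so nothing is pulled into memory and sorted up front. In the example, the newer `b2` shadows `b1`, and the tombstone in the memtable hides `d` from level 1.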
See this documentation for details on the iterator implementation.