Memory Management of Piece Tables and Ropes in Modern Text Editors

Posted: 2017-06-12 16:56:24

Tags: java string memory data-structures

I know that two stacks can be used to implement Undo/Redo in a text editor. For a Piece Table, you can simply push the nodes that are about to be affected onto the stack, as mentioned here (a great write-up about Piece Tables in general, btw). For a Rope, my understanding is that since a Rope should be immutable, whenever there is a change you simply push the root of the old tree onto the stack, as mentioned here:

"Not only can text insertions and deletions be performed in near-constant time for extremely large documents, but ropes' immutability makes implementation of an undo stack trivial: simply store a reference to the previous rope with every change."
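The Piece Table approach from the first paragraph can be sketched in Java as follows. This is a minimal sketch, assuming a plain list of pieces and insert-only editing (deletion is left out to keep it short). Each edit pushes a record of the pieces it replaced onto an undo stack; undo swaps them back and moves the record to the redo stack. The names `PieceTableUndo`, `Change`, `insert`, `undo`, and `redo` are hypothetical and not taken from the write-up mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch (hypothetical names): insert-only piece table with two-stack undo/redo.
public final class PieceTableUndo {
    // A piece points into one of the two buffers instead of copying text.
    record Piece(boolean inAdd, int start, int length) {}

    // One edit: starting at index `at`, the `removed` pieces were replaced by `inserted`.
    record Change(int at, List<Piece> removed, List<Piece> inserted) {}

    private final String original;                         // read-only file contents
    private final StringBuilder add = new StringBuilder(); // append-only buffer
    private final List<Piece> pieces = new ArrayList<>();
    private final Deque<Change> undoStack = new ArrayDeque<>();
    private final Deque<Change> redoStack = new ArrayDeque<>();

    public PieceTableUndo(String original) {
        this.original = original;
        if (!original.isEmpty()) pieces.add(new Piece(false, 0, original.length()));
    }

    public void insert(int pos, String text) {
        if (text.isEmpty()) return;
        // Find the first piece that ends after `pos`; `offset` is where it starts.
        int i = 0, offset = 0;
        while (i < pieces.size() && offset + pieces.get(i).length() <= pos) {
            offset += pieces.get(i).length();
            i++;
        }
        if (pos < 0 || (i == pieces.size() && offset != pos)) {
            throw new IndexOutOfBoundsException("pos " + pos);
        }
        Piece added = new Piece(true, add.length(), text.length());
        add.append(text);
        Change c;
        if (offset == pos) {
            // Insertion on a piece boundary: no existing piece is affected.
            c = new Change(i, List.of(), List.of(added));
        } else {
            // Insertion inside piece i: that piece is the only affected node.
            Piece p = pieces.get(i);
            int cut = pos - offset;
            Piece left = new Piece(p.inAdd(), p.start(), cut);
            Piece right = new Piece(p.inAdd(), p.start() + cut, p.length() - cut);
            c = new Change(i, List.of(p), List.of(left, added, right));
        }
        replace(c.at(), c.removed(), c.inserted());
        undoStack.push(c);
        redoStack.clear(); // a new edit invalidates the redo history
    }

    public boolean undo() {
        if (undoStack.isEmpty()) return false;
        Change c = undoStack.pop();
        replace(c.at(), c.inserted(), c.removed());
        redoStack.push(c);
        return true;
    }

    public boolean redo() {
        if (redoStack.isEmpty()) return false;
        Change c = redoStack.pop();
        replace(c.at(), c.removed(), c.inserted());
        undoStack.push(c);
        return true;
    }

    // Replace the run of pieces `from` (starting at index `at`) with `to`.
    private void replace(int at, List<Piece> from, List<Piece> to) {
        pieces.subList(at, at + from.size()).clear();
        pieces.addAll(at, to);
    }

    public String text() {
        StringBuilder sb = new StringBuilder();
        for (Piece p : pieces) {
            CharSequence buf = p.inAdd() ? add : original;
            sb.append(buf, p.start(), p.start() + p.length());
        }
        return sb.toString();
    }
}
```

Note that undo never truncates the add buffer: text that was typed and then undone stays in `add`, which is what keeps every recorded piece valid for a later redo.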

If this is the case, then a Rope seems very memory-intensive and could quickly fill up memory with a large file after a few modifications. How is this handled in modern text editors?
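The quoted Rope approach amounts to keeping references to immutable versions on two stacks. A minimal sketch, assuming each document version `T` is immutable (for example, a rope root): undo and redo only move references between the stacks and nothing is copied, so the memory cost of each entry is whatever the new version does not share with the previous one. `SnapshotHistory` and its methods are hypothetical names used for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch (hypothetical names): undo/redo over immutable versions.
public final class SnapshotHistory<T> {
    private T current;
    private final Deque<T> undo = new ArrayDeque<>();
    private final Deque<T> redo = new ArrayDeque<>();

    public SnapshotHistory(T initial) { current = initial; }

    public T current() { return current; }

    // Record a new version; the old one is kept by reference, not copied.
    public void commit(T next) {
        undo.push(current);
        current = next;
        redo.clear(); // a new edit invalidates the redo history
    }

    public boolean undo() {
        if (undo.isEmpty()) return false;
        redo.push(current);
        current = undo.pop();
        return true;
    }

    public boolean redo() {
        if (redo.isEmpty()) return false;
        undo.push(current);
        current = redo.pop();
        return true;
    }
}
```

Because `ArrayDeque` rejects `null` elements, every version passed to `commit` must be non-null.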

This leads to another question: what would you do with a 5 GB file when you only have 2 GB of memory? I was thinking of paging or dynamic loading: as you scroll down, the editor discards some old text from memory and loads more from disk. How would this be realized in a Piece Table and in a Rope? Maybe we could serialize older parts of the data structure to disk as we load more content into it, but this does not seem like an optimal solution to me.
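For the Piece Table side of this question, one approach (an assumption here, not something the answer below describes) follows from the fact that a piece table never modifies its original buffer: that buffer can stay in the file on disk, while only the add buffer and the piece list live in memory. The sketch reads bytes for a requested range, such as the visible viewport, on demand. It assumes a single-byte encoding so that byte offsets equal character offsets; `FileBackedPieceTable`, `insert`, and `read` are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (hypothetical names): the original text is read from disk on demand.
// Assumes a single-byte encoding, so byte offsets in the file equal char offsets.
public final class FileBackedPieceTable implements AutoCloseable {
    // inAdd == false: offsets are byte positions in the original file.
    // inAdd == true:  offsets are char positions in the in-memory add buffer.
    record Piece(boolean inAdd, long start, long length) {}

    private final RandomAccessFile original;               // never loaded as a whole
    private final StringBuilder add = new StringBuilder(); // only the user's typing
    private final List<Piece> pieces = new ArrayList<>();

    public FileBackedPieceTable(Path path) {
        try {
            original = new RandomAccessFile(path.toFile(), "r");
            if (original.length() > 0) pieces.add(new Piece(false, 0, original.length()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public void insert(long pos, String text) {
        if (text.isEmpty()) return;
        int i = 0;
        long offset = 0;
        while (i < pieces.size() && offset + pieces.get(i).length() <= pos) {
            offset += pieces.get(i).length();
            i++;
        }
        if (pos < 0 || (i == pieces.size() && offset != pos)) {
            throw new IndexOutOfBoundsException("pos " + pos);
        }
        Piece added = new Piece(true, add.length(), text.length());
        add.append(text);
        if (offset == pos) {
            pieces.add(i, added);                 // boundary: just insert a new piece
        } else {
            Piece p = pieces.remove(i);           // split the piece containing pos
            long cut = pos - offset;
            pieces.addAll(i, List.of(
                    new Piece(p.inAdd(), p.start(), cut),
                    added,
                    new Piece(p.inAdd(), p.start() + cut, p.length() - cut)));
        }
    }

    // Materialise only the characters in [from, to), e.g. the visible viewport.
    public String read(long from, long to) {
        StringBuilder out = new StringBuilder();
        long offset = 0;
        for (Piece p : pieces) {
            long lo = Math.max(from, offset);
            long hi = Math.min(to, offset + p.length());
            if (lo < hi) {
                long s = p.start() + (lo - offset);
                int n = (int) (hi - lo);
                if (p.inAdd()) {
                    out.append(add, (int) s, (int) s + n);
                } else {
                    byte[] buf = new byte[n];
                    try {
                        original.seek(s);
                        original.readFully(buf);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    out.append(new String(buf, StandardCharsets.ISO_8859_1));
                }
            }
            offset += p.length();
            if (offset >= to) break;
        }
        return out.toString();
    }

    @Override
    public void close() {
        try {
            original.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, memory use grows with the edits and the number of pieces rather than with the file size. A real implementation would likely also need a line index and a balanced tree of pieces to locate offsets quickly.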

Cheers!

1 Answer

Answer 0 (score: 3)

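A major advantage of immutable objects is that they can share structure with one another. Because the structure never changes, the whole structure doesn't need to be copied; only the parts affected by a modification do. This means each change needs only a relatively small amount of additional memory. Here is a good explanation of how another tree-like structure achieves this: Understanding Clojure's Persistent Vector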
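Applied to a rope, structural sharing looks roughly like this. A minimal sketch, assuming a simple unbalanced binary rope with small string leaves (no rebalancing and no leaf-size limit): `insert` rebuilds only the nodes on the path from the root to the edited leaf and reuses every other subtree, so the old version stays valid and the new version costs one rebuilt leaf plus O(depth) new internal nodes. `PersistentRope` and its methods are illustrative names; this shows the general path-copying idea, not code from the Clojure article.

```java
// Illustrative sketch (hypothetical names): immutable rope with path copying.
// Assumes small leaves and no rebalancing.
public final class PersistentRope {
    private final String leaf;                 // non-null only for leaf nodes
    private final PersistentRope left, right;  // non-null only for concat nodes
    private final int length;

    private PersistentRope(String leaf) {
        this.leaf = leaf;
        this.left = null;
        this.right = null;
        this.length = leaf.length();
    }

    private PersistentRope(PersistentRope left, PersistentRope right) {
        this.leaf = null;
        this.left = left;
        this.right = right;
        this.length = left.length + right.length;
    }

    public static PersistentRope of(String text) { return new PersistentRope(text); }

    public static PersistentRope concat(PersistentRope a, PersistentRope b) {
        return new PersistentRope(a, b);
    }

    public int length() { return length; }
    public PersistentRope left() { return left; }
    public PersistentRope right() { return right; }

    // Returns a new version and leaves `this` untouched. Only the nodes on the
    // root-to-leaf path are rebuilt; the sibling subtree at each level is reused.
    public PersistentRope insert(int pos, String text) {
        if (pos < 0 || pos > length) throw new IndexOutOfBoundsException("pos " + pos);
        if (leaf != null) {
            return of(leaf.substring(0, pos) + text + leaf.substring(pos));
        }
        if (pos <= left.length) {
            return concat(left.insert(pos, text), right);            // right subtree shared
        }
        return concat(left, right.insert(pos - left.length, text));   // left subtree shared
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        appendTo(sb);
        return sb.toString();
    }

    private void appendTo(StringBuilder sb) {
        if (leaf != null) {
            sb.append(leaf);
        } else {
            left.appendTo(sb);
            right.appendTo(sb);
        }
    }
}
```

Keeping the old root on an undo stack therefore retains only the nodes the new version does not share: for a single insert, one leaf and the internal nodes on its path.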

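Even with memory-saving optimizations, very large files that don't fit in RAM (or in the memory space the program is allowed) will still cause problems. To get around this, text editors store the parts of the file that are not being edited directly in some kind of swap file. Some editors use their own virtual-memory implementation for this, while others simply swap out the parts of the file that are far enough away from the cursor. As a (hypothetical) example, a subtree of the rope could be saved to a file if no modifications had been made to it for some period of time.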
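The hypothetical example above can be sketched at the level of a single leaf (a whole subtree would work the same way once flattened to its text). This minimal Java version assumes a logical clock (`now`, `idleLimit`) instead of wall-clock time and writes one temporary file per evicted leaf; a real editor would more likely pack many leaves into a single swap file. `SwappableLeaf`, `evictIfIdle`, and `read` are hypothetical names, not an API from any editor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch (hypothetical names): a rope leaf that can be swapped to disk.
// Assumes a logical clock supplied by the caller rather than wall-clock time.
public final class SwappableLeaf {
    private String text;       // null while the text lives only in the swap file
    private Path swapFile;     // non-null only while swapped out
    private long lastTouched;  // logical time of the last access

    public SwappableLeaf(String text, long now) {
        this.text = text;
        this.lastTouched = now;
    }

    public boolean isResident() { return text != null; }

    // Returns the text, reloading it from the swap file first if necessary.
    public String read(long now) {
        lastTouched = now;
        if (text == null) {
            try {
                text = Files.readString(swapFile, StandardCharsets.UTF_8);
                Files.delete(swapFile);
                swapFile = null;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return text;
    }

    // Moves the text to a temporary swap file if it has not been touched for
    // at least `idleLimit` ticks, then drops the in-memory copy.
    public boolean evictIfIdle(long now, long idleLimit) {
        if (text == null || now - lastTouched < idleLimit) return false;
        try {
            swapFile = Files.createTempFile("rope-leaf-", ".swap");
            Files.writeString(swapFile, text, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        text = null;
        return true;
    }
}
```

Because rope nodes are immutable, the copy on disk can never go stale, so nothing has to be written back or reconciled when the leaf is reloaded.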