Question

I am trying to implement a memory-efficient Burrows-Wheeler transform in Python.

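Given a string text, I build a list of indices into text (i, j, ...), ordered so that the cyclic rotation starting at i is lexicographically smaller than the cyclic rotation starting at j.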

I want to store only the starting indices of the rotations, because storing every rotation would take too much memory: len(text) * len(text) characters.

>>> text = "hellohowareyou$"
>>> ids = list(range(len(text)))
>>> ids.sort(key=lambda i: text[i:] + text[:i])

>>> print(ids)
[14, 8, 1, 10, 0, 5, 2, 3, 4, 12, 6, 9, 13, 7, 11]

>>> print([text[i:] + text[:i] for i in ids])
['$hellohowareyou', 'areyou$hellohow', 'ellohowareyou$h', 'eyou$hellohowar',
 'hellohowareyou$', 'howareyou$hello', 'llohowareyou$he', 'lohowareyou$hel',
 'ohowareyou$hell', 'ou$hellohowarey', 'owareyou$helloh', 'reyou$hellohowa',
 'u$hellohowareyo', 'wareyou$helloho', 'you$hellohoware']
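
For context, this is roughly how I get the transform itself once ids is sorted. It is a minimal sketch: the helper name bwt_from_ids is mine, and it relies on the fact that the last character of the rotation starting at i is text[i - 1], so no rotation has to be built at this step.

```python
def bwt_from_ids(text, ids):
    # The rotation starting at i ends with the character just before i;
    # for i == 0, text[-1] wraps around to the last character.
    return "".join(text[i - 1] for i in ids)


text = "hellohowareyou$"
# Same sort as above (this is still the memory-hungry version).
ids = sorted(range(len(text)), key=lambda i: text[i:] + text[:i])
bwt = bwt_from_ids(text, ids)
print(bwt)
```

So the only expensive part is producing ids in sorted order.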

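The problem is that with this code Python builds the complete list of keys in memory, and I do not have enough memory for a large text. I would like Python to create the strings only for each comparison and discard them afterwards, so that memory usage stays around 2 * len(text).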

Any suggestions?
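
One direction I have been considering, as an untested-at-scale sketch: replace the string key with a comparison function wrapped in functools.cmp_to_key, so that two rotations are compared character by character and never materialized. The helper name make_rotation_cmp is hypothetical. I assume this keeps the extra memory to one small wrapper object per index, but it is likely much slower, since each comparison can take up to len(text) steps in pure Python.

```python
from functools import cmp_to_key


def make_rotation_cmp(text):
    n = len(text)

    def cmp(i, j):
        # Walk both rotations in lockstep using modular indexing,
        # stopping at the first position where they differ.
        for k in range(n):
            a = text[(i + k) % n]
            b = text[(j + k) % n]
            if a != b:
                return -1 if a < b else 1
        return 0  # identical rotations (only possible for periodic text)

    return cmp


text = "hellohowareyou$"
ids = sorted(range(len(text)), key=cmp_to_key(make_rotation_cmp(text)))
print(ids)
```

On the small example this gives the same order as the string-key version, but I do not know whether it is fast enough for large inputs.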

Memory-efficient sorting with a key
