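How can I emulate a list comprehension using less RAM?

Date: 2019-02-14 10:06:38

Tags: python python-3.x list

I have a function that creates a list containing all combinations of the elements from 12 other lists: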

def make_list(self):
    results = ['{} {} {} {} {} {} {} {} {} {} {} {}'.format(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12)
               for x1 in self.list1
                    for x2 in self.list2
                        for x3 in self.list3
                            for x4 in self.list4
                                for x5 in self.list5
                                    for x6 in self.list6
                                        for x7 in self.list7
                                            for x8 in self.list8
                                                for x9 in self.list9
                                                    for x10 in self.list10
                                                        for x11 in self.list11
                                                            for x12 in self.list12]

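However, as expected, it uses far too much RAM. Is there a solution that uses less memory? I tried using map(), but I don't really understand what function I would pass to it.

Or would the best solution be to rewrite it in C++ or Go?

2 answers:

Answer 0: (score: 0)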

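The problem is that you are trying to keep every possible combination in memory, because you build the complete list of strings at once. For a Cartesian product of 12 lists with about 40 items each, you would have to produce about 40^12 combinations. Even at one byte per combination that is roughly 17 exabytes, and storing the combinations as text takes several times more.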
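The arithmetic behind that estimate can be checked in a few lines. This is only a rough sketch: the names `num_lists` and `items_per_list` are illustrative, and one byte per combination is a deliberately optimistic assumption.

```python
# Back-of-the-envelope size of the Cartesian product (illustrative names).
num_lists = 12
items_per_list = 40  # assumed average list length

total_combinations = items_per_list ** num_lists

# Optimistic assumption: a single byte per combination.
exabytes = total_combinations / 10 ** 18

print(total_combinations)
print(round(exabytes, 1))
```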

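But you can produce all of these combinations without insane memory requirements. You can use rolling indices to print every possibility.

Purely for educational purposes, here is a sample solution that shows how to produce all of these combinations without consuming trillions of bytes of RAM, and that works for a variable number of lists: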

from random import randint

# It is easier to read code that uses words rather than numbers.
# These are more meaningful names for the positions in the indices list.

INDEX = 0  # Index in the actual list we are taking for combination
LENGTH = 1  # Last valid index of the actual list (its length minus one)
LIST = 2  # Reference to the actual list

AVERAGE_SIZE = 4
DEVIATION = 1


class Demo(object):
    list1 = list(randint(-100, 100) for _ in range(0, AVERAGE_SIZE + randint(-DEVIATION, +DEVIATION)))
    list2 = list(randint(-100, 100) for _ in range(0, AVERAGE_SIZE + randint(-DEVIATION, +DEVIATION)))
    list3 = list(randint(-100, 100) for _ in range(0, AVERAGE_SIZE + randint(-DEVIATION, +DEVIATION)))
    list4 = list(randint(-100, 100) for _ in range(0, AVERAGE_SIZE + randint(-DEVIATION, +DEVIATION)))

    def combinations(self):
        """Generator of all possible combinations. Yields one combination as comma separated string at a time"""
        indices = [
            # Actual list index start, last valid index of the actual list, reference to that list
            [0, len(self.list1) - 1, self.list1],
            [0, len(self.list2) - 1, self.list2],
            [0, len(self.list3) - 1, self.list3],
            [0, len(self.list4) - 1, self.list4],
        ]

        total_positions = len(indices)

        # Calculate the number of all possible combinations
        rounds = None
        for item in indices:
            if rounds is None:
                rounds = item[LENGTH] + 1
            else:
                rounds *= (item[LENGTH] + 1)

        current_round = 0  # Combination index

        while current_round < rounds:
            combination = list()
            carry_position = 0
            for current_position in range(0, total_positions):
                # Take a triple of index, length and list at current position (for easier readability)
                current_item = indices[current_position]

                # Pick current combination
                combination.append(current_item[LIST][current_item[INDEX]])

                # If current position under carry
                if current_position <= carry_position:
                    # Advance actual index unless it reached end of its list
                    if indices[current_position][INDEX] < indices[current_position][LENGTH]:
                        indices[current_position][INDEX] = indices[current_position][INDEX] + 1
                        carry_position = 0
                    else:  # If index of current list at the end
                        indices[current_position][INDEX] = 0  # Move actual list index to the start
                        carry_position = current_position + 1  # Remember that the next position has a carry
                # Repeat for each position

            # Yield collected combination
            yield ','.join(str(x) for x in combination)

            current_round += 1

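        # No explicit `raise StopIteration` here: the generator ends when the loop
        # finishes, and raising it inside a generator is a RuntimeError since Python 3.7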

    def print_combinations(self):
        """Prints all of the combinations"""
        for combination in self.combinations():
            print(combination)


if __name__ == '__main__':
    Demo().print_combinations()

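This solution is generic and you can easily add more lists, but note that with an AVERAGE_SIZE of 40 and 12 lists, even if you dump the results to a file, you would need more than a billion 1 TB hard drives to store them.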
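To make the "add more lists" point concrete, here is a compact sketch of the same rolling-index idea that accepts any number of lists. The function name `rolling_combinations` is hypothetical and this is not the code above, only a shorter variant of the same technique; like the original, it advances the first position fastest.

```python
def rolling_combinations(lists):
    """Yield every combination of one item per list, one tuple at a time."""
    if not lists or any(len(lst) == 0 for lst in lists):
        return  # nothing to combine
    positions = [0] * len(lists)  # current index into each list
    while True:
        yield tuple(lst[i] for lst, i in zip(lists, positions))
        # Advance like an odometer: bump the first position and carry
        # into the next one whenever a position wraps around.
        for pos in range(len(lists)):
            positions[pos] += 1
            if positions[pos] < len(lists[pos]):
                break
            positions[pos] = 0
        else:
            return  # every position wrapped: all combinations produced


for combo in rolling_combinations([[1, 2], ['a', 'b'], [True]]):
    print(combo)
```

Only the `positions` list and the current tuple are held in memory, regardless of how many combinations exist.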

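Answer 1: (score: 0)

The problem is not how you generate the list, but the fact that you want such a huge list in the first place. Instead, use a generator to produce the tuples lazily. (You will never have time to actually iterate over the entire sequence, but in theory it is possible.)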
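As a small illustration of lazy generation, a list comprehension becomes a generator expression when its square brackets are replaced with parentheses. The sketch below uses three short made-up lists instead of the twelve from the question, so it finishes immediately.

```python
# Made-up stand-ins for the question's lists.
list1 = ['a', 'b']
list2 = [1, 2, 3]
list3 = ['x', 'y']

# Parentheses instead of brackets: nothing is computed until requested.
lazy_results = ('{} {} {}'.format(x1, x2, x3)
                for x1 in list1
                for x2 in list2
                for x3 in list3)

first = next(lazy_results)  # builds only the first string
print(first)

remaining = sum(1 for _ in lazy_results)  # consumes the rest one at a time
print(remaining)
```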

The simplest way is to use itertools.product:

import itertools

tuples = itertools.product(self.list1, self.list2, ..., self.list12)

# This will run virtually forever, but never use more memory than is needed for
# a single tuple
for t in tuples:
    print('{} {} {} {} {} {} {} {} {} {} {} {}'.format(*t))
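For experimenting, a runnable variant might look like the sketch below. The `lists` variable holds hypothetical stand-ins for the real attributes, and `itertools.islice` is used to look at only the first few combinations instead of looping forever.

```python
import itertools

# Hypothetical stand-ins for the real lists.
lists = [['a', 'b'], [1, 2, 3], ['x', 'y']]

# Build the format string from the number of lists instead of typing each '{}'.
template = ' '.join(['{}'] * len(lists))

# Unpacking with * passes each list as a separate argument.
tuples = itertools.product(*lists)

# islice takes only the first few combinations without building the rest.
preview = [template.format(*t) for t in itertools.islice(tuples, 4)]
for line in preview:
    print(line)
```

Note that itertools.product varies the last list fastest, which matches the nesting order of the original comprehension.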