Question

I have the following Python code:

for i in range(0, len(oscillations) - sequence_length):
    a = patterns[i:i + sequence_length]
    b = oscillations[i:i + sequence_length]
    sequence_in = [a+b for a,b in zip(a,b)]
    sequence_out = oscillations[i + sequence_length]  
    network_input.append(sequence_in)
    network_output.append(sequence_out)

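The length of oscillations is 212922. Each element of oscillations has length 25. patterns has exactly the same length. The two lists have the same structure but hold different data.

The code above fails with a MemoryError, sometimes inside the loop and sometimes when returning the two lists.

If I shorten the lists to about 100000 elements, it works.

I understand that I am probably trying to allocate too much memory, but my question is whether there is a smarter way to iterate over the lists so that I do not need to allocate as much memory.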

Answer 1

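As some commenters have pointed out, you probably do not need to build the entire network_input and network_output lists. The biggest improvement in memory consumption comes from using yield instead: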

def stuff(oscillations, sequence_length, patterns):
    for i in range(0, len(oscillations) - sequence_length):
        a = patterns[i:i + sequence_length]
        b = oscillations[i:i + sequence_length]
        sequence_in = [a + b for a, b in zip(a, b)]
        sequence_out = oscillations[i + sequence_length]
        yield (sequence_in, sequence_out)

for s in stuff(oscillations, sequence_length, patterns):
    print(s)
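
To illustrate how the generator might be consumed without holding every pair at once, here is a minimal sketch that pulls pairs in fixed-size batches. The batches helper, the batch size of 4, and the toy data are assumptions made for illustration; none of them come from the question.

```python
from itertools import islice


def stuff(oscillations, sequence_length, patterns):
    # Same generator as above, repeated so the sketch runs on its own.
    for i in range(0, len(oscillations) - sequence_length):
        a = patterns[i:i + sequence_length]
        b = oscillations[i:i + sequence_length]
        sequence_in = [x + y for x, y in zip(a, b)]
        sequence_out = oscillations[i + sequence_length]
        yield (sequence_in, sequence_out)


def batches(pairs, batch_size):
    # Hypothetical helper: pull at most batch_size pairs at a time.
    iterator = iter(pairs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Toy data standing in for the real lists (elements have length 2 here).
oscillations = [[i, i] for i in range(10)]
patterns = [[10 * i, 10 * i] for i in range(10)]
sequence_length = 3

# 10 - 3 = 7 windows in total, delivered in batches of at most 4.
batch_sizes = [
    len(batch)
    for batch in batches(stuff(oscillations, sequence_length, patterns), 4)
]
```

Each batch can be processed and discarded before the next one is built, so the memory held by the windows depends on the batch size rather than on len(oscillations).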

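A smaller improvement comes from not slicing and summing the same elements of the two collections over and over. a and b differ by only one element from one iteration to the next, so you can use a simple moving-sum algorithm: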

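def moving_stuff(oscillations, sequence_length, patterns):
    sums = []  # combined elements of the current window
    for oscillation, pattern in zip(oscillations, patterns):
        if len(sums) == sequence_length:
            # The window is full: emit a copy of it with the next oscillation.
            yield (sums[:], oscillation)
            sums.pop(0)  # drop the oldest element
        # pattern + oscillation, as in the question; sum((oscillation, pattern))
        # would raise TypeError if the elements are plain lists.
        sums.append(pattern + oscillation)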
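
One possible refinement: list.pop(0) has to shift every remaining element, while a collections.deque created with maxlen discards its oldest item automatically on append. The sketch below is an assumption-laden variant, not part of the original answer. The names sliding_stuff and sliced_stuff are hypothetical, and elements are combined with pattern + oscillation as in the question. It also checks the result against the slicing version on toy data.

```python
from collections import deque


def sliding_stuff(oscillations, sequence_length, patterns):
    # Hypothetical variant of moving_stuff built on a bounded deque.
    window = deque(maxlen=sequence_length)
    for oscillation, pattern in zip(oscillations, patterns):
        if len(window) == sequence_length:
            # The window holds the previous sequence_length combined items;
            # the current oscillation is the value that follows them.
            yield (list(window), oscillation)
        # Appending to a full deque discards the oldest item automatically.
        window.append(pattern + oscillation)


def sliced_stuff(oscillations, sequence_length, patterns):
    # Slicing version from the question, used here only as a reference.
    for i in range(0, len(oscillations) - sequence_length):
        a = patterns[i:i + sequence_length]
        b = oscillations[i:i + sequence_length]
        yield ([x + y for x, y in zip(a, b)], oscillations[i + sequence_length])


# Toy data standing in for the real lists (elements have length 1 here).
oscillations = [[i] for i in range(8)]
patterns = [[100 + i] for i in range(8)]

same = (
    list(sliding_stuff(oscillations, 3, patterns))
    == list(sliced_stuff(oscillations, 3, patterns))
)
```

Copying the window with list(window) on each yield is still proportional to sequence_length, so the gain over moving_stuff is modest; the main saving remains that only one window exists at a time.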
