I have a generator function in Python that reads a dataset in chunks and yields each chunk from a loop. On every iteration the chunk size is the same and the data array is overwritten. It starts out yielding a chunk roughly every 0.3 s, but by around the 70th iteration it has slowed to about 3 s per chunk. Here is the generator:
def yield_chunks(self):
    # Loop over the chunks
    for j in range(self.ny_chunks):
        for i in range(self.nx_chunks):
            dataset_no = 0
            arr = numpy.zeros([self.chunk_size_y, self.chunk_size_x, nInputs], numpy.dtype(numpy.int32))
            # Loop over the datasets we will read into a single 'chunk'
            for peril in datasets.dataset_cache.iterkeys():
                group = datasets.getDatasetGroup(peril)
                for return_period, dataset in group:
                    dataset_no += 1
                    # Compute the window of the dataset that falls into this chunk
                    dataset_xoff, dataset_yoff, dataset_xsize, dataset_ysize = self.chunk_params(i, j)
                    # Read the data
                    data = dataset[0].ReadAsArray(dataset_xoff, dataset_yoff, dataset_xsize, dataset_ysize)
                    # Compute the window of our chunk array that this data fits into
                    chunk_xoff, chunk_yoff = self.window_params(dataset_xoff, dataset_yoff, dataset_xsize, dataset_ysize)
                    # Add the data to the chunk array
                    arr[chunk_yoff:(dataset_ysize+chunk_yoff), chunk_xoff:(dataset_xsize+chunk_xoff), dataset_no] = data
            # Once we have added data from all datasets to the chunk array, yield it
            yield arr
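
For reference, the per-chunk timings come from consuming the generator in a simple loop along these lines (a sketch; time_chunks and reader are illustrative names, not part of the real code):

import time

def time_chunks(gen):
    # Print how long each chunk takes to arrive from the generator;
    # this is how the ~0.3 s -> ~3 s slowdown shows up.
    last = time.time()
    for n, chunk in enumerate(gen, start=1):
        now = time.time()
        print("chunk %d: %.3f s" % (n, now - last))
        last = now

# Usage (hypothetical object owning yield_chunks):
# time_chunks(reader.yield_chunks())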
Could it be that memory is not being released properly after each chunk, and that this is what slows the loop down? Or is there some other cause?
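
One direct way to test the memory hypothesis (a sketch; check_growth and its every parameter are illustrative, not part of the code above): force a garbage collection every few chunks and watch whether the number of live objects keeps climbing. Steady growth would suggest old chunk arrays or dataset buffers are being kept reachable somewhere; flat numbers would point away from a leak.

import gc

def check_growth(gen, every=10):
    # Force a collection every `every` chunks and report how many
    # objects are still alive; monotonic growth across chunks would
    # mean something is holding references to data from earlier chunks.
    for n, chunk in enumerate(gen, start=1):
        if n % every == 0:
            unreachable = gc.collect()
            print("after %d chunks: %d live objects, %d collected"
                  % (n, len(gc.get_objects()), unreachable))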