Question

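I have a netCDF file of about 8 GB that contains five 3D variables (time, latitude, longitude), so each variable is about 1.6 GB. I want to compute a time series for each of these variables in the simplest and most efficient way possible.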

The code below does what I want, and it works fine for smaller netCDF files:

import netCDF4
import numpy as np

# Create dataset variable.
nc = netCDF4.Dataset(r"K:\Products\AMAZONAS.nc")

# Select a variable called "P".
nc_variable = nc["P"]

# Find the "time" dimension and collect the indices of all other dimensions.
dims = nc_variable.dimensions
dims_index = tuple([dims.index(x) for x in dims if x != 'time'])

# Calculate the time-series, by taking the mean over the non-"time" dimensions.
ts = np.nanmean(nc_variable[...], axis=dims_index)
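To make the axis bookkeeping concrete, here is the same reduction applied to a tiny in-memory array. The dimension names ("lat", "lon") and the data are assumptions for illustration, not taken from AMAZONAS.nc. Averaging over every axis except "time" leaves one value per time step.

```python
import numpy as np

# Hypothetical dimension names and a tiny (time, lat, lon) array standing in for nc["P"].
dims = ("time", "lat", "lon")
data = np.arange(24, dtype=float).reshape(2, 3, 4)
data[0, 0, 0] = np.nan  # one missing value, which nanmean ignores

# Indices of every axis except "time" (here: lat and lon).
dims_index = tuple(i for i, d in enumerate(dims) if d != "time")

# One mean per time step, skipping NaNs.
ts = np.nanmean(data, axis=dims_index)
print(dims_index, ts)
```

The whole array has to be in memory for this call, which is exactly what fails for the 8 GB file.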

However, with the 8 GB file described above, the nc_variable[...] part raises a MemoryError.

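So my question is: what is a good way to get around this error? I realize I could write a for loop that computes a small part of the time series at a time and then stitch the parts together. But I wonder whether there is a cleaner solution (perhaps using packages other than the ones used here)?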
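The chunked loop mentioned above can be kept fairly short. The sketch below is one way to write it; the function name `chunked_time_series` and its `chunk_size` default are hypothetical, not from the original post. It assumes the variable supports `.shape` and basic slicing, which both `netCDF4.Variable` and NumPy arrays do, so only `chunk_size` time steps are read into memory at once.

```python
import numpy as np


def chunked_time_series(var, time_axis=0, chunk_size=100):
    """Mean over all non-time axes, reading `chunk_size` time steps at a time.

    `var` is any object with a `.shape` that supports basic slicing,
    e.g. a netCDF4.Variable (only the requested slab is read from disk)
    or a plain NumPy array.
    """
    n_time = var.shape[time_axis]
    ndim = len(var.shape)
    # Axes to average over: every axis except the time axis.
    other_axes = tuple(i for i in range(ndim) if i != time_axis)

    pieces = []
    for start in range(0, n_time, chunk_size):
        stop = min(start + chunk_size, n_time)
        # Slice only along the time axis; take everything along the others.
        index = [slice(None)] * ndim
        index[time_axis] = slice(start, stop)
        block = var[tuple(index)]
        # netCDF4 may return a masked array; convert masked cells to NaN
        # (as float) so that nanmean skips them.
        block = np.ma.filled(np.ma.asarray(block, dtype=float), np.nan)
        pieces.append(np.nanmean(block, axis=other_axes))

    # Stitch the per-chunk results back into one time series.
    return np.concatenate(pieces)
```

With the file from the question, a call could look like `chunked_time_series(nc["P"], time_axis=nc["P"].dimensions.index("time"), chunk_size=100)`. Larger chunks mean fewer reads but more memory per step, and read speed also depends on how the file is laid out on disk. If an extra dependency is acceptable, xarray with dask automates the same idea: `xarray.open_dataset(path, chunks={"time": 100})` opens the variables lazily in chunks, and a `.mean(dim=...)` over the non-time dimensions followed by `.compute()` is evaluated chunk by chunk.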

Handling memory issues when computing time series from a large 3D dataset stored as netCDF
