Question

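This is probably simple, but I haven't been able to find a solution online... I'm trying to work with a series of datasets stored as netcdf files. I open each one, read in some key points, then move on to the next file. I find that I frequently hit mmap errors, or the script slows down, as more files are read in. I believe this may be because the netcdf files are not being properly closed by the .close() command.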

I've been testing this:

from scipy.io.netcdf import netcdf_file as ncfile
f=ncfile(netcdf_file,mode='r')
f.close()

Then if I try

>>>f
<scipy.io.netcdf.netcdf_file object at 0x24d29e10>

and

>>>f.variables['temperature'][:]
array([ 1234.68034431,  1387.43136567,  1528.35794546, ...,  3393.91061952,
    3378.2844357 ,  3433.06715226])

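So it appears the file is still open? What does close() actually do? How do I know it has worked? Is there a way to close/clear all open files from within Python?

Software: Python 2.7.6, scipy 0.13.2, netcdf 4.0.1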

Answer 1

The code for f.close is:

Definition: f.close(self)
Source:
    def close(self):
        """Closes the NetCDF file."""
        if not self.fp.closed:
            try:
                self.flush()
            finally:
                self.fp.close()

f.fp is the file object. So

In [451]: f.fp
Out[451]: <open file 'test.cdf', mode 'wb' at 0x939df40>

In [452]: f.close()

In [453]: f.fp
Out[453]: <closed file 'test.cdf', mode 'wb' at 0x939df40>

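But from playing around with f after closing it, I see that I can still create dimensions and variables. f.flush(), however, returns an error.

It does not look like it uses mmap during data writes, only during reads.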

def _read_var_array(self):
            ....
            if self.use_mmap:
                mm = mmap(self.fp.fileno(), begin_+a_size, access=ACCESS_READ)
                data = ndarray.__new__(ndarray, shape, dtype=dtype_,
                        buffer=mm, offset=begin_, order=0)
            else:
                pos = self.fp.tell()
                self.fp.seek(begin_)
                data = fromstring(self.fp.read(a_size), dtype=dtype_)
                data.shape = shape
                self.fp.seek(pos)

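I don't have much experience with mmap. It looks like it sets up an mmap object based on a block of bytes in the file, and uses that as the data buffer for the variable. I don't know what happens to that access if the underlying file is closed. I wouldn't be surprised if there were some sort of mmap error.

If the file is opened with mmap=False, then the whole variable is read into memory and accessed like a regular numpy array.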
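To get a feel for what happens when the underlying file is closed, here is a small standard-library sketch. It is not scipy's code; the function name `read_after_close` and the payload are made up for illustration. It maps a temporary file, closes the file object, and then reads through the mapping. As far as I can tell, the mapping stays usable after the file object is closed, which would be consistent with the data in the question still being readable after close().

```python
import mmap
import os
import tempfile


def read_after_close(payload=b"temperature-data"):
    """Map a file, close the file object, then read through the mapping."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as out:
            out.write(payload)

        fp = open(path, "rb")
        mm = mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ)
        fp.close()               # the file object is closed here ...
        file_closed = fp.closed
        data = mm[:]             # ... but the mapping still serves the bytes
        mm.close()               # the mapping has to be released separately
        return file_closed, data
    finally:
        os.remove(path)
```

So `fp.closed` being True says nothing about whether the mapping itself has been released; that only happens when the mmap object is closed or garbage collected.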

mmap : None or bool, optional
    Whether to mmap `filename` when reading.  Default is True
    when `filename` is a file name, False when `filename` is a
    file-like object

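My guess is that if you open a file without specifying the mmap mode, read a variable from it, and then close the file, it is unsafe to reference that variable and its data later. Any reference that requires loading more data could result in an mmap error.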

But if the file is opened with mmap=False, you should be able to slice the variable even after closing the file.
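A minimal sketch of the mmap=False route, assuming a SciPy recent enough that `netcdf_file` can be imported from `scipy.io`. The helper names `write_sample` and `read_without_mmap`, the dimension name `point`, and the sample values are made up for illustration; only the variable name `temperature` comes from the question.

```python
from scipy.io import netcdf_file


def write_sample(path, values):
    """Write a tiny NetCDF file with one 'temperature' variable (made-up data)."""
    f = netcdf_file(path, "w")
    f.createDimension("point", len(values))
    var = f.createVariable("temperature", "d", ("point",))
    var[:] = values
    f.close()


def read_without_mmap(path):
    """Read the variable with mmap=False so the data live in ordinary memory."""
    f = netcdf_file(path, "r", mmap=False)
    data = f.variables["temperature"][:]
    f.close()
    # f.fp.closed answers "did close() work?"; data no longer depends on the file
    return f.fp.closed, data
```

Calling `read_without_mmap(path)` on a file written by `write_sample` should return `True` together with an array that can still be sliced, since nothing in it points back at the file.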

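I don't see how the mmap for one file or variable could interfere with access to other files and variables. But I'd have to read more about mmap to be sure of that.

From the netcdf docs: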

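Note that when netcdf_file is used to open a file with mmap=True (default for read-only), arrays returned by it refer to data directly on the disk. The file should not be closed, and cannot be cleanly closed when asked, if such arrays are alive. You may want to copy data arrays obtained from mmapped Netcdf file if they are to be processed after the file is closed, see the example below.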
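Applied to the workflow in the question (open each file, pull out some values, move on to the next one), that advice could be sketched like this. `read_copies` is a hypothetical helper, and the default variable name is simply the one from the question. It assumes a SciPy version where `netcdf_file` is importable from `scipy.io` and can be used as a context manager.

```python
from scipy.io import netcdf_file


def read_copies(paths, name="temperature"):
    """Return an independent in-memory copy of one variable from each file."""
    results = []
    for path in paths:
        # mmap=True is the default when a file name is passed for reading
        with netcdf_file(path, "r") as f:
            # .copy() detaches the array from the memory-mapped file, so no
            # array refers to the mapping when the with-block closes the file
            results.append(f.variables[name][:].copy())
    return results
```

Each array in the returned list is an ordinary numpy array, so it stays valid after its file is closed, and no references to the mapping are left behind to pile up as more files are read.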

How to ensure that a netcdf file is closed in python?
