Question

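I am developing an application in which I have defined a "variable" object that holds data in the form of a numpy array. These variables are linked to (netcdf) data files, and I would like to load the variable values dynamically when they are needed, instead of loading all the data from the sometimes huge files at the start.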

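The following snippet demonstrates the principle and works well, including access to parts of the data with slices. For example, you can write: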

a = var()   # empty variable
print(a.values[7])   # values have been automatically "loaded"

or even:

a = var()
a.values[7] = 0

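However, this code still forces me to load the entire variable data at once. Netcdf (through the netCDF4 library) would allow me to access slices of the data directly from the file. For example: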

f = netCDF4.Dataset(filename, "r")
print(f.variables["a"][7])

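I cannot use the netcdf variable objects directly, because my application is tied to a web service which cannot remember the netcdf file handler, and also because the variable data do not always come from netcdf files but may originate from other sources such as OGC web services.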

Is there a way to "capture" the slicing expression in the property or setter methods and use it? The idea would be to write something like this:

    @property
    def values(self):
        if self._values is None:
            self._values = np.arange(10.)[slice]  # load from file ...
        return self._values

instead of the code below.

Working demonstration:

import numpy as np

class var(object):

    def __init__(self, values=None, metadata=None):
        if values is None:
            self._values = None
        else:
            self._values = np.array(values)
        self.metadata = metadata  # just to demonstrate that var has more than just values

    @property
    def values(self):
        if self._values is None:
            self._values = np.arange(10.)  # load from file ...
        return self._values

    @values.setter
    def values(self, values):
        self._values = values

A first thought: should I perhaps make values a separate class and then use __getitem__? See "In python, how do I create two index slicing for my own matrix class?"

Answer 1

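No, you cannot detect what will be done to the object after it has been returned from .values. The result could be stored in a variable and only (much later) be sliced, or be sliced in different places, or be used in its entirety, and so on.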
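The point can be illustrated with a minimal sketch. The class Demo and its calls list are hypothetical names made up for this illustration; a plain list stands in for the numpy array. The getter only ever receives self, so it finishes before any index is applied:

```python
class Demo(object):
    """Hypothetical class; its property logs every call to the getter."""

    def __init__(self):
        self.calls = []

    @property
    def values(self):
        # The getter receives only `self`; no index or slice is visible here.
        self.calls.append("getter called")
        return list(range(10))


d = Demo()
v = d.values            # the getter has already returned the full list
first = v[7]            # plain list indexing; Demo is not involved at all
part = d.values[2:5]    # the getter runs first, then the result is sliced
```

In both cases the indexing happens on the returned object, which is why the slice has to be intercepted there.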

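You should indeed return a wrapper object instead and hook into object.__getitem__; it lets you detect slicing and load data as needed. When slicing, Python passes in a slice() object.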
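As a small sketch of what __getitem__ actually receives, the hypothetical class SliceSpy below (not part of the question's code) simply returns the key it was given:

```python
class SliceSpy(object):
    """Hypothetical helper: returns whatever key Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


spy = SliceSpy()
print(spy[7])          # 7
print(spy[2:5])        # slice(2, 5, None)
print(spy[:])          # slice(None, None, None)
print(spy[1:3, ::2])   # (slice(1, 3, None), slice(None, None, 2))
```

A wrapper can therefore forward the key unchanged to whatever backend holds the data, for example a netCDF4 variable, which accepts the same index expressions.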

Answer 2

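Thanks to the guidance of Martijn Pieters and a bit more reading, I came up with the following code as a demonstration. Note that the Reader class uses a netcdf file and the netCDF4 library. If you want to try this code yourself, you will either need a netcdf file with variables "a" and "b", or you will have to replace Reader with something else that returns a data array or a slice from a data array.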

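This solution defines three classes: Reader does the actual file I/O handling, Values manages the data access part and invokes a Reader instance if no data have been stored in memory, and var is the final "variable", which in real life will contain a lot more metadata. The code contains a couple of extra print statements for educational purposes.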

"""Implementation of a dynamic variable class which can read data from file when needed or
return the data values from memory if they were read already. This concept supports
slicing for both memory and file access.""" 

import numpy as np
import netCDF4 as nc

FILENAME = r"C:\Users\m.schultz\Downloads\data\tmp\MACC_20141224_0001.nc"
VARNAME = "a"


class Reader(object):
    """Implements the actual data access to variable values. Here reading a
    slice from a netcdf file.
    """

    def __init__(self, filename, varname):
        """Final implementation will also have to take groups into account...
        """
        self.filename = filename
        self.varname = varname

    def read(self, args=slice(None, None, None)):
        """Read a data slice. Args is a tuple of slice objects (e.g.
        numpy.index_exp). The default corresponds to [:], i.e. all data
        will be read.
        """
        with nc.Dataset(self.filename, "r") as f:
            values = f.variables[self.varname][args]
        return values


class Values(object):

    def __init__(self, values=None, reader=None):
        """Initialize Values. You can either pass numerical (or other) values,
        preferably as numpy array, or a reader instance which will read the
        values on demand. The reader must have a read(args) method, where
        args is a tuple of slices. If no args are given, all data should be
        returned.
        """
        if values is not None:
            self._values = np.array(values)
        self.reader = reader

    def __getattr__(self, name):
        """This is only called if the attribute name is not present.
        Here, the only attribute we care about is _values.
        Self.reader should always be defined.
        This method is necessary to allow access to variable.values without
        a slicing index. If only __getitem__ were defined, one would always
        have to write variable.values[:] in order to make sure that something
        is returned.
        """
        print(">>> in __getattr__, trying to access ", name)
        if name == "_values":
            print(">>> calling reader and reading all values...")
            self._values = self.reader.read()
            return self._values
        # any other missing attribute is a genuine error
        raise AttributeError(name)

    def __getitem__(self, args):
        print("in __getitem__")
        if "_values" not in self.__dict__:
            values = self.reader.read(args)
            print(">>> read from file. Shape = ", values.shape)
            if args == slice(None, None, None):
                self._values = values  # all data read, store in memory
            return values
        else:
            print(">>> read from memory. Shape = ", self._values[args].shape)
            return self._values[args]

    def __repr__(self):
        return self._values.__repr__()

    def __str__(self):
        return self._values.__str__()


class var(object):

    def __init__(self, name=VARNAME, filename=FILENAME, values=None):
        self.name = name
        self.values = Values(values, Reader(filename, name))


if __name__ == "__main__":
    # define a variable and access all data first
    # this will read the entire array and save it in memory, so that
    # subsequent access with or without index returns data from memory
    a = var("a", filename=FILENAME)
    print("1: a.values = ", a.values)
    print("2: a.values[-1] = ", a.values[-1])
    print("3: a.values = ", a.values)
    # define a second variable, where we access a data slice first
    # In this case the Reader only reads the slice and no data are stored
    # in memory. The second access indexes the complete array, so Reader
    # will read everything and the data will be stored in memory.
    # The last access will then use the data from memory.
    b = var("b", filename=FILENAME)
    print("4: b.values[0:3] = ", b.values[0:3])
    print("5: b.values[:] = ", b.values[:])
    print("6: b.values[5:8] = ", b.values[5:8])
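
For readers without a netcdf file, the answer suggests replacing Reader with something that returns a data array or a slice of it. One possible stand-in is sketched below. The class ListReader and its log attribute are hypothetical names introduced here; the only assumption taken from the code above is that a reader exposes read(args) with the same default argument.

```python
class ListReader(object):
    """Hypothetical stand-in for Reader; serves slices from memory, not a file."""

    def __init__(self, data):
        self._data = data
        self.log = []   # records every index expression that was requested

    def read(self, args=slice(None, None, None)):
        """Return the requested part of the data; the default reads everything."""
        self.log.append(args)
        return self._data[args]


reader = ListReader(list(range(10)))
part = reader.read(slice(0, 3))   # the call Values.__getitem__ makes for [0:3]
everything = reader.read()        # the call Values.__getattr__ makes
```

To combine it with the Values class above, the data should be a numpy array rather than a list (for example Values(reader=ListReader(np.arange(10.)))), because the print statements in Values access the shape attribute of whatever the reader returns.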

Can I "detect" a slicing expression in a python class method?
