子类化Numpy数组-传播属性

时间:2018-07-25 13:51:05

标签: python arrays numpy attributes subclass

我想知道numpy数组的自定义属性如何传播,即使该数组通过np.fromfunction之类的函数也是如此。

例如,我的课程ExampleTensor定义了一个属性attr,该属性默认情况下设置为1。

import numpy as np

class ExampleTensor(np.ndarray):
    def __new__(cls, input_array):
        return np.asarray(input_array).view(cls)

    def __array_finalize__(self, obj) -> None:
        if obj is None: return
        # This attribute should be maintained!
        self.attr = getattr(obj, 'attr', 1)

ExampleTensor实例之间的切片和基本操作将保留属性,但不会使用其他numpy函数(可能是因为它们创建了常规的numpy数组而不是ExampleTensors)。 我的问题:当从子类中构造出常规的numpy数组时,是否存在可以保留自定义属性的解决方案 numpy数组实例?

重现问题的示例:

ex1 = ExampleTensor([[3, 4],[5, 6]])
ex1.attr = "some val"

print(ex1[0].attr)    # correctly outputs "some val"
print((ex1+ex1).attr) # correctly outputs "some val"

np.sum([ex1, ex1], axis=0).attr # Attribute Error: 'numpy.ndarray' object has no attribute 'attr'

4 个答案:

答案 0 :(得分:2)

import numpy as np

class ExampleTensor(np.ndarray):
    def __new__(cls, input_array):
        return np.asarray(input_array).view(cls)

    def __array_finalize__(self, obj) -> None:
        if obj is None: return
        # This attribute should be maintained!
        default_attributes = {"attr": 1}
        self.__dict__.update(default_attributes)  # another way to set attributes

像这样实现 array_ufunc 方法

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):  # this method is called whenever you use a ufunc
        f = {
            "reduce": ufunc.reduce,
            "accumulate": ufunc.accumulate,
            "reduceat": ufunc.reduceat,
            "outer": ufunc.outer,
            "at": ufunc.at,
            "__call__": ufunc,
        }
        output = ExampleTensor(f[method](*(i.view(np.ndarray) for i in inputs), **kwargs))  # convert the inputs to np.ndarray to prevent recursion, call the function, then cast it back as ExampleTensor
        output.__dict__ = self.__dict__  # carry forward attributes
        return output

测试

x = ExampleTensor(np.array([1,2,3]))
x.attr = 2

y0 = np.add(x, x)
print(y0, y0.attr)
y1 = np.add.outer(x, x)
print(y1, y1.attr)  # works even if called with method

[2 4 6] 2
[[2 3 4]
 [3 4 5]
 [4 5 6]] 2

注释中的解释。

答案 1 :(得分:1)

这是一种尝试,它适用于非数组运算符,即使将子类指定为numpy ufunc的输出(注释中的解释)也是如此:

import numpy as np


class ArraySubclass(np.ndarray):
    '''Subclass of ndarray MUST be initialized with a numpy array as first argument.
    '''
    def __new__(cls, input_array, a=None, b=1):
        obj = np.asarray(input_array).view(cls)
        obj.a = a
        obj.b = b
        return obj

    def __array_finalize__(self, obj):
        if obj is None:  # __new__ handles instantiation
            return
        '''we essentially need to set all our attributes that are set in __new__ here again (including their default values). 
        Otherwise numpy's view-casting and new-from-template mechanisms would break our class.
        '''
        self.a = getattr(obj, 'a', None)
        self.b = getattr(obj, 'b', 1)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):  # this method is called whenever you use a ufunc
        '''this implementation of __array_ufunc__ makes sure that all custom attributes are maintained when a ufunc operation is performed on our class.'''

        # convert inputs and outputs of class ArraySubclass to np.ndarray to prevent infinite recursion
        args = ((i.view(np.ndarray) if isinstance(i, ArraySubclass) else i) for i in inputs)
        outputs = kwargs.pop('out', None)
        if outputs:
            kwargs['out'] = tuple((o.view(np.ndarray) if isinstance(o, ArraySubclass) else o) for o in outputs)
        else:
            outputs = (None,) * ufunc.nout
        # call numpys implementation of __array_ufunc__
        results = super().__array_ufunc__(ufunc, method, *args, **kwargs)  # pylint: disable=no-member
        if results is NotImplemented:
            return NotImplemented
        if method == 'at':
            # method == 'at' means that the operation is performed in-place. Therefore, we are done.
            return
        # now we need to make sure that outputs that where specified with the 'out' argument are handled corectly:
        if ufunc.nout == 1:
            results = (results,)
        results = tuple((self._copy_attrs_to(result) if output is None else output)
                        for result, output in zip(results, outputs))
        return results[0] if len(results) == 1 else results

    def _copy_attrs_to(self, target):
        '''copies all attributes of self to the target object. target must be a (subclass of) ndarray'''
        target = target.view(ArraySubclass)
        try:
            target.__dict__.update(self.__dict__)
        except AttributeError:
            pass
        return target

以及相应的单元测试:

import unittest
class TestArraySubclass(unittest.TestCase):
    def setUp(self):
        self.shape = (10, 2, 5)
        self.subclass = ArraySubclass(np.zeros(self.shape))

    def test_instantiation(self):
        self.assertIsInstance(self.subclass, np.ndarray)
        self.assertIs(self.subclass.a, None)
        self.assertEqual(self.subclass.b, 1)
        self.assertEqual(self.subclass.shape, self.shape)
        self.assertTrue(np.array_equal(self.subclass, np.zeros(self.shape)))
        sub2 = micdata.arrayasubclass.ArraySubclass(np.zeros(self.shape), a=2)
        self.assertEqual(sub2.a, 2)

    def test_view_casting(self):
        self.assertIsInstance(np.zeros(self.shape).view(ArraySubclass),ArraySubclass)

    def test_new_from_template(self):
        self.subclass.a = 5
        bla = self.subclass[3, :]
        self.assertIsInstance(bla, ArraySubclass)
        self.assertIs(bla.a, 5)
        self.assertEqual(bla.b, 1)

    def test_np_min(self):
        self.assertEqual(np.min(self.subclass), 0)

    def test_ufuncs(self):
        self.subclass.b = 2
        self.subclass += 2
        self.assertTrue(np.all(self.subclass == 2))
        self.subclass = self.subclass + np.ones(self.shape)
        self.assertTrue(np.all(self.subclass == 3))
        np.multiply.at(self.subclass, slice(0, 2), 2)
        self.assertTrue(np.all(self.subclass[:2] == 6))
        self.assertTrue(np.all(self.subclass[2:] == 3))
        self.assertEqual(self.subclass.b, 2)

    def test_output(self):
        self.subclass.a = 3
        bla = np.ones(self.shape)
        bla *= 2
        np.multiply(bla, bla, out=self.subclass)
        self.assertTrue(np.all(self.subclass == 5))
        self.assertEqual(self.subclass.a, 3)

P.s。 tempname123几乎正确。但是,对于不是数组的运算符以及将其类指定为ufunc的输出时,他的答案将失败:

>>> ExampleTensor += 1
AttributeError: 'int' object has no attribute 'view'
>>> np.multiply(np.ones((5)), np.ones((5)), out=ExampleTensor)
RecursionError: maximum recursion depth exceeded in comparison

答案 2 :(得分:0)

我认为您的示例不正确:

>>> type(ex1)
<class '__main__.ExampleTensor'>

但是

>>> type([ex1, ex1])
<class 'numpy.ndarray'>

,由于实际上正在构建数组而不是子类,因此不会调用您的重载__new____array_finalize__。但是,如果您这样做,它们将被称为:

>>> ExampleTensor([ex1, ex1])

attr = 1设置为ExampleTensor,因为您尚未定义在ExampleTensor的列表中构建hive>CREATE EXTERNAL TABLE testingjson( agentId map <string,string> ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS TEXTFILE LOCATION '/.../json_table'; hive> select * from testingjson; +-----------------------------+--+ | testingjson.agentid | +-----------------------------+--+ | {"string":"123456-123456"} | +-----------------------------+--+ hiv> select agentid['string'] from testingjson; +----------------+--+ | _c0 | +----------------+--+ | 123456-123456 | +----------------+--+ 时如何传播属性。您需要通过重载相关操作在子类中定义此行为。正如以上评论所建议的,值得一看code for np.matrix以获得灵感。

答案 3 :(得分:0)

如果ex1.attr != ex2.attrnp.sum([ex1, ex2], axis=0).attr,哪个值应该“传播”?

请注意,这个问题比最先出现的问题更为根本:各种各样的numpy函数如何才能自己找出您的意图? 您可能无法避免为每个这样的“ attr-aware”函数编写一个重载版本:

def sum(a, **kwargs):
    sa=np.sum(a, **kwargs)
    if isinstance(a[0],ExampleTensor): # or if hasattr(a[0],'attr')
        sa.attr=a[0].attr
    return sa

我确定这还不够通用,无法处理任何np.sum()输入,但可以用于您的示例。