Question

I am working on a framework for processing incoming data.

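Data is received from a socket and appended to a numpy array A (used as a buffer) by shifting the existing contents and writing the new value at the end.

The framework allows processing units to be loaded as classes that access the incoming data through an array view pointing to A. Every time new data is received and stored in A, the method execute() is called:

def execute(self,):
    newSample = self.data[-1]

What matters is that the new sample is always at index = -1. A user can also create their own array views in the __init__ function:

def __init__(self,):
    self.myData = self.data[-4:]  # view that contains last 4 samples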
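The shifting scheme described above can be sketched roughly as follows. This is a minimal illustration, not the framework's actual code: the buffer length, the `push` helper and the sample values are all assumptions made for the example.

```python
import numpy as np

# Hypothetical buffer length; the post does not state one.
BUFFER_LEN = 8

A = np.zeros(BUFFER_LEN)   # the shared buffer
last_four = A[-4:]         # a view created once, as a processing unit might do


def push(buffer, value):
    """Shift the buffer left by one and store the new sample at the end."""
    buffer[:-1] = buffer[1:]   # in-place shift, no new array is allocated
    buffer[-1] = value


# Simulated incoming samples (made-up values).
for sample in [1.0, 2.0, 3.0, 4.0, 5.0]:
    push(A, sample)

print(A[-1])       # the newest sample is always at index -1
print(last_four)   # the view created earlier sees the shifted data
```

Because the shift happens in place, views created earlier keep pointing at the same memory and stay valid, which is why the online case works without further effort.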

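Everything works nicely when I shift array A and add the new value at the end. For offline testing, however, I want to load all the data when the framework starts and run everything else as before (i.e. the same classes implementing the data processing). Of course, I could again create the buffer A as an array of zeros and shift the new values into it. But that involves copying data between two arrays, which is entirely unnecessary and costs time and memory.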

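What I had in mind was a way to change the bounds of the numpy array or to change the A.data pointer. However, every solution I found is either not allowed or leads to a warning message.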

Finally, I am trying to change an internal offset of array A so that I can advance it and thereby make more data available to the algorithms. Importantly, self.data[-1] must always point to the newly arrived sample, and the standard numpy array API should be used.

I have subclassed np.ndarray:

import numpy as np

class MyArrayView(np.ndarray):
    def __new__(cls, input_array):
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj._offset = 0
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self._offset = getattr(obj, '_offset', None)

    def advance_index(self):
        self._offset += 1

    def __str__(self):
        return super(MyArrayView, self[:]).__str__()

    def __repr__(self):
        return super(MyArrayView, self[:]).__repr__()

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            start = 0
            stop = self._offset
            step = idx.step
            idx = slice(start, stop, step)
        else:
            idx = self._offset + idx
        return super(MyArrayView, self).__getitem__(idx)

This allows me to do the following:

a = np.array([1,2,3,4,5,6,7,8,9,10])
myA = MyArrayView(a)
b = myA
print("b :", b)
for i in range(1,5):
    myA.advance_index()
    print(b[:], b[-1])

print("b :", b)
print("b + 10 :", b + 10)
print("b[:] + 20 :", b[:] + 20)

and gives the following output:

b : []
[1] 1
[1 2] 2
[1 2 3] 3
[1 2 3 4] 4
b : [1 2 3 4]
b + 10 : [11 12 13 14]
b[:] + 20 : [21 22 23 24]

So far so good. However, if I check the shape:

print("shape", b[:].shape)  # shape (4,)
print("shape", b.shape)     # shape (10,)

the two cases differ. I tried to change it using:

shape=(self.internalIndex,)

but that only leads to an error message.

I would like to ask whether you think what I am doing is the right approach and only requires overloading more functions of the np.ndarray class. Or should I abandon this solution entirely and fall back to shifting the array with each new sample? Or can this perhaps be achieved with the standard np.ndarray implementation, since I need to use the standard numpy API?

I also tried this:

a = np.array([1,2,3,4,5,6,7,8,9,10])
b = a.view()[5:]
print(a.data)  # <memory at 0x7f09e01d8f48>
print(b.data)  # <memory at 0x7f09e01d8f48> They point to the same memory start!
print(np.byte_bounds(a))  # (50237824, 50237904)
print(np.byte_bounds(b))  # (50237864, 50237904) but the byte_bounds are different

With this in mind, I would say I need to create a view of array a and extend it (or at least move it like a window over the underlying array). However, none of my attempts to change the byte_bounds had any effect.
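As a side note on the experiment above: the address printed in `<memory at 0x...>` belongs to the temporary memoryview object itself, not to the array data, so the two identical values do not show that both arrays start at the same place. The sketch below reads the data start address from `__array_interface__` instead of `byte_bounds`. The variable names are chosen for illustration only.

```python
import numpy as np

a = np.arange(1, 11)
b = a[5:]   # basic slicing returns a view, not a copy

# Address of the first element of each array.
addr_a = a.__array_interface__["data"][0]
addr_b = b.__array_interface__["data"][0]

# The view starts 5 elements into the buffer owned by a.
offset_bytes = addr_b - addr_a
print(offset_bytes == 5 * a.itemsize)

# The view does not own its data; it refers back to a.
print(b.base is a)
print(np.shares_memory(a, b))
```

A view is therefore fully described by a start address, a shape and strides within the owner's buffer, which matches the differing byte bounds observed in the question.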

Answer 1

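I admire your bravery, but I am pretty sure that subclassing numpy arrays is overkill for your problem and will cause you a great deal of headaches. In the end it may well cost far more performance than the array copying you are trying to avoid.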

Why not make the slice (i.e. [-4:] or slice(-4, None)) a parameter of your __init__ function or a class attribute, and override it in your tests?

def __init__(self, lastfour=slice(-4, None)):
    self.myData = self.data[lastfour]
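One possible variation on this idea, sketched under several assumptions: the unit stores the slice object itself and applies it inside execute(), and an offline driver hands it one window of the recorded data at a time. The class name `LastFourUnit`, the `results` list, the buffer length and the driver loop are hypothetical and not part of the framework described in the question. `sliding_window_view` requires NumPy 1.20 or newer and returns read-only views, so no data is copied.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


class LastFourUnit:
    """Hypothetical processing unit that stores a slice, not a view."""

    def __init__(self, lastfour=slice(-4, None)):
        self.lastfour = lastfour
        self.data = None      # assumed to be set by the framework
        self.results = []

    def execute(self):
        new_sample = self.data[-1]            # newest sample, as before
        window = self.data[self.lastfour]     # slice applied at call time
        self.results.append((int(new_sample), int(window.sum())))


# Offline replay: all recorded samples are available up front.
recorded = np.arange(1, 11)
BUFFER_LEN = 4   # assumed buffer length

unit = LastFourUnit()
for window in sliding_window_view(recorded, BUFFER_LEN):
    unit.data = window    # a read-only view into recorded, no copy
    unit.execute()

print(unit.results[0])    # first full window: samples 1..4
print(unit.results[-1])   # last window: samples 7..10
```

The trade-off is that views created once in __init__ no longer update on their own. The unit has to index self.data on every call, which is cheap because slicing only creates a new view.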

python numpy ndarray subclassing for offset changing
