Question

Why does this work:

>>> f = np.array(([[10,20],[11,21],[11,21],[12,22],[13,23]]))
>>> f
array([[10, 20],
       [11, 21],
       [11, 21],
       [12, 22],
       [13, 23]])
>>> f.view([('',f.dtype)]*f.shape[1])
array([[(10, 20)],
       [(11, 21)],
       [(11, 21)],
       [(12, 22)],
       [(13, 23)]],
      dtype=[('f0', '<i8'), ('f1', '<i8')])

But this does not:

>>> f = np.array(([10,11,11,12,13],[20,21,21,22,23])).T
>>> f
array([[10, 20],
       [11, 21],
       [11, 21],
       [12, 22],
       [13, 23]])
>>> f.view([('',f.dtype)]*f.shape[1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: new type not compatible with array.

Answer 1

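By default, a NumPy array is stored in memory as a single contiguous block in row-major order. When you define a structured array, all the fields of one record must also be contiguous in memory. In your case, that means each row has to occupy consecutive memory locations. Transposing an array does not move the data; it only changes the strides, so after the transpose it is the columns, not the rows, that sit in consecutive memory locations.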
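
To make the stride argument concrete, here is a small sketch that inspects both arrays from the question. The names a, b and itemsize are only illustrative, and the stride values are expressed through itemsize because the default integer size depends on the platform.

```python
import numpy as np

# Built row by row: C-contiguous, so each row occupies adjacent memory.
a = np.array([[10, 20], [11, 21], [11, 21], [12, 22], [13, 23]])

# Built column by column, then transposed: same values, different layout.
b = np.array([[10, 11, 11, 12, 13], [20, 21, 21, 22, 23]]).T

itemsize = a.dtype.itemsize  # bytes per element, platform dependent

# In a, moving to the next column steps by one item.
print(a.strides == (2 * itemsize, itemsize))       # True
# In b, moving to the next row steps by one item, so columns are adjacent.
print(b.strides == (itemsize, 5 * itemsize))       # True
print(a.flags.c_contiguous, b.flags.c_contiguous)  # True False
# The transpose is a view on the same buffer, not a copy.
print(np.shares_memory(b, b.T))                    # True
```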

Although it may require copying the data, which is slow, the safe approach is to call np.ascontiguousarray before doing any struct-array magic:

>>> f = np.array([[10,11,11,12,13],[20,21,21,22,23]]).T
>>> f = np.ascontiguousarray(f)
>>> f.view([('',f.dtype)]*f.shape[1])
array([[(10, 20)],
       [(11, 21)],
       [(11, 21)],
       [(12, 22)],
       [(13, 23)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])
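
As a rough check of when the copy actually happens, the sketch below uses np.shares_memory. The names g, h and rows are illustrative only; the point is that np.ascontiguousarray copies the transposed array but leaves an already contiguous one alone.

```python
import numpy as np

f = np.array([[10, 11, 11, 12, 13], [20, 21, 21, 22, 23]]).T

# f is not C-contiguous, so this call has to copy the data.
g = np.ascontiguousarray(f)
print(np.shares_memory(f, g))  # False

# g is already C-contiguous, so a second call does not copy again.
h = np.ascontiguousarray(g)
print(np.shares_memory(g, h))  # True

# The structured view now works: one record per row.
rows = g.view([('', g.dtype)] * g.shape[1])
print(rows.shape)              # (5, 1)
print(rows['f0'].ravel())      # [10 11 11 12 13]
```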

Answer 2

This is a memory layout problem:

>>> f = np.array(([[10,20],[11,21],[11,21],[12,22],[13,23]]))
>>> f.flags.c_contiguous
True
>>> f = np.array(([10,11,11,12,13],[20,21,21,22,23])).T
>>> f.flags.c_contiguous
False
>>> f.view([('',f.dtype)]*f.shape[0])
array([[(10, 11, 11, 12, 13), (20, 21, 21, 22, 23)]], 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8'), ('f4', '<i8')])

If you like, you can fix it with

>>> f = np.array(([10,11,11,12,13],[20,21,21,22,23]), order='F').T
>>> f.flags.c_contiguous
True
>>> f.view([('',f.dtype)]*f.shape[1])
array([[(10, 20)],
       [(11, 21)],
       [(11, 21)],
       [(12, 22)],
       [(13, 23)]], 
      dtype=[('f0', '<i8'), ('f1', '<i8')])

But what is the use of this view?
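
One common use of such a view, offered here as a guess and not as what the asker necessarily had in mind, is to treat each row as a single record so that np.unique can drop duplicate rows. The names row_dtype, rows and unique_rows are illustrative.

```python
import numpy as np

f = np.array([[10, 20], [11, 21], [11, 21], [12, 22], [13, 23]])

# One structured record per row; ascontiguousarray guards against
# transposed or otherwise non-contiguous input.
row_dtype = [('', f.dtype)] * f.shape[1]
rows = np.ascontiguousarray(f).view(row_dtype)

# np.unique compares whole records, i.e. whole rows.
# View the result back as plain integers and restore the 2-D shape.
unique_rows = np.unique(rows).view(f.dtype).reshape(-1, f.shape[1])
print(unique_rows)
```

On NumPy versions that support the axis argument of np.unique, np.unique(f, axis=0) gives the same rows without the manual view.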

View does not work on a transposed array
