Question

I have a numpy structured array a and create a view b on it:

import numpy as np
a = np.zeros(3, dtype={'names':['A','B','C'], 'formats':['int','int','float']})
b = a[['A', 'C']]

The descr component of the data type of b indicates that the data are stored somehow "scattered".

>>> b.dtype.descr
[('A', '<i4'), ('', '|V4'), ('C', '<f8')]

(After reading the documentation, I believe that the component ('', '|V4') indicates a "gap" in the data, as b is just a view on a.)

If this bothers me, I can copy the data:

import numpy.lib.recfunctions as rf
c = rf.repack_fields(b)

and

>>> c.dtype.descr
[('A', '<i4'), ('C', '<f8')]

as desired.

This step requires me to copy the data. Sometimes, though, I would like to apply an operation to the view directly. Often, such operations return a copy of the array anyway. For example,

d = np.concatenate((b,b))

returns a new array holding a copy of the data in b (and thus of the corresponding data in a). Nonetheless,

>>> d.dtype.descr
[('A', '<i4'), ('', '|V4'), ('C', '<f8')]

indicates that the data are not stored efficiently.

So is there a way to work with views without producing "scattered" results? Do I always have to create a copy beforehand? Or is there no efficiency issue at all, only an odd way in which descr describes the data type? (If so, how can I avoid that?)

This question becomes particularly relevant if I want to skip the intermediate steps:

d = np.concatenate((a[['A', 'C']], a[['A', 'C']]))

I am working with numpy 1.16 and python 3.7.

Answer 1

Multifield indexing has been in a state of flux for some time now. With 1.16 they seem to have settled on this 'offset' form of 'views', requiring an explicit repacking if you want a 'clean' copy.

In [231]: np.__version__                                                             
Out[231]: '1.16.1'
In [232]: a.dtype                                                                    
Out[232]: dtype([('A', '<i8'), ('B', '<i8'), ('C', '<f8')])
In [233]: a[['A','C']].dtype                                                         
Out[233]: dtype({'names':['A','C'], 'formats':['<i8','<f8'], 'offsets':[0,16], 'itemsize':24})

In this view, the values for 'B' are still present (at offset 8). Think of the databuffer as having:

[a0, b0, c0, a1, b1, c1, a2, b2, c2, ....]

The [233] 'view' looks at the same databuffer, but only gives us access to the A and C fields. repack_fields creates a new databuffer with:

[a0, c0, a1, c1, ....]

If a had been a regular (n,3) array, a[:, [0,2]] would be a copy. We could not skip a[:,1] and still have a view.
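The difference between the two cases can be checked directly. The following is a minimal sketch, assuming NumPy 1.16 or later; the dtype uses explicit 8-byte types (rather than the question's 'int') so that the layout is the same on every platform, and the variable names are only illustrative.

```python
import numpy as np

# Explicit 8-byte types so offsets do not depend on the platform's default int.
a = np.zeros(3, dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8')])
b = a[['A', 'C']]                      # multi-field index: a view on a

a['A'][0] = 7                          # write through the original array ...
view_sees_write = bool(b['A'][0] == 7) # ... and the view sees the change

shares_structured = np.shares_memory(a, b)

# A plain 2-D array: picking columns 0 and 2 is fancy indexing, which copies.
m = np.zeros((3, 3))
cols = m[:, [0, 2]]
shares_plain = np.shares_memory(m, cols)

print(shares_structured, shares_plain)
print(a.dtype.itemsize, b.dtype.itemsize)
```

The view keeps the full itemsize of a, because each record still spans the bytes of the hidden 'B' field.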

In [234]: np.concatenate((a[['A','C']],a[['A','C']]))                                
Out[234]: 
array([(0, 0.), (1, 1.), (2, 2.), (0, 0.), (1, 1.), (2, 2.)],
      dtype={'names':['A','C'], 'formats':['<i8','<f8'], 'offsets':[0,16], 'itemsize':24})

Playing around with view on the concatenated result, I find that the gap at offset 8 (where the 'B' field sits in a) still exists, but is uninitialized (as in an np.empty array).
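If the padding in the concatenated result is unwanted, one option is to repack the view before concatenating. This is only a sketch of that idea, assuming NumPy 1.16 or later; the sample values are made up for illustration, and repacking does cost one extra copy of the selected fields.

```python
import numpy as np
import numpy.lib.recfunctions as rf

a = np.zeros(3, dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8')])
a['A'] = [0, 1, 2]              # illustrative values, not from the question
a['C'] = [0.0, 1.0, 2.0]

# Copy the selected fields into a contiguous record layout first ...
packed = rf.repack_fields(a[['A', 'C']])

# ... so the concatenated result has no gap where 'B' used to be.
d = np.concatenate((packed, packed))

print(d.dtype)
print(d.dtype.itemsize)
```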

Different ways of displaying this 'scattered' dtype (a1 is not defined in the snippets shown; presumably it is the a[['A','C']] view):

In [238]: a1.dtype                                                                   
Out[238]: dtype({'names':['A','C'], 'formats':['<i8','<f8'], 'offsets':[0,16], 'itemsize':24})

In [239]: a1.dtype.descr                                                             
Out[239]: [('A', '<i8'), ('', '|V8'), ('C', '<f8')]

In [241]: a1.dtype.fields                                                            
Out[241]: mappingproxy({'A': (dtype('int64'), 0), 'C': (dtype('float64'), 16)})
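To test for such gaps programmatically rather than by reading the dtype display, the field sizes can be compared with the itemsize. The helper has_padding below is hypothetical (it is not part of NumPy), and the check assumes the fields do not overlap.

```python
import numpy as np

def has_padding(dt):
    """Return True if the fields of a structured dtype do not fill its itemsize.

    Hypothetical helper; assumes a structured dtype without overlapping fields.
    """
    used = sum(dt.fields[name][0].itemsize for name in dt.names)
    return used < dt.itemsize

full = np.dtype([('A', '<i8'), ('B', '<i8'), ('C', '<f8')])
sub = np.zeros(3, dtype=full)[['A', 'C']].dtype   # dtype of the multi-field view

print(has_padding(full), has_padding(sub))
```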

I can reorder the fields as well:

In [248]: a[['B','C','A']].dtype                                                     
Out[248]: dtype({'names':['B','C','A'], 'formats':['<i8','<f8','<i8'], 'offsets':[8,16,0], 'itemsize':24})
In [249]: a[['B','C','A']].dtype.descr                                               
...
ValueError: dtype.descr is not defined for types with overlapping or out-of-order fields
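Repacking also deals with the reordered case: the copy stores the fields contiguously in the new order. A minimal sketch, assuming NumPy 1.16 or later and made-up sample values:

```python
import numpy as np
import numpy.lib.recfunctions as rf

a = np.zeros(3, dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8')])
a['A'] = [1, 2, 3]              # illustrative values
a['B'] = [10, 20, 30]

reordered = a[['B', 'C', 'A']]  # view with offsets out of order
packed = rf.repack_fields(reordered)

# Offsets of the repacked copy, keyed by field name.
offsets = {name: packed.dtype.fields[name][1] for name in packed.dtype.names}

print(offsets)
print(packed.dtype.descr)
```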

Answer 2

For concatenation only, you can do the following:

a     = np.array([(1,2,3),(4,5,6)], 'f,f,f')
view  = a[['f0','f2']]

b     = np.empty(4, 'f,f')
b[:2] = view
b[2:] = view

print(repr(b))

Output:

array([(1., 3.), (4., 6.), (1., 3.), (4., 6.)],
      dtype=[('f0', '<f4'), ('f1', '<f4')])

Edit: forget what I said about np.add, it doesn't work anyway.

Structured arrays: Do operations on views result in scattered arrays?
