I have a NumPy structured array a and create a view b on it:
import numpy as np
a = np.zeros(3, dtype={'names':['A','B','C'], 'formats':['int','int','float']})
b = a[['A', 'C']]
The descr component of the data type of b indicates that the data are stored somehow "scattered".
>>> b.dtype.descr
[('A', '<i4'), ('', '|V4'), ('C', '<f8')]
(After reading the documentation, I believe that the component ('', '|V4') indicates a "gap" in the data, as b is just a view on a.)
If this bothers me, I can copy the data:
import numpy.lib.recfunctions as rf
c = rf.repack_fields(b)
and
>>> c.dtype.descr
[('A', '<i4'), ('C', '<f8')]
as desired.
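The difference between the view and the repacked copy can also be checked through the item size and memory sharing. This is a minimal sketch; the explicit '<i4'/'<f8' formats are an assumption made here so that the byte counts do not depend on the platform's default int size.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Explicit field sizes (assumption) so the numbers below are platform-independent.
a = np.zeros(3, dtype={'names': ['A', 'B', 'C'],
                       'formats': ['<i4', '<i4', '<f8']})
b = a[['A', 'C']]        # view: each record still spans the 16 bytes of a
c = rf.repack_fields(b)  # copy: fields are stored back to back

print(b.dtype.itemsize)  # 16
print(c.dtype.itemsize)  # 12
print(np.shares_memory(a, b), np.shares_memory(a, c))  # True False
```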
This step requires me to copy the data. Sometimes, however, I would like to apply an operation directly to the view, and often such an operation returns a copy of the array anyway. For example,
d = np.concatenate((b,b))
returns a new array holding a copy of the data that b exposes from a. Nonetheless,
>>> d.dtype.descr
[('A', '<i4'), ('', '|V4'), ('C', '<f8')]
indicates that the data are not stored efficiently.
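One possible workaround, sketched here under the assumption that a single up-front copy is acceptable, is to repack the view once and concatenate the packed array. The names packed and d_packed are illustrative only, and the explicit field formats are again an assumption.

```python
import numpy as np
import numpy.lib.recfunctions as rf

a = np.zeros(3, dtype={'names': ['A', 'B', 'C'],
                       'formats': ['<i4', '<i4', '<f8']})

# Repack once, then reuse the packed array as often as needed.
packed = rf.repack_fields(a[['A', 'C']])
d_packed = np.concatenate((packed, packed))

print(d_packed.dtype.itemsize)  # 12, no gap where 'B' used to be
print(d_packed.shape)           # (6,)
```

The result dtype of np.concatenate on the padded view itself may differ between NumPy releases, so repacking first keeps the outcome predictable.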
So is there a way to work with views without producing "scattered" results? Would I always have to create a copy beforehand? Or is there no efficiency issue at all, and this is just an odd way in which descr describes the data type? (If so, how can I avoid that?)
This question becomes particularly relevant if I want to skip intermediate steps:
d = np.concatenate((a[['A', 'C']], a[['A', 'C']]))
I am working with NumPy 1.16 and Python 3.7.
Answer 0 (score: 2)
Multifield indexing has been in a state of flux for some time now. With 1.16 they seem to have settled on this 'offset' form of 'views', requiring an explicit repacking if you want a 'clean' copy.
In [231]: np.__version__
Out[231]: '1.16.1'
In [232]: a.dtype
Out[232]: dtype([('A', '<i8'), ('B', '<i8'), ('C', '<f8')])
In [233]: a[['A','C']].dtype
Out[233]: dtype({'names':['A','C'], 'formats':['<i8','<f8'], 'offsets':[0,16], 'itemsize':24})
In this view, the values for 'B' are still present (at offset 8). Think of the data buffer as holding:
[a0, b0, c0, a1, b1, c1, a2, b2, c2, ....]
The [233] 'view' looks at the same data buffer, but only gives us access to the A and C fields. repack_fields creates a new data buffer with:
[a0, c0, a1, c1, ....]
If a had been a regular (n,3) array, a[:, [0,2]] would be a copy. With a list index like that, we could not skip a[:,1] and still have a view.
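The contrast can be checked by writing through each result. This is a small sketch assuming NumPy 1.16 or later; the names v, m and sub are illustrative only.

```python
import numpy as np

# Structured array: multifield indexing returns a view.
a = np.zeros(3, dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8')])
v = a[['A', 'C']]
v['A'] = 7                    # writes through to a
print(a['A'])                 # [7 7 7]
print(np.shares_memory(a, v)) # True

# Regular 2-D array: indexing columns with a list returns a copy.
m = np.zeros((3, 3))
sub = m[:, [0, 2]]
sub[:] = 1                    # m is left untouched
print(m.sum())                # 0.0
print(np.shares_memory(m, sub))  # False
```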
In [234]: np.concatenate((a[['A','C']],a[['A','C']]))
Out[234]:
array([(0, 0.), (1, 1.), (2, 2.), (0, 0.), (1, 1.), (2, 2.)],
dtype={'names':['A','C'], 'formats':['<i8','<f8'], 'offsets':[0,16], 'itemsize':24})
Playing around with view, I find that the field at offset 8 (the 'B' field in a) still exists, but is uninitialized (as in an np.empty array).
Different ways of displaying this 'scattered' dtype:
In [238]: a1.dtype
Out[238]: dtype({'names':['A','C'], 'formats':['<i8','<f8'], 'offsets':[0,16], 'itemsize':24})
In [239]: a1.dtype.descr
Out[239]: [('A', '<i8'), ('', '|V8'), ('C', '<f8')]
In [241]: a1.dtype.fields
Out[241]: mappingproxy({'A': (dtype('int64'), 0), 'C': (dtype('float64'), 16)})
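The size of the gap can also be computed from dtype.fields and itemsize. The helper padding_bytes below is a hypothetical name introduced for illustration, and the sketch assumes the fields do not overlap.

```python
import numpy as np
import numpy.lib.recfunctions as rf

def padding_bytes(dt):
    """Bytes per record not covered by any named field.

    Hypothetical helper; assumes the fields of dt do not overlap.
    """
    used = sum(dt.fields[name][0].itemsize for name in dt.names)
    return dt.itemsize - used

a = np.zeros(3, dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8')])
view_dt = a[['A', 'C']].dtype
packed_dt = rf.repack_fields(view_dt)

print(padding_bytes(view_dt))    # 8, the hidden 'B' slot
print(padding_bytes(packed_dt))  # 0
```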
I can reorder the fields as well:
In [248]: a[['B','C','A']].dtype
Out[248]: dtype({'names':['B','C','A'], 'formats':['<i8','<f8','<i8'], 'offsets':[8,16,0], 'itemsize':24})
In [249]: a[['B','C','A']].dtype.descr
...
ValueError: dtype.descr is not defined for types with overlapping or out-of-order fields
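When descr is unavailable, the layout can still be read from dtype.fields, which stores the offset of every field. A minimal sketch, with layout as an illustrative name:

```python
import numpy as np

a = np.zeros(3, dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8')])
dt = a[['B', 'C', 'A']].dtype

# Each entry of dt.fields is (field dtype, byte offset[, title]).
# Sorting by offset recovers the physical order in the buffer.
layout = sorted((info[1], name) for name, info in dt.fields.items())

print(dt.names)  # ('B', 'C', 'A'), the logical order
print(layout)    # [(0, 'A'), (8, 'B'), (16, 'C')], the physical order
```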
Answer 1 (score: 0)
For concatenation only, you can do the following:
a = np.array([(1,2,3),(4,5,6)], 'f,f,f')
view = a[['f0','f2']]
b = np.empty(4, 'f,f')
b[:2] = view
b[2:] = view
print(b)
Output:
array([(1., 3.), (4., 6.), (1., 3.), (4., 6.)],
dtype=[('f0', '<f4'), ('f1', '<f4')])
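The same idea can be wrapped in a small function. This is only a sketch: concat_packed is a hypothetical helper, not part of NumPy, and it assumes all inputs are 1-D and share the dtype of the first one. Unlike the snippet above, it keeps the original field names by deriving the packed dtype with repack_fields.

```python
import numpy as np
import numpy.lib.recfunctions as rf

def concat_packed(views):
    """Concatenate 1-D structured views into one packed array.

    Hypothetical helper; assumes every view has the dtype of views[0].
    """
    packed_dt = rf.repack_fields(views[0].dtype)  # same fields, no gaps
    out = np.empty(sum(len(v) for v in views), dtype=packed_dt)
    pos = 0
    for v in views:
        out[pos:pos + len(v)] = v  # copies field by field into the packed layout
        pos += len(v)
    return out

a = np.array([(1, 2, 3), (4, 5, 6)], 'f,f,f')
view = a[['f0', 'f2']]
result = concat_packed([view, view])
print(result.dtype.itemsize)  # 8
print(result.tolist())        # [(1.0, 3.0), (4.0, 6.0), (1.0, 3.0), (4.0, 6.0)]
```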
Edit: forget what I said about np.add; it doesn't work anyway.