Question

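Is there a good way to resize numpy structured dtypes without having to rebuild the entire dtype each time? I currently have a mechanism that does this, but I'm curious whether there is a more efficient way. To illustrate what I'm asking, I've provided some code. The basic idea is that I want to resize the data type of specific named fields.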

import numpy as np

def resize_dtype(orig_type, resized_type):
    # I have a bunch of logic in here to loop and build new_dtype
    return new_dtype

def test_resize_dtype():
    type1 = np.dtype({'names':['col1', 'col2'], 'formats':['2i' ,'4i']})
    type2 = np.dtype({'names':['col', 'descr'], 'formats':[5*type1, 'S32']})

    tmp = np.dtype({'names':['col1', 'col2'], 'formats':['10i' ,'4i']})
    desired_type = np.dtype({'names':['col', 'descr'], 'formats':[5*tmp, 'S32']})
    resized_type = np.dtype([('col1', '10i')])
    new_type = resize_dtype(type2, resized_type)
    assert new_type == desired_type

Answer 1

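That's not fair. You show the test framework, but not the code that actually does the resizing. At first glance test_resize_dtype looks like it focuses on modifying np.dtype objects. But after defining a bunch of dtypes, all it does is call an unknown resize_array - or is that a typo, and you really meant to call the almost equally unknown resize_dtype function?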

But I think your main goal is to change the inner array size of the nested dtype field col1 from (2,) to (10,).

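As far as I know, there is no neat or efficient way to change an array like this. You just have to create a new array with the new dtype and copy the data field by field, from old to new, adjusting shapes as needed.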
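A minimal sketch of that field-by-field copy, using the dtypes from the question written in list-of-tuples form. The sample values and the choice to copy the old col1 values into the first two slots and leave the rest zero are my assumptions, not something the question specifies:

```python
import numpy as np

# Old and new layouts from the question, written as lists of tuples.
inner_old = np.dtype([('col1', 'i4', (2,)), ('col2', 'i4', (4,))])
inner_new = np.dtype([('col1', 'i4', (10,)), ('col2', 'i4', (4,))])
old_type = np.dtype([('col', inner_old, (5,)), ('descr', 'S32')])
new_type = np.dtype([('col', inner_new, (5,)), ('descr', 'S32')])

# Some sample data in the old layout (values are arbitrary).
old = np.zeros(3, dtype=old_type)
old['col']['col1'] = 7          # view of shape (3, 5, 2)
old['col']['col2'] = 1          # view of shape (3, 5, 4)
old['descr'] = b'row'

# Allocate a new array and copy each field across.
new = np.zeros(old.shape, dtype=new_type)
new['descr'] = old['descr']
new['col']['col2'] = old['col']['col2']
# Assumed policy: old values fill the first 2 of the 10 slots, the rest stay 0.
new['col']['col1'][..., :2] = old['col']['col1']
```

Field access returns views, so the assignments write straight into the new buffer.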

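X.dtype defines how numpy views each element of the array X. reshape and transpose affect how it steps through the elements of X, but they do nothing to the internals of those elements. Your change to the dtype changes not only how the bytes in an element are interpreted, but also the element's size. So the original X data buffer cannot be reused.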

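numpy.lib.recfunctions has a bunch of functions for working with rec arrays and structured arrays, including things like adding fields. The ones I've examined copy data field by field from the old array to the new one, and work recursively down nested dtypes if needed. But your resizing is probably beyond what they can do.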

In [92]: X1=np.zeros(1,dtype=type2)

In [93]: X2=np.zeros(1,dtype=desired_type)

In [94]: X1.itemsize
Out[94]: 152

In [95]: X2.itemsize
Out[95]: 312
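To make the size mismatch concrete, here is a small sketch that rebuilds both dtypes (in list-of-tuples form, which is my rewrite of the question's dict form) and tries to reinterpret the old buffer with the new dtype. With these sizes I would expect numpy to refuse the view:

```python
import numpy as np

type1 = np.dtype([('col1', 'i4', (2,)), ('col2', 'i4', (4,))])
type2 = np.dtype([('col', type1, (5,)), ('descr', 'S32')])
tmp = np.dtype([('col1', 'i4', (10,)), ('col2', 'i4', (4,))])
desired_type = np.dtype([('col', tmp, (5,)), ('descr', 'S32')])

# 5 * (2*4 + 4*4) + 32 bytes versus 5 * (10*4 + 4*4) + 32 bytes.
print(type2.itemsize, desired_type.itemsize)

X1 = np.zeros(1, dtype=type2)
try:
    # Reinterpreting the same bytes needs compatible sizes; these are not.
    X1.view(desired_type)
    view_failed = False
except ValueError:
    view_failed = True
print(view_failed)
```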

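To change an existing dtype, you can modify its descr and then create a new dtype from it. The main complication is that descr is a mix of lists and tuples. Lists are mutable, tuples are not. But here's a sample session:

Field names, at least at the top level, can be changed directly: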

In [141]: type2.names=['column','description']
In [142]: type2
Out[142]: dtype([('column', [('col1', '<i4', (2,)), ('col2', '<i4', (4,))], (5,)), ('description', 'S32')])

Grab the descr, the list representation of the dtype:

In [164]: d2=type2.descr    
In [165]: d2
Out[165]: 
[('column', [('col1', '<i4', (2,)), ('col2', '<i4', (4,))], (5,)),
 ('description', '|S32')]

...

d2 has enough information to recreate the dtype: np.dtype(d2).

d2 is a list of tuples; to modify any one of them I need to convert it to a list:

In [168]: dd2=list(d2[0])
In [169]: dd2
Out[169]: ['column', [('col1', '<i4', (2,)), ('col2', '<i4', (4,))], (5,)]

Embedded in dd2 is another tuple that we want to change:

In [174]: ddd2=list(dd2[1][0])
In [175]: ddd2
Out[175]: ['col1', '<i4', (2,)]

In [176]: ddd2[2]=(10,)    # change the list    
In [177]: ddd2
Out[177]: ['col1', '<i4', (10,)]

Write this list (converted back to a tuple) into dd2:

In [181]: dd2[1][0]=tuple(ddd2)    
In [182]: dd2
Out[182]: ['column', [('col1', '<i4', (10,)), ('col2', '<i4', (4,))], (5,)]

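Voila, I've ended up changing d2 as well. That's because what I changed in [181] was an element of a list nested inside a tuple. I probably didn't need to make the dd2 list in the first place; I only used it to reference the inner mutable list.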
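The aliasing can be checked without the intermediate lists. This sketch starts from a hand-written copy of the descr shown earlier (typed in literally rather than taken from a live dtype) and edits the inner list directly:

```python
import numpy as np

# Literal copy of the descr list from the session above.
d2 = [('column', [('col1', '<i4', (2,)), ('col2', '<i4', (4,))], (5,)),
      ('description', '|S32')]

# list() makes a shallow copy: the inner list is the same object.
dd2 = list(d2[0])
shares_inner_list = dd2[1] is d2[0][1]

# So the inner list can be edited in place, straight through the tuple.
name, fmt, shape = d2[0][1][0]
d2[0][1][0] = (name, fmt, (10,))

new_dtype = np.dtype(d2)
```

The outer tuple is never modified; only the mutable list it holds is.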

I can now use d2 to make a dtype that matches desired_type (apart from the top-level field names that I renamed earlier).

In [183]: d2
Out[183]: 
[('column', [('col1', '<i4', (10,)), ('col2', '<i4', (4,))], (5,)),
 ('description', '|S32')]

In [184]: np.dtype(d2)
Out[184]: dtype([('column', [('col1', '<i4', (10,)), ('col2', '<i4', (4,))], (5,)), ('description', 'S32')])

In [185]: desired_type
Out[185]: dtype([('col', [('col1', '<i4', (10,)), ('col2', '<i4', (4,))], (5,)), ('descr', 'S32')])

Sorry if this is a bit long, but I think the exploration process matters more than the final result.
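For completeness, one way the question's resize_dtype might be written. This is only a sketch under my own assumptions: it walks dtype.fields recursively instead of editing descr, it replaces every field whose name appears in resized_type at any nesting level, and it rebuilds a packed dtype, so custom offsets, alignment padding and field titles would be lost.

```python
import numpy as np

def resize_dtype(orig_type, resized_type):
    """Rebuild orig_type, swapping in the field types named in resized_type."""
    orig_type = np.dtype(orig_type)
    # Subarray such as (inner, (5,)): rebuild the element type, keep the shape.
    if orig_type.subdtype is not None:
        base, shape = orig_type.subdtype
        return np.dtype((resize_dtype(base, resized_type), shape))
    # Plain scalar type: nothing to resize.
    if orig_type.names is None:
        return orig_type
    fields = []
    for name in orig_type.names:
        if name in resized_type.names:
            # Take the replacement type (including its new shape) as given.
            fields.append((name, resized_type.fields[name][0]))
        else:
            # Otherwise recurse, in case a nested field needs resizing.
            fields.append((name, resize_dtype(orig_type.fields[name][0], resized_type)))
    return np.dtype(fields)

# The question's test, with the dtypes written as lists of tuples.
type1 = np.dtype([('col1', 'i4', (2,)), ('col2', 'i4', (4,))])
type2 = np.dtype([('col', type1, (5,)), ('descr', 'S32')])
tmp = np.dtype([('col1', 'i4', (10,)), ('col2', 'i4', (4,))])
desired_type = np.dtype([('col', tmp, (5,)), ('descr', 'S32')])
resized_type = np.dtype([('col1', '10i')])
new_type = resize_dtype(type2, resized_type)
```

This still rebuilds the whole dtype, which fits the point above that there is no in-place shortcut.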
