Generalise slicing operation in a NumPy array

Date: 2017-09-09 21:00:22

Tags: python arrays numpy indexing

This question is based on this older question:

  

Given an array:

In [122]: arr = np.array([[1, 3, 7], [4, 9, 8]]); arr
Out[122]: 
array([[1, 3, 7],
       [4, 9, 8]])
     

And given its indices:

In [127]: np.indices(arr.shape)
Out[127]: 
array([[[0, 0, 0],
        [1, 1, 1]],

       [[0, 1, 2],
        [0, 1, 2]]])
     

How can I stack them neatly one against the other to form a new 2D array? This is what I'd like:

array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])

This solution by Divakar is what I currently use for 2D arrays:

def indices_merged_arr(arr):
    m,n = arr.shape
    I,J = np.ogrid[:m,:n]
    out = np.empty((m,n,3), dtype=arr.dtype)
    out[...,0] = I
    out[...,1] = J
    out[...,2] = arr
    out.shape = (-1,3)
    return out

Now, if I want to pass a 3D array, I need to modify this function:

def indices_merged_arr(arr):
    m,n,k = arr.shape   # here
    I,J,K = np.ogrid[:m,:n,:k]   # here
    out = np.empty((m,n,k,4), dtype=arr.dtype)   # here
    out[...,0] = I
    out[...,1] = J
    out[...,2] = K     # here
    out[...,3] = arr
    out.shape = (-1,4)   # here
    return out

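But this function now works only for 3D arrays - I can't pass a 2D array to it.

Is there some way I can generalise this to work for any number of dimensions? Here's my attempt: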

def indices_merged_arr_general(arr):
    tup = arr.shape   
    idx = np.ogrid[????]   # not sure what to do here....
    out = np.empty(tup + (len(tup) + 1, ), dtype=arr.dtype) 
    for i, j in enumerate(idx):
        out[...,i] = j
    out[...,len(tup) - 1] = arr
    out.shape = (-1, len(tup)
    return out

I'm having trouble with this line:

idx = np.ogrid[????]   

How can I make this work?

4 Answers:

Answer 0: (score: 10)

Here's the extension to handle generic ndarrays -

def indices_merged_arr_generic(arr, arr_pos="last"):
    n = arr.ndim
    grid = np.ogrid[tuple(map(slice, arr.shape))]
    out = np.empty(arr.shape + (n+1,), dtype=np.result_type(arr.dtype, int))

    if arr_pos=="first":
        offset = 1
    elif arr_pos=="last":
        offset = 0
    else:
        raise Exception("Invalid arr_pos")        

    for i in range(n):
        out[...,i+offset] = grid[i]
    out[...,-1+offset] = arr
    out.shape = (-1,n+1)

    return out
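
To make the key line a bit less magical, here is a small sketch of what tuple(map(slice, arr.shape)) produces and how the resulting open grids broadcast. The variable names (index, grid_shapes, row_index, col_index) are just for illustration and are not part of the answer:

```python
import numpy as np

arr = np.array([[1, 3, 7], [4, 9, 8]])

# slice(n) is the object form of ":n", so this tuple stands in for [:2, :3]
index = tuple(map(slice, arr.shape))
grid = np.ogrid[index]                  # same grids as np.ogrid[:2, :3]

# One open (sparse) grid per axis: shapes (2, 1) and (1, 3) here
grid_shapes = [g.shape for g in grid]

# Assigning grid[i] into out[..., i] relies on broadcasting it to arr.shape
row_index = np.broadcast_to(grid[0], arr.shape)
col_index = np.broadcast_to(grid[1], arr.shape)
```

Because the tuple is built from arr.shape, the same line works unchanged for 3D or higher-dimensional input.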

Sample runs

2D case:

In [252]: arr
Out[252]: 
array([[37, 32, 73],
       [95, 80, 97]])

In [253]: indices_merged_arr_generic(arr)
Out[253]: 
array([[ 0,  0, 37],
       [ 0,  1, 32],
       [ 0,  2, 73],
       [ 1,  0, 95],
       [ 1,  1, 80],
       [ 1,  2, 97]])

In [254]: indices_merged_arr_generic(arr, arr_pos='first')
Out[254]: 
array([[37,  0,  0],
       [32,  0,  1],
       [73,  0,  2],
       [95,  1,  0],
       [80,  1,  1],
       [97,  1,  2]])

3D case:

In [226]: arr
Out[226]: 
array([[[35, 45, 33],
        [48, 38, 20],
        [69, 31, 90]],

       [[73, 65, 73],
        [27, 51, 45],
        [89, 50, 74]]])

In [227]: indices_merged_arr_generic(arr)
Out[227]: 
array([[ 0,  0,  0, 35],
       [ 0,  0,  1, 45],
       [ 0,  0,  2, 33],
       [ 0,  1,  0, 48],
       [ 0,  1,  1, 38],
       [ 0,  1,  2, 20],
       [ 0,  2,  0, 69],
       [ 0,  2,  1, 31],
       [ 0,  2,  2, 90],
       [ 1,  0,  0, 73],
       [ 1,  0,  1, 65],
       [ 1,  0,  2, 73],
       [ 1,  1,  0, 27],
       [ 1,  1,  1, 51],
       [ 1,  1,  2, 45],
       [ 1,  2,  0, 89],
       [ 1,  2,  1, 50],
       [ 1,  2,  2, 74]])
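
One detail of the function above is that the output dtype is np.result_type(arr.dtype, int) rather than arr.dtype. My reading (an assumption, the answer does not spell it out) is that this keeps the index columns from being squeezed into a narrow integer type. A quick sketch of what the promotion gives, with illustrative variable names:

```python
import numpy as np

# A narrow integer input is promoted to the default integer type,
# so large index values still fit in the output array.
small_int_dtype = np.result_type(np.dtype(np.uint8), int)

# A float input stays float; the index columns are then stored as floats.
float_dtype = np.result_type(np.dtype(np.float64), int)
```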

Answer 1: (score: 6)

For large arrays, AFAIK, senderle's cartesian_product is the fastest way to generate cartesian products using NumPy (see note 1):

In [372]: A = np.random.random((100,100,100))

In [373]: %timeit indices_merged_arr_generic_using_cp(A)
100 loops, best of 3: 16.8 ms per loop

In [374]: %timeit indices_merged_arr_generic(A)
10 loops, best of 3: 28.9 ms per loop

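Here is the setup I used for the benchmark. Below, indices_merged_arr_generic_using_cp is a modification of senderle's cartesian_product that includes the flattened array alongside the cartesian product: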

import numpy as np
import functools

def indices_merged_arr_generic_using_cp(arr):
    """
    Based on cartesian_product
    http://stackoverflow.com/a/11146645/190597 (senderle)
    """
    shape = arr.shape
    arrays = [np.arange(s, dtype='int') for s in shape]
    broadcastable = np.ix_(*arrays)
    broadcasted = np.broadcast_arrays(*broadcastable)
    rows, cols = functools.reduce(np.multiply, broadcasted[0].shape), len(broadcasted)+1
    out = np.empty(rows * cols, dtype=arr.dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    out[start:] = arr.flatten()
    return out.reshape(cols, rows).T

def indices_merged_arr_generic(arr):
    """
    https://stackoverflow.com/a/46135084/190597 (Divakar)
    """
    n = arr.ndim
    grid = np.ogrid[tuple(map(slice, arr.shape))]
    out = np.empty(arr.shape + (n+1,), dtype=arr.dtype)
    for i in range(n):
        out[...,i] = grid[i]
    out[...,-1] = arr
    out.shape = (-1,n+1)
    return out

Note 1: above I actually used senderle's cartesian_product_transpose. For me, this is the fastest version. For others, including senderle, cartesian_product is faster.
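
For anyone wondering what the np.ix_ / np.broadcast_arrays steps in indices_merged_arr_generic_using_cp do, here is a minimal sketch on a tiny shape. The names open_grids, full_grids and index_columns are mine, chosen for illustration:

```python
import numpy as np

shape = (2, 3)
arrays = [np.arange(s) for s in shape]

# np.ix_ reshapes each 1D range so that they broadcast against each other
open_grids = np.ix_(*arrays)                    # shapes (2, 1) and (1, 3)

# broadcast_arrays expands every grid to the full shape (as views, no copy)
full_grids = np.broadcast_arrays(*open_grids)   # each has shape (2, 3)

# Flattening each full grid gives one index column of the cartesian product
index_columns = [g.reshape(-1) for g in full_grids]
```

The function then writes these columns one after another into a single preallocated buffer, followed by the flattened array.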

Answer 2: (score: 4)

ndenumerate iterates on the elements, as opposed to the dimensions as in the other solutions, so I don't expect it to win speed tests. But here's a way of using it:

In [588]:  arr = np.array([[1, 3, 7], [4, 9, 8]])
In [589]: arr
Out[589]: 
array([[1, 3, 7],
       [4, 9, 8]])
In [590]: list(np.ndenumerate(arr))
Out[590]: [((0, 0), 1), ((0, 1), 3), ((0, 2), 7), ((1, 0), 4), ((1, 1), 9), ((1, 2), 8)]

In py3, * unpacking can be used in a tuple, so the nested tuples can be flattened:

In [591]: [(*ij,v) for ij,v in np.ndenumerate(arr)]
Out[591]: [(0, 0, 1), (0, 1, 3), (0, 2, 7), (1, 0, 4), (1, 1, 9), (1, 2, 8)]
In [592]: np.array(_)
Out[592]: 
array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])

It generalizes nicely to more dimensions:

In [593]: arr3 = np.arange(24).reshape(2,3,4)
In [594]: np.array([(*ij,v) for ij,v in np.ndenumerate(arr3)])
Out[594]: 
array([[ 0,  0,  0,  0],
       [ 0,  0,  1,  1],
       [ 0,  0,  2,  2],
       [ 0,  0,  3,  3],
       [ 0,  1,  0,  4],
       [ 0,  1,  1,  5],
       ....
       [ 1,  2,  3, 23]])

With these small samples, it is actually faster than @Divakar's function. :)

But for a large 3D array it is quite a bit slower.

With @unutbu's function: slower for small arrays, faster for large ones.
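
If it helps, the ndenumerate comprehension from this answer can be wrapped in a function so it can be dropped in next to the other solutions. This is only a sketch; the name indices_merged_ndenumerate is made up here, and it needs Python 3.5+ for the * unpacking inside the tuple:

```python
import numpy as np

def indices_merged_ndenumerate(arr):
    """Hypothetical wrapper: one row per element, index columns first, value last."""
    arr = np.asarray(arr)
    # np.ndenumerate yields (index_tuple, value); *idx flattens the index tuple
    rows = [(*idx, value) for idx, value in np.ndenumerate(arr)]
    return np.array(rows)

arr3 = np.arange(24).reshape(2, 3, 4)
out3 = indices_merged_ndenumerate(arr3)
```

Since it loops in Python over every element, I would only reach for it on small arrays or when readability matters more than speed.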

Answer 3: (score: 0)

We can use the following one-liner (t is the input array):

from numpy import hstack, array, meshgrid

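hstack((
        # indexing='ij' keeps the index rows in the same C order as t.flatten()
        array(meshgrid(*map(range, t.shape), indexing='ij')).reshape(t.ndim, -1).T,
        t.flatten().reshape(-1,1)
       ))

Here we first use map(range, t.shape) to construct an iterable of ranges. With meshgrid(..., indexing='ij') followed by .reshape(t.ndim, -1).T, we build the first part of the table: an n×m matrix, where n is the number of elements of t and m is the number of dimensions. We then append the flattened t as a column on the right. Note that the default 'xy' indexing combined with a plain .T only lines up with t.flatten() for 2D input, which is why indexing='ij' is used here.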
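
As a minimal sketch, the meshgrid idea can also be wrapped in a function. The name indices_merged_meshgrid is made up here, and indexing='ij' plus stacking on the last axis are choices I'm assuming so that the index rows come out in the same C order as the flattened array for any number of dimensions:

```python
import numpy as np

def indices_merged_meshgrid(t):
    """Hypothetical helper: index columns from meshgrid, values in the last column."""
    t = np.asarray(t)
    # 'ij' indexing makes every grid have shape t.shape, with axis k varying along axis k
    grids = np.meshgrid(*map(range, t.shape), indexing="ij")
    # Stack the grids on a new last axis, then flatten to one row per element
    index_cols = np.stack(grids, axis=-1).reshape(-1, t.ndim)
    return np.hstack((index_cols, t.reshape(-1, 1)))

arr = np.array([[1, 3, 7], [4, 9, 8]])
out = indices_merged_meshgrid(arr)
```

A quick way to convince yourself of the row order is to compare the result against np.ndindex(t.shape) on a small 3D array.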