Question

I'm learning NumPy, and I'd like to understand the following data-shuffling code:

import numpy as np

# x is a m*n np.array
# return a shuffled-rows array 
def shuffle_col_vals(x):
    rand_x = np.array([np.random.choice(x.shape[0], size=x.shape[0], replace=False) for i in range(x.shape[1])]).T
    grid = np.indices(x.shape)
    rand_y = grid[1]
    return x[(rand_x, rand_y)]

So I passed in an np.array object like this:

x1 = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12],
                [13, 14, 15, 16]], dtype=int)

The output I get from shuffle_col_vals(x1) is:

array([[ 1,  5, 11, 15],
       [ 3,  8,  9, 14],
       [ 4,  6, 12, 16],
       [ 2,  7, 10, 13]], dtype=int64)

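I'm confused by the way rand_x is initialized; I couldn't find this kind of usage in the numpy.array documentation.
I've also thought about it for a long time, but I still don't understand why return x[(rand_x, rand_y)] gives an array with shuffled rows.
If you don't mind, could someone explain the code to me?
Thanks in advance.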

Answer 1

When indexing a NumPy array you can select a single element. Let's use a 3x4 array so the two axes are easy to tell apart:

In [1]: x1 = np.array([[1, 2, 3, 4],
   ...:                [5, 6, 7, 8],
   ...:                [9, 10, 11, 12]], dtype=int)

In [2]: x1[0, 0]
Out[2]: 1

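If you look at NumPy's Advanced indexing documentation, you'll see that you can do a lot more by supplying a list of indices for each dimension. Think of the index as x1[rows..., cols...] and consider selecting two elements.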

Select from the first and second rows, but always from the first column:

In [3]: x1[[0, 1], [0, 0]]
Out[3]: array([1, 5])
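A minimal sketch of what the paired lists mean, reusing the 3x4 x1 from this answer: the i-th row index is combined with the i-th column index, which gives the same result as an explicit loop over the (row, col) pairs.

```python
import numpy as np

x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12]])

rows = [0, 1]
cols = [0, 0]

# Advanced indexing pairs rows[k] with cols[k] for every k.
fancy = x1[rows, cols]

# The same selection written as a loop over (row, col) pairs.
looped = np.array([x1[r, c] for r, c in zip(rows, cols)])

print(fancy)   # [1 5]
print(looped)  # [1 5]
```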

You can even index with (nested) arrays:

In [4]: x1[[[0, 0], [1, 1]], [[0, 1], [0, 1]]]
Out[4]:
array([[1, 2],
       [5, 6]])
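As a sketch of the rule behind this (the names R and C are only for illustration): with two index arrays of the same shape, the result has that shape, and each cell is out[i, j] == x1[R[i, j], C[i, j]]. Rebuilding the result cell by cell makes that visible:

```python
import numpy as np

x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12]])

R = np.array([[0, 0], [1, 1]])  # row index for each output cell
C = np.array([[0, 1], [0, 1]])  # column index for each output cell

out = x1[R, C]

# Rebuild the result cell by cell: manual[i, j] = x1[R[i, j], C[i, j]]
manual = np.empty(R.shape, dtype=x1.dtype)
for i in range(R.shape[0]):
    for j in range(R.shape[1]):
        manual[i, j] = x1[R[i, j], C[i, j]]

print(out.shape)                    # (2, 2), the shape of the index arrays
print(np.array_equal(out, manual))  # True
```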

np.indices creates an array of row indices and an array of column indices; used together for indexing, they return the original array:

In [5]: grid = np.indices(x1.shape)

In [6]: np.all(x1[grid[0], grid[1]] == x1)
Out[6]: True
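To see what the two arrays inside grid actually hold, here is a small sketch; it builds the same 3x4 x1 with np.arange and compares np.indices against np.meshgrid with indexing="ij" (used here only as a cross-check): grid[0][i, j] is the row index i and grid[1][i, j] is the column index j.

```python
import numpy as np

x1 = np.arange(1, 13).reshape(3, 4)  # same values as the 3x4 x1 above
grid = np.indices(x1.shape)          # shape (2, 3, 4)

# grid[0][i, j] == i (row index), grid[1][i, j] == j (column index)
rows, cols = np.meshgrid(np.arange(3), np.arange(4), indexing="ij")
assert np.array_equal(grid[0], rows)
assert np.array_equal(grid[1], cols)

# Indexing with the unmodified grid is the identity.
assert np.array_equal(x1[grid[0], grid[1]], x1)

print(grid[0])
print(grid[1])
```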

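Now, if you shuffle the values of grid[0] within each column but leave grid[1] as it is, and then use them for indexing, you get an array in which the values of each column are shuffled.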
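A minimal sketch of that idea, assuming NumPy's Generator API (np.random.default_rng) instead of the legacy np.random.choice used in the question; the seed is only there to make the run repeatable. Each column of grid[0] is permuted independently while grid[1] is left untouched, so every value stays in its original column:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the run is repeatable
x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12]])
grid = np.indices(x1.shape)

# Shuffle the row indices of each column independently; keep grid[1] as is.
shuffled_rows = grid[0].copy()
for j in range(x1.shape[1]):
    shuffled_rows[:, j] = rng.permutation(shuffled_rows[:, j])

out = x1[shuffled_rows, grid[1]]

# Every value stays in its original column; only its row position changes.
for j in range(x1.shape[1]):
    assert np.array_equal(np.sort(out[:, j]), np.sort(x1[:, j]))

print(out)
```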

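Each column of grid[0] (the row-index vector for that column) is [0, 1, 2]. The code shuffles this vector separately for every column and stacks the results into rand_x, which has the same shape as x1.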

Create one random index vector for a column:

In [7]: np.random.seed(0)
In [8]: np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
Out[8]: array([2, 1, 0])
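Why this gives a shuffled index vector: drawing all n values from range(n) without replacement can only produce a permutation of 0..n-1. A quick sketch checking that, with np.random.permutation shown for comparison as an equivalent way to get such a vector:

```python
import numpy as np

n = 3
v = np.random.choice(n, size=n, replace=False)

# All n values drawn without replacement: a permutation of 0..n-1.
assert np.array_equal(np.sort(v), np.arange(n))

# np.random.permutation(n) produces the same kind of result.
p = np.random.permutation(n)
assert np.array_equal(np.sort(p), np.arange(n))

print(v, p)
```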

The vectors are stacked (in pseudocode) with np.array([random-index-col-vec for col in range(x1.shape[1])]), followed by a transpose (.T).

To make this a bit clearer, we can rename i to col and use np.column_stack instead of np.array([... for col]).T:

In [9]: np.random.seed(0)
In [10]: col_list = [np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
                     for col in range(x1.shape[1])]

In [11]: col_list
Out[11]: [array([2, 1, 0]), array([2, 0, 1]), array([0, 2, 1]), array([2, 0, 1])]

In [12]: rand_x = np.column_stack(col_list)
In [13]: rand_x
Out[13]:
array([[2, 2, 0, 2],
       [1, 0, 2, 0],
       [0, 1, 1, 1]])

In [14]: x1[rand_x, grid[1]]
Out[14]:
array([[ 9, 10,  3, 12],
       [ 5,  2, 11,  4],
       [ 1,  6,  7,  8]])
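Putting the pieces together, here is a sketch of the question's function rewritten with the names from this answer (n_rows, n_cols and col_list are illustrative). It checks at run time that np.array(col_list).T and np.column_stack(col_list) build the same rand_x, and that each output column is a rearrangement of the matching input column. The last line swaps in a different technique, Generator.permuted with axis=0 (available in NumPy 1.20 and later), which also shuffles each column independently.

```python
import numpy as np


def shuffle_col_vals(x):
    n_rows, n_cols = x.shape
    # One independent random permutation of the row indices per column.
    col_list = [np.random.choice(n_rows, size=n_rows, replace=False)
                for col in range(n_cols)]
    # np.array(col_list) has shape (n_cols, n_rows); .T gives x.shape,
    # so column j of rand_x is the permutation drawn for column j.
    rand_x = np.array(col_list).T
    assert np.array_equal(rand_x, np.column_stack(col_list))
    # rand_y[i, j] == j: every cell reads from its own column.
    rand_y = np.indices(x.shape)[1]
    return x[rand_x, rand_y]


x1 = np.arange(1, 17).reshape(4, 4)  # same values as the question's x1
out = shuffle_col_vals(x1)
print(out)

# Each output column is a rearrangement of the same input column.
for j in range(x1.shape[1]):
    assert np.array_equal(np.sort(out[:, j]), np.sort(x1[:, j]))

# Swapped-in alternative: shuffle each column independently in one call.
alt = np.random.default_rng().permuted(x1, axis=0)
```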

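Side notes:

The sample output you posted does not match the function you posted; it looks transposed.
Calling the row indices rand_x and the column indices rand_y in the example code is confusing if you are used to the convention x = column index, y = row index.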
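One way to check the "transposed" remark, as a sketch using the output posted in the question (same_values is a hypothetical helper): the function keeps every value in its own column, but in the posted output each column holds the values of one original row instead.

```python
import numpy as np

x1 = np.arange(1, 17).reshape(4, 4)  # the question's 4x4 input
posted = np.array([[ 1,  5, 11, 15],
                   [ 3,  8,  9, 14],
                   [ 4,  6, 12, 16],
                   [ 2,  7, 10, 13]])


def same_values(a, b):
    # True if a and b contain the same values in any order.
    return np.array_equal(np.sort(a), np.sort(b))


# What shuffle_col_vals guarantees: column j holds x1's column j values.
cols_match = all(same_values(posted[:, j], x1[:, j]) for j in range(4))
# What the posted output shows: column j holds x1's row j values.
rows_match = all(same_values(posted[:, j], x1[j, :]) for j in range(4))

print(cols_match)  # False
print(rows_match)  # True
```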

Answer 2

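Look at the output of this instrumented version. Note that, unlike the function in the question, it draws a single permutation f and reuses it for every column, so whole rows move together: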

import numpy as np


def shuffle_col_val(x):
    print("----------------------------\n   A    rand_x\n")
    # a single random permutation of the row indices
    f = np.random.choice(x.shape[0], size=x.shape[0], replace=False)
    print(f, "\nNow I transpose an array.")
    # np.array([f]) has shape (1, n); .T turns it into a column of shape (n, 1)
    rand_x = np.array([f]).T
    print(rand_x)
    print("----------------------------\n    B    rand_y\n")
    print("Grid gives you two possibilities\n you choose second:")
    grid = np.indices(x.shape)
    print(format(grid))
    # grid[1][i, j] == j: the column index of every cell
    rand_y = grid[1]
    print("\n----------------------------\n  C  Our rand_x, rand_y:")
    print("\nThe order of values in the column CHANGE:\n  has random order\n{}".format(rand_x))
    print("\nThe order of values in the row NO CHANGE:\n  has normal order 0, 1, 2, 3\n{}".format(rand_y))
    # rand_x (n, 1) is broadcast against rand_y (n, m) before indexing
    return x[(rand_x, rand_y)]


x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]], dtype=int)
print("\n----------------------------\n  D   Our shuffled-rows: \n{}\n".format(shuffle_col_val(x1)))

输出：

----------------------------
   A    rand_x

[2 3 0 1] 
Now I transpose an array.
[[2]
 [3]
 [0]
 [1]]
----------------------------
    B    rand_y

Grid gives you two possibilities
 you choose second:
[[[0 0 0 0]
  [1 1 1 1]
  [2 2 2 2]
  [3 3 3 3]]

 [[0 1 2 3]
  [0 1 2 3]
  [0 1 2 3]
  [0 1 2 3]]]

----------------------------
  C  Our rand_x, rand_y:

The order of values in the column CHANGE:
  has random order
[[2]
 [3]
 [0]
 [1]]

The order of values in the row NO CHANGE:
  has normal order 0, 1, 2, 3
[[0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]]

----------------------------
  D   Our shuffled-rows: 
[[ 9 10 11 12]
 [13 14 15 16]
 [ 1  2  3  4]
 [ 5  6  7  8]]
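Why this version moves whole rows: rand_x has shape (4, 1) and rand_y has shape (4, 4), so NumPy broadcasts rand_x across the columns and every column uses the same row order. A short sketch, hard-coding the permutation f = [2, 3, 0, 1] from the run above and using f[:, np.newaxis] as an equivalent of np.array([f]).T:

```python
import numpy as np

x1 = np.arange(1, 17).reshape(4, 4)  # the question's 4x4 input
f = np.array([2, 3, 0, 1])           # permutation taken from the run above
rand_x = f[:, np.newaxis]            # shape (4, 1), same as np.array([f]).T
rand_y = np.indices(x1.shape)[1]     # shape (4, 4)

# Broadcasting repeats the single column of rand_x across all 4 columns.
b_rows, b_cols = np.broadcast_arrays(rand_x, rand_y)
print(b_rows)
# [[2 2 2 2]
#  [3 3 3 3]
#  [0 0 0 0]
#  [1 1 1 1]]

out = x1[rand_x, rand_y]
assert np.array_equal(out, x1[f])    # whole rows are reordered together
print(out)
```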

How to understand this kind of data-shuffling code in NumPy
