I am learning NumPy, and I would like to understand the following data-shuffling code:
import numpy as np

# x is an m*n np.array
# returns a shuffled-rows array
def shuffle_col_vals(x):
    rand_x = np.array([np.random.choice(x.shape[0], size=x.shape[0], replace=False) for i in range(x.shape[1])]).T
    grid = np.indices(x.shape)
    rand_y = grid[1]
    return x[(rand_x, rand_y)]
So I passed in an np.array object like this:
x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]], dtype=int)
and got the following output from shuffle_col_vals(x1):
array([[ 1, 5, 11, 15],
[ 3, 8, 9, 14],
[ 4, 6, 12, 16],
[ 2, 7, 10, 13]], dtype=int64)
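I'm confused by the way rand_x is initialized; I have never seen a numpy.array built this way. I have thought about it for a long time, but I still don't understand why return x[(rand_x, rand_y)] produces a shuffled-rows array.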
If you don't mind, could someone explain the code to me?
Thanks in advance.
Answer 0 (score: 1)
When indexing a NumPy array, you can pick out single elements. Let's use a 3x4 array so the two axes are easy to tell apart:
In [1]: x1 = np.array([[1, 2, 3, 4],
   ...:                [5, 6, 7, 8],
   ...:                [9, 10, 11, 12]], dtype=int)
In [2]: x1[0, 0]
Out[2]: 1
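If you look at NumPy Advanced indexing, you will see that you can do much more by supplying a list of indices for each dimension. Consider indexing of the form x1[rows..., cols...] and take two elements.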
Select from the first and second rows, but always from the first column:
In [3]: x1[[0, 1], [0, 0]]
Out[3]: array([1, 5])
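With one list per dimension, NumPy pairs the lists element by element, so the k-th result is x1[rows[k], cols[k]]. A minimal sketch checking this against a plain Python loop (the helper `pairwise_lookup` is a hypothetical name used only for illustration):

```python
import numpy as np

x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12]], dtype=int)

rows = [0, 1]
cols = [0, 0]

# Advanced indexing: one index list per axis, paired element-wise
fancy = x1[rows, cols]

# Hypothetical helper: the same lookup written as an explicit loop
def pairwise_lookup(a, rows, cols):
    return np.array([a[r, c] for r, c in zip(rows, cols)])

loop = pairwise_lookup(x1, rows, cols)
print(fancy, loop)  # [1 5] [1 5]
```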
You can even index with arrays (here, nested lists); the result takes the shape of the index arrays:
In [4]: x1[[[0, 0], [1, 1]], [[0, 1], [0, 1]]]
Out[4]:
array([[1, 2],
[5, 6]])
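When the index arrays are 2-D, each output cell is out[i, j] = x1[R[i, j], C[i, j]], where R and C are the row and column index arrays. A short sketch that rebuilds the same result cell by cell (R, C and manual are illustrative names):

```python
import numpy as np

x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12]], dtype=int)

R = np.array([[0, 0], [1, 1]])  # row index for each output cell
C = np.array([[0, 1], [0, 1]])  # column index for each output cell

out = x1[R, C]  # shape (2, 2), same as R and C

# Rebuild the same result one cell at a time
manual = np.empty(R.shape, dtype=x1.dtype)
for i in range(R.shape[0]):
    for j in range(R.shape[1]):
        manual[i, j] = x1[R[i, j], C[i, j]]

print(out)
```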
np.indices creates an array of row indices and an array of column indices; used together for indexing, they return the original array:
In [5]: grid = np.indices(x1.shape)
In [6]: np.all(x1[grid[0], grid[1]] == x1)
Out[6]: True
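For the 3x4 array, np.indices returns one stacked array of shape (2, 3, 4): grid[0][i, j] is the row number i and grid[1][i, j] is the column number j. A quick look at both halves:

```python
import numpy as np

x1 = np.arange(1, 13).reshape(3, 4)  # same values as the 3x4 x1 above
grid = np.indices(x1.shape)

print(grid.shape)  # (2, 3, 4)
print(grid[0])     # row index of every cell
print(grid[1])     # column index of every cell

# Using both index arrays unchanged gives back x1
assert np.array_equal(x1[grid[0], grid[1]], x1)
```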
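Now, if you shuffle the values of grid[0] column by column but keep grid[1] as it is, and then use them for indexing, you get an array in which the values within each column are shuffled.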
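The index vector for each column is [0, 1, 2] (the row numbers). The code shuffles this index vector separately for each column and stacks the results into rand_x, which has the same shape as x1.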
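A sketch of that idea with a column-wise permuted grid[0]: since only the row indices inside each column are rearranged, every column of the result holds exactly the values of the same column of x1, in a new order. This sketch assumes np.random.default_rng (not used by the original code) to supply the permutations; the seed 42 is arbitrary.

```python
import numpy as np

x1 = np.arange(1, 13).reshape(3, 4)
grid = np.indices(x1.shape)
rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Shuffle the row indices independently within each column of grid[0]
rand_rows = grid[0].copy()
for col in range(x1.shape[1]):
    rand_rows[:, col] = rng.permutation(x1.shape[0])

shuffled = x1[rand_rows, grid[1]]  # column indices left untouched

# Each column still holds the same set of values
assert np.array_equal(np.sort(shuffled, axis=0), np.sort(x1, axis=0))
print(shuffled)
```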
Create one random index vector for a column:
In [7]: np.random.seed(0)
In [8]: np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
Out[8]: array([2, 1, 0])
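np.random.choice(n, size=n, replace=False) draws all n indices without repetition, so the result is a random permutation of 0..n-1. A small check, using np.random.permutation only as a comparison point (it is an alternative, not what the original code calls):

```python
import numpy as np

n = 3
np.random.seed(0)
idx = np.random.choice(n, size=n, replace=False)

# Every index 0..n-1 appears exactly once
assert sorted(idx.tolist()) == list(range(n))

# np.random.permutation(n) produces the same kind of vector
perm = np.random.permutation(n)
assert sorted(perm.tolist()) == list(range(n))
print(idx, perm)
```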
These vectors are stacked with (pseudocode) [random-index-col-vec for cols in range(x1.shape[1])] and then transposed (.T).
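Why the transpose: np.array on a list of n vectors of length m builds an (n, m) array with one row per column of x, and .T turns it into the (m, n) layout where column j holds the shuffled row indices for column j. A shape-only sketch with m = 3, n = 4 to match the 3x4 example:

```python
import numpy as np

m, n = 3, 4  # rows, columns of x1
np.random.seed(0)
vecs = [np.random.choice(m, size=m, replace=False) for col in range(n)]

stacked = np.array(vecs)  # one row per column of x1
rand_x = stacked.T        # one column per column of x1

print(stacked.shape, rand_x.shape)  # (4, 3) (3, 4)
# Column j of rand_x is the j-th random vector
assert all(np.array_equal(rand_x[:, j], vecs[j]) for j in range(n))
```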
To make this a little clearer, we can rename i to col and use column_stack instead of np.array([... for col]).T:
In [9]: np.random.seed(0)
In [10]: col_list = [np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
    ...:             for col in range(x1.shape[1])]
In [11]: col_list
Out[11]: [array([2, 1, 0]), array([2, 0, 1]), array([0, 2, 1]), array([2, 0, 1])]
In [12]: rand_x = np.column_stack(col_list)
In [13]: rand_x
Out[13]:
array([[2, 2, 0, 2],
[1, 0, 2, 0],
[0, 1, 1, 1]])
In [14]: x1[rand_x, grid[1]]
Out[14]:
array([[ 9, 10, 3, 12],
[ 5, 2, 11, 4],
[ 1, 6, 7, 8]])
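For a list of 1-D vectors, np.column_stack(col_list) and np.array(col_list).T give the same array. Below is a sketch of the question's function with the index arrays renamed to row_idx and col_idx (hypothetical names, chosen so rows and columns are explicit), plus an equivalence check; the fixed seed is only for reproducibility.

```python
import numpy as np

def shuffle_col_vals(x):
    # row_idx[:, j] is a random permutation of the row numbers for column j
    row_idx = np.column_stack([
        np.random.choice(x.shape[0], size=x.shape[0], replace=False)
        for col in range(x.shape[1])
    ])
    # col_idx[i, j] == j: every cell keeps its own column
    col_idx = np.indices(x.shape)[1]
    return x[row_idx, col_idx]

x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12]], dtype=int)

np.random.seed(0)
col_list = [np.random.choice(3, size=3, replace=False) for col in range(4)]
assert np.array_equal(np.column_stack(col_list), np.array(col_list).T)

np.random.seed(0)
result = shuffle_col_vals(x1)
print(result)
```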
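A note on the details: in this code rand_x holds row indices and rand_y holds column indices, so the names can be confusing if you are used to the convention x = column index, y = row index.

Answer 1 (score: 1)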
Look at the output:
import numpy as np

def shuffle_col_val(x):
    print("----------------------------\n A rand_x\n")
    f = np.random.choice(x.shape[0], size=x.shape[0], replace=False)
    print(f, "\nNow I transpose an array.")
    rand_x = np.array([f]).T
    print(rand_x)
    print("----------------------------\n B rand_y\n")
    print("Grid gives you two possibilities\n you choose second:")
    grid = np.indices(x.shape)
    print(format(grid))
    rand_y = grid[1]
    print("\n----------------------------\n C Our rand_x, rand_y:")
    print("\nThe order of values in the column CHANGE:\n has random order\n{}".format(rand_x))
    print("\nThe order of values in the row NO CHANGE:\n has normal order 0, 1, 2, 3\n{}".format(rand_y))
    return x[(rand_x, rand_y)]

x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]], dtype=int)
print("\n----------------------------\n D Our shuffled-rows: \n{}\n".format(shuffle_col_val(x1)))
Output:
----------------------------
 A rand_x

[2 3 0 1]
Now I transpose an array.
[[2]
 [3]
 [0]
 [1]]
----------------------------
 B rand_y

Grid gives you two possibilities
 you choose second:
[[[0 0 0 0]
  [1 1 1 1]
  [2 2 2 2]
  [3 3 3 3]]

 [[0 1 2 3]
  [0 1 2 3]
  [0 1 2 3]
  [0 1 2 3]]]

----------------------------
 C Our rand_x, rand_y:

The order of values in the column CHANGE:
 has random order
[[2]
 [3]
 [0]
 [1]]

The order of values in the row NO CHANGE:
 has normal order 0, 1, 2, 3
[[0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]]

----------------------------
 D Our shuffled-rows:
[[ 9 10 11 12]
 [13 14 15 16]
 [ 1  2  3  4]
 [ 5  6  7  8]]
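The code in this answer draws a single permutation f and turns it into a column vector of shape (4, 1) with np.array([f]).T (equivalent to f[:, None]). Combined with rand_y of shape (4, 4), broadcasting repeats f across all columns, so whole rows move together, as in output D. The question's rand_x instead has an independent permutation per column. A sketch contrasting the two, assuming np.random.default_rng with an arbitrary seed:

```python
import numpy as np

x1 = np.arange(1, 17).reshape(4, 4)
rand_y = np.indices(x1.shape)[1]
rng = np.random.default_rng(1)  # arbitrary seed

# This answer: one permutation, shape (4, 1), broadcast across columns
f = rng.permutation(4)
rows_shuffled = x1[f[:, None], rand_y]
assert np.array_equal(rows_shuffled, x1[f])  # same as reordering whole rows

# The question: an independent permutation per column, shape (4, 4)
rand_x = np.column_stack([rng.permutation(4) for col in range(4)])
cols_shuffled = x1[rand_x, rand_y]
assert np.array_equal(np.sort(cols_shuffled, axis=0), np.sort(x1, axis=0))

print(rows_shuffled)
print(cols_shuffled)
```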