Getting the indices of rows randomly sampled from an array

Time: 2017-04-17 23:28:45

Tags: python numpy machine-learning

I wrote the following function:

from itertools import compress
from random import sample

def searchPosotive (X,y, num):
    pos = sample(list(compress(X, y)), num)
    return (pos)

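This function takes two NumPy arrays, X and y. The two arrays are related: y[i] is the label of X[i]. Each label is either 1 or 0.

The function randomly selects num rows of X whose y value equals 1 and returns a (num, n) array, where n is the number of columns in X.

I need to get the list of index values of the rows it contains. For example, if a row of pos equals X[a], then a needs to be in that list. How can I do this?

I also need to do the same when sampling negative examples. The function I currently use is: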

import numpy as np

def searchNegative (X,y, num):
    mat=X[y==0]
    rows = np.random.choice(len(mat), size=num,replace=False)
    mat=mat[rows,:]
    return (mat)

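1 Answer:

Answer 0 (score: 3)

You want to use np.where to get the indices where y is positive (or negative), and then sample from those indices. Here is a function for the positive case; you can modify it to let you choose positive or negative, or just write another function for the negatives. First, assume: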

>>> y
array([1, 0, 1, 1, 1, 0, 0, 1, 0, 1])
>>> X
array([[-25,  62,  94,  70,  96,  70,  38, -18, -57,   1],
       [ 40,  86, -98, -48,  40,  29,   4, -83,  44, -12],
       [ 57,  23, -96,  97, -24, -93, -33, -64,  61,  15],
       [ 44,  29,  31, -38,  11,  85,  37, -96, -37, -70],
       [-10, -37, -24, -66,  27, -44, -16, -50,   3, -91],
       [-97,  81,  52,  41,  39, -14,  95,  76,  28, -32],
       [-74,  49, -91, -65, -96,  86, -13,  43,  22,  80],
       [  5,  20, -77,  74, -89,  46, -90,  95,  30,  13],
       [ 36,   6,  55, -74, -49, -66,  38,  37, -84,  28],
       [-23, -28, -32, -30,  -4, -52,  -4,  99, -67, -98]])

So...

>>> def sample_positive(X, y, num):
...     pos_index = np.where(y == 1)[0]
...     rows = np.random.choice(pos_index, size=num, replace=False)
...     mat = X[rows,:]
...     return (mat, rows)
...
>>> X_sample, idx = sample_positive(X, y, 2)
>>> X_sample
array([[-23, -28, -32, -30,  -4, -52,  -4,  99, -67, -98],
       [-10, -37, -24, -66,  27, -44, -16, -50,   3, -91]])
>>> idx
array([9, 4])
>>> X
array([[-25,  62,  94,  70,  96,  70,  38, -18, -57,   1],
       [ 40,  86, -98, -48,  40,  29,   4, -83,  44, -12],
       [ 57,  23, -96,  97, -24, -93, -33, -64,  61,  15],
       [ 44,  29,  31, -38,  11,  85,  37, -96, -37, -70],
       [-10, -37, -24, -66,  27, -44, -16, -50,   3, -91],
       [-97,  81,  52,  41,  39, -14,  95,  76,  28, -32],
       [-74,  49, -91, -65, -96,  86, -13,  43,  22,  80],
       [  5,  20, -77,  74, -89,  46, -90,  95,  30,  13],
       [ 36,   6,  55, -74, -49, -66,  38,  37, -84,  28],
       [-23, -28, -32, -30,  -4, -52,  -4,  99, -67, -98]])
>>> y
array([1, 0, 1, 1, 1, 0, 0, 1, 0, 1])
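
As mentioned, the same idea can be folded into one function that handles both positive and negative examples. The following is only a sketch of one way to do it: the name sample_by_label and the label and rng parameters are made up for illustration, and it assumes a NumPy version that provides np.random.default_rng (1.17 or later). With older versions, np.random.choice as used above should work the same way.

```python
import numpy as np

def sample_by_label(X, y, num, label=1, rng=None):
    """Sample `num` distinct rows of X whose entry in y equals `label`.

    Returns (rows, idx), where idx holds the row positions in the
    original X, so that X[idx] reproduces `rows`.
    Hypothetical helper, not part of NumPy.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Positions in y (and therefore rows in X) that carry the wanted label.
    candidates = np.flatnonzero(y == label)
    if num > candidates.size:
        raise ValueError("not enough rows with the requested label")
    # Sample the original indices directly, without replacement.
    idx = rng.choice(candidates, size=num, replace=False)
    return X[idx], idx

# Usage with a small made-up X and the y from above.
y = np.array([1, 0, 1, 1, 1, 0, 0, 1, 0, 1])
X = np.arange(40).reshape(10, 4)
neg_rows, neg_idx = sample_by_label(X, y, 2, label=0)
pos_rows, pos_idx = sample_by_label(X, y, 2, label=1)
```

The key point is that the sampling happens over the original indices rather than over a filtered copy such as X[y==0], so the returned idx always refers to rows of X itself.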