Leave-one-out estimates in Python

Time: 2015-01-20 22:19:41

Tags: python numpy scikit-learn combinatorics

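I want to build a matrix from a vector x = (x_1, x_2, ..., x_I), where row i of the matrix is x(i) := (x_1, ..., x_{i-1}, x_{i+1}, ..., x_I), that is, x with its i-th element left out.

I know that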

from sklearn.cross_validation import LeaveOneOut
I = 30
myrowiterator = LeaveOneOut(I)
for eachrow, _ in myrowiterator:
    print(eachrow)    # prints [1,2,...,29]
                      #        [0,2,...,29] and so on ...

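gives me a routine for getting each row of that matrix. But I would rather obtain the whole matrix in a single step and operate on it directly, instead of looping over its rows. That should save some computation time.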

2 answers:

Answer 0 (score: 3)

Since you have the numpy tag, the following works:

>>> import numpy as np
>>> N = 5
>>> idx = np.arange(N)
>>> idx = idx[1:] - (idx[:, None] >= idx[1:])
>>> idx
array([[1, 2, 3, 4],
       [0, 2, 3, 4],
       [0, 1, 3, 4],
       [0, 1, 2, 4],
       [0, 1, 2, 3]])

You can now use it to index any other array:

>>> a = np.array(['a', 'b', 'c', 'd', 'e'])
>>> a[idx]
array([['b', 'c', 'd', 'e'],
       ['a', 'c', 'd', 'e'],
       ['a', 'b', 'd', 'e'],
       ['a', 'b', 'c', 'e'],
       ['a', 'b', 'c', 'd']],
      dtype='|S1')
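
To tie this back to the question, the index matrix can be wrapped in a small helper and applied to a numeric vector. The sketch below is an illustration only and not part of the original answer: the function name `leave_one_out_matrix` and the sample data are made up, and the leave-one-out mean is just one example of something you might compute on the result.

```python
import numpy as np

def leave_one_out_matrix(x):
    """Return an (n, n-1) array whose row i is x with element i removed.

    Hypothetical helper for illustration; it reuses the broadcasting
    trick shown in the answer above.
    """
    x = np.asarray(x)
    n = x.shape[0]
    cols = np.arange(1, n)          # candidate indices 1..n-1
    rows = np.arange(n)[:, None]    # row numbers as a column vector
    # In row i, every entry with cols <= i is shifted down by one, so the
    # row reads 0..i-1, i+1..n-1 and index i is skipped.
    idx = cols - (rows >= cols)
    return x[idx]

x = np.array([2.0, 4.0, 6.0, 8.0])
loo = leave_one_out_matrix(x)       # shape (4, 3)
loo_means = loo.mean(axis=1)        # mean of x without element i, for each i
```

The whole matrix is built without a Python-level loop, at the cost of holding n * (n - 1) elements in memory.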

EDIT: As @user3820991 hinted, this can be made a little less cryptic by writing it as:

>>> N = 5
>>> idx = np.arange(1, N) - np.tri(N, N-1, k=-1, dtype=bool)
>>> idx
array([[1, 2, 3, 4],
       [0, 2, 3, 4],
       [0, 1, 3, 4],
       [0, 1, 2, 4],
       [0, 1, 2, 3]])

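The function np.tri is in effect a highly optimized version of the magic comparison in the first version of this answer: it uses the smallest int type that can hold the array's size, and because comparisons in numpy are vectorized with SIMD, the smaller the type, the faster the operation.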
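
As a quick sanity check, the two constructions can be compared directly. This is a small sketch, not from the original answer; the variable names `via_compare`, `via_tri` and `same` are chosen for the example.

```python
import numpy as np

N = 5

# First version: explicit broadcasting comparison of row numbers
# against the candidate indices 1..N-1.
via_compare = np.arange(1, N) - (np.arange(N)[:, None] >= np.arange(1, N))

# Second version: np.tri builds the same strictly-lower-triangular mask.
via_tri = np.arange(1, N) - np.tri(N, N - 1, k=-1, dtype=bool)

same = np.array_equal(via_compare, via_tri)
```

If `same` is true, both expressions produce the identical index matrix, so the choice between them comes down to readability and speed.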

Answer 1 (score: 1)

The following will do it:

In [31]: np.array([row for row, _ in LeaveOneOut(I)])
Out[31]: 
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [ 0,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [ 0,  1,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [ 0,  1,  2,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       ...
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]])
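
For readers without scikit-learn at hand (in recent releases `sklearn.cross_validation` has been replaced by `sklearn.model_selection`), roughly the same matrix can be built with numpy alone. This is a swapped-in technique using `np.delete`, not what the answer above does, and it still loops in Python, so it does not address the speed concern in the question.

```python
import numpy as np

I = 30
indices = np.arange(I)

# Row i is 0..I-1 with index i removed; np.delete returns a new array
# each time, so this runs one Python-level iteration per row.
rows = np.array([np.delete(indices, i) for i in range(I)])
```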