Question

So I have a NumPy array containing the XYZ coordinates of the k-neighbors (k = 10) of the points in a point cloud. It looks like this:

k_neighboors
Out[53]: 
array([[[  2.51508147e-01,   5.60274944e-02,   1.98303187e+00],
        [  2.48552352e-01,   5.95569573e-02,   1.98319519e+00],
        [  2.56611764e-01,   5.36767729e-02,   1.98236740e+00],
        ..., 
        [  2.54520357e-01,   6.23480231e-02,   1.98255634e+00],
        [  2.57603496e-01,   5.19787706e-02,   1.98221457e+00],
        [  2.43914440e-01,   5.68424985e-02,   1.98352253e+00]],

       [[  9.72352773e-02,   2.06699912e-02,   1.99344850e+00],
        [  9.91205871e-02,   2.36056261e-02,   1.99329960e+00],
        [  9.59625840e-02,   1.71508361e-02,   1.99356234e+00],
        ..., 
        [  1.03216261e-01,   2.19752081e-02,   1.99304521e+00],
        [  9.65025574e-02,   1.44127617e-02,   1.99355054e+00],
        [  9.59930867e-02,   2.72080526e-02,   1.99344873e+00]],

       [[  1.76408485e-01,   2.81930678e-02,   1.98819435e+00],
        [  1.78670138e-01,   2.81904750e-02,   1.98804617e+00],
        [  1.80372953e-01,   3.05109434e-02,   1.98791444e+00],
        ..., 
        [  1.81960404e-01,   2.47725621e-02,   1.98785996e+00],
        [  1.74499243e-01,   3.50728296e-02,   1.98826015e+00],
        [  1.83470801e-01,   2.70808022e-02,   1.98774099e+00]],

       ..., 
       [[  1.78178743e-01,  -4.60980982e-02,  -1.98792374e+00],
        [  1.77953839e-01,  -4.73701134e-02,  -1.98792756e+00],
        [  1.77889392e-01,  -4.75468598e-02,  -1.98793030e+00],
        ..., 
        [  1.79924294e-01,  -5.08776568e-02,  -1.98772371e+00],
        [  1.76720902e-01,  -5.11409082e-02,  -1.98791265e+00],
        [  1.83644593e-01,  -4.64747548e-02,  -1.98756230e+00]],

       [[  2.00245917e-01,  -2.33091787e-03,  -1.98685515e+00],
        [  2.02384919e-01,  -5.60011715e-04,  -1.98673022e+00],
        [  1.97325528e-01,  -1.03301927e-03,  -1.98705769e+00],
        ..., 
        [  1.95464164e-01,  -6.23105839e-03,  -1.98713481e+00],
        [  1.98985338e-01,  -8.39920342e-03,  -1.98688531e+00],
        [  1.95959195e-01,   2.68006674e-03,  -1.98713303e+00]],

       [[  1.28851235e-01,  -3.24527062e-02,  -1.99127460e+00],
        [  1.26415789e-01,  -3.27731185e-02,  -1.99143147e+00],
        [  1.25985757e-01,  -3.24910432e-02,  -1.99146211e+00],
        ..., 
        [  1.28296465e-01,  -3.92388329e-02,  -1.99117136e+00],
        [  1.34895295e-01,  -3.64872888e-02,  -1.99083793e+00],
        [  1.29047096e-01,  -3.97952795e-02,  -1.99111152e+00]]])

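Its shape is:

k_neighboors.shape
Out[54]: (2999986, 10, 3)

The question is: how can I apply the PCA function below to each of the 2999986 10x3 arrays in a way that does not take forever, as a plain for loop over the arrays does? The function applies principal component analysis to data provided as a 2-D array: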

import numpy as np


def PCA(data, correlation=False, sort=True):
    """ Applies Principal Component Analysis to the data

    Parameters
    ----------        
    data: array
        The array containing the data. The array must have NxM dimensions, where each
        of the N rows represents a different individual record and each of the M columns
        represents a different variable recorded for that individual record.
            array([
            [V11, ... , V1m],
            ...,
            [Vn1, ... , Vnm]])

    correlation(Optional) : bool
            Set the type of matrix to be computed (see Notes):
                If True compute the correlation matrix.
                If False(Default) compute the covariance matrix. 

    sort(Optional) : bool
            Set the order that the eigenvalues/vectors will have
                If True(Default) they will be sorted (from highest value to lowest).
                If False they won't.

    Returns
    -------
    eigenvalues: (M,) array
        The eigenvalues of the corresponding matrix.

    eigenvector: (M,M) array
        The eigenvectors of the corresponding matrix.

    Notes
    -----
    The correlation matrix is a better choice when the M variables have different
    magnitudes. Use the covariance matrix in any other case.

    """

    #: get the mean of all variables
    mean = np.mean(data, axis=0, dtype=np.float64)

    #: adjust the data by subtracting the mean from each variable
    data_adjust = data - mean

    #: compute the covariance/correlation matrix
    #: the data is transposed due to the np.cov/corrcoef syntax
    if correlation:
        matrix = np.corrcoef(data_adjust.T)
    else:
        matrix = np.cov(data_adjust.T) 

    #: get the eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eig(matrix)

    if sort:
        #: sort eigenvalues and eigenvectors
        order = eigenvalues.argsort()[::-1]
        eigenvalues = eigenvalues[order]
        eigenvectors = eigenvectors[:, order]

    return eigenvalues, eigenvectors

Thanks a lot.

Answer 1

Thanks to @Divakar and @Eelco for their comments.

Using the function that Divakar posted in this answer:

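def vectorized_app(data):
    # center each (k, 3) block on its own mean
    diffs = data - data.mean(1, keepdims=True)
    # batched covariance: sum of outer products over the k points, divided by k
    return np.einsum('ijk,ijl->ikl', diffs, diffs) / data.shape[1]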
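As a quick sanity check, the einsum expression can be compared against np.cov on a small array. This is only a sketch on made-up random data (the names rng, batch, batched_cov and looped_cov are mine, not from the question). Note that vectorized_app divides by the number of points k, while np.cov divides by k - 1 by default, so the comparison passes bias=True. That constant factor scales the eigenvalues but leaves the eigenvectors unchanged.

```python
import numpy as np


def vectorized_app(data):
    # center each (k, 3) block on its own mean
    diffs = data - data.mean(1, keepdims=True)
    # batched covariance: one 3x3 matrix per block, normalized by k
    return np.einsum('ijk,ijl->ikl', diffs, diffs) / data.shape[1]


# small random stand-in for k_neighboors: 5 blocks of 10 points in 3-D
rng = np.random.default_rng(0)
batch = rng.normal(size=(5, 10, 3))

batched_cov = vectorized_app(batch)

# reference: one np.cov call per block, normalized by k (bias=True)
looped_cov = np.stack([np.cov(block.T, bias=True) for block in batch])

print(np.allclose(batched_cov, looped_cov))
```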

Following what Eelco pointed out in the comments, I ended up with this:

k_neighboors.shape
Out[48]: (2999986, 10, 3)

#: THE (ASSUMED)VECTORIZED ANSWER
data = np.linalg.eig(vectorized_app(k_neighboors))[1][:,:,2]

data
Out[50]: 
array([[ 0.10530792,  0.01028906,  0.99438643],
       [ 0.06462   ,  0.00944352,  0.99786526],
       [ 0.0654035 ,  0.00860751,  0.99782177],
       ..., 
       [-0.0632175 ,  0.01613551,  0.99786933],
       [-0.06449399,  0.00552943,  0.99790278],
       [-0.06081954,  0.01802078,  0.99798609]])

This gives the same results as the for loop, without taking forever (although it still takes a while):

data2 = np.empty((2999986, 3))

for i in range(len(k_neighboors)):
    if i > 10:
        break  #: stop early so I don't have to wait forever
    w, v = PCA(k_neighboors[i])
    data2[i] = v[:,2]


data2
Out[52]: 
array([[ 0.10530792,  0.01028906,  0.99438643],
       [ 0.06462   ,  0.00944352,  0.99786526],
       [ 0.0654035 ,  0.00860751,  0.99782177],
       ..., 
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ]])
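One caveat about the vectorized line: np.linalg.eig does not guarantee any ordering of the eigenvalues, whereas PCA() sorts them, so taking column 2 of the unsorted result is not guaranteed to select the same eigenvector in every block. A possible variant, sketched here under the assumption that what is wanted is the eigenvector of the smallest eigenvalue, uses np.linalg.eigh, which accepts stacked symmetric matrices and returns eigenvalues in ascending order. The helper name batched_smallest_eigvec and the random test data are hypothetical.

```python
import numpy as np


def batched_smallest_eigvec(neighbors):
    """Hypothetical helper: for each (k, 3) block, return the eigenvector
    of the covariance matrix that belongs to the smallest eigenvalue."""
    diffs = neighbors - neighbors.mean(axis=1, keepdims=True)
    cov = np.einsum('ijk,ijl->ikl', diffs, diffs) / neighbors.shape[1]
    # eigh works on stacks of symmetric matrices; eigenvalues come back ascending
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    return eigenvectors[:, :, 0]


# made-up data: 100 nearly flat patches (small spread along z)
rng = np.random.default_rng(0)
sample = rng.normal(size=(100, 10, 3)) * np.array([1.0, 1.0, 0.01])

normals = batched_smallest_eigvec(sample)

# reference: explicit loop, one block at a time
reference = np.empty_like(normals)
for i, block in enumerate(sample):
    w, v = np.linalg.eigh(np.cov(block.T))
    reference[i] = v[:, 0]

# eigenvectors are only defined up to sign, so compare |dot product| with 1
same_direction = np.abs(np.einsum('ij,ij->i', normals, reference))
print(np.allclose(same_direction, 1.0))
```

Since the sign of each eigenvector is arbitrary, a consistent orientation (for example, flipping vectors toward a viewpoint) would have to be applied separately if it matters.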

I don't know if there is a better way to do this, so I am going to keep the question open.

How to vectorize a 'for' loop which calls a function (that takes a 2-dimensional array as argument) over a 3-dimensional numpy array
