所以我有一个numpy数组包含来自点云的k-neighboors(k = 10)点的XYZ坐标:
char buffer [50];
int a=2, b=3;
snprintf (buffer, 50, "%d + %d = %d", a, b, a+b);
有了这个形状:
k_neighboors
Out[53]:
array([[[ 2.51508147e-01, 5.60274944e-02, 1.98303187e+00],
[ 2.48552352e-01, 5.95569573e-02, 1.98319519e+00],
[ 2.56611764e-01, 5.36767729e-02, 1.98236740e+00],
...,
[ 2.54520357e-01, 6.23480231e-02, 1.98255634e+00],
[ 2.57603496e-01, 5.19787706e-02, 1.98221457e+00],
[ 2.43914440e-01, 5.68424985e-02, 1.98352253e+00]],
[[ 9.72352773e-02, 2.06699912e-02, 1.99344850e+00],
[ 9.91205871e-02, 2.36056261e-02, 1.99329960e+00],
[ 9.59625840e-02, 1.71508361e-02, 1.99356234e+00],
...,
[ 1.03216261e-01, 2.19752081e-02, 1.99304521e+00],
[ 9.65025574e-02, 1.44127617e-02, 1.99355054e+00],
[ 9.59930867e-02, 2.72080526e-02, 1.99344873e+00]],
[[ 1.76408485e-01, 2.81930678e-02, 1.98819435e+00],
[ 1.78670138e-01, 2.81904750e-02, 1.98804617e+00],
[ 1.80372953e-01, 3.05109434e-02, 1.98791444e+00],
...,
[ 1.81960404e-01, 2.47725621e-02, 1.98785996e+00],
[ 1.74499243e-01, 3.50728296e-02, 1.98826015e+00],
[ 1.83470801e-01, 2.70808022e-02, 1.98774099e+00]],
...,
[[ 1.78178743e-01, -4.60980982e-02, -1.98792374e+00],
[ 1.77953839e-01, -4.73701134e-02, -1.98792756e+00],
[ 1.77889392e-01, -4.75468598e-02, -1.98793030e+00],
...,
[ 1.79924294e-01, -5.08776568e-02, -1.98772371e+00],
[ 1.76720902e-01, -5.11409082e-02, -1.98791265e+00],
[ 1.83644593e-01, -4.64747548e-02, -1.98756230e+00]],
[[ 2.00245917e-01, -2.33091787e-03, -1.98685515e+00],
[ 2.02384919e-01, -5.60011715e-04, -1.98673022e+00],
[ 1.97325528e-01, -1.03301927e-03, -1.98705769e+00],
...,
[ 1.95464164e-01, -6.23105839e-03, -1.98713481e+00],
[ 1.98985338e-01, -8.39920342e-03, -1.98688531e+00],
[ 1.95959195e-01, 2.68006674e-03, -1.98713303e+00]],
[[ 1.28851235e-01, -3.24527062e-02, -1.99127460e+00],
[ 1.26415789e-01, -3.27731185e-02, -1.99143147e+00],
[ 1.25985757e-01, -3.24910432e-02, -1.99146211e+00],
...,
[ 1.28296465e-01, -3.92388329e-02, -1.99117136e+00],
[ 1.34895295e-01, -3.64872888e-02, -1.99083793e+00],
[ 1.29047096e-01, -3.97952795e-02, -1.99111152e+00]]])
我有这个函数,它将主成分分析应用于以二维数组提供的一些数据:
k_neighboors.shape
Out[54]: (2999986, 10, 3)
所以问题是:如何以一种不像以前那样的方式在每个2999986 10x3阵列上应用上面提到的PCA功能:
def PCA(data, correlation=False, sort=True):
""" Applies Principal Component Analysis to the data
Parameters
----------
data: array
The array containing the data. The array must have NxM dimensions, where each
of the N rows represents a different individual record and each of the M columns
represents a different variable recorded for that individual record.
array([
[V11, ... , V1m],
...,
[Vn1, ... , Vnm]])
correlation(Optional) : bool
Set the type of matrix to be computed (see Notes):
If True compute the correlation matrix.
If False(Default) compute the covariance matrix.
sort(Optional) : bool
Set the order that the eigenvalues/vectors will have
If True(Default) they will be sorted (from higher value to less).
If False they won't.
Returns
-------
eigenvalues: (1,M) array
The eigenvalues of the corresponding matrix.
eigenvector: (M,M) array
The eigenvectors of the corresponding matrix.
Notes
-----
The correlation matrix is a better choice whent there are different magnitudes
representing the M variables. Use covariance matrix in any other case.
"""
#: get the mean of all variables
mean = np.mean(data, axis=0, dtype=np.float64)
#: adjust the data by substracting the mean to each variable
data_adjust = data - mean
#: compute the covariance/correlation matrix
#: the data is transposed due to np.cov/corrcoef sintaxis
if correlation:
matrix = np.corrcoef(data_adjust.T)
else:
matrix = np.cov(data_adjust.T)
#: get the eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(matrix)
if sort:
#: sort eigenvalues and eigenvectors
sort = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[sort]
eigenvectors = eigenvectors[:,sort]
return eigenvalues, eigenvectors
非常感谢
答案 0 :(得分:0)
感谢@Divakar和@Eelco评论。
使用Divakar发布on this answer
的功能 def vectorized_app(data):
diffs = data - data.mean(1,keepdims=True)
return np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
使用Eelco指出的评论,我最终得到了这个。
k_neighboors.shape
Out[48]: (2999986, 10, 3)
#: THE (ASSUMED)VECTORIZED ANSWER
data = np.linalg.eig(vectorized_app(k_neighboors))[1][:,:,2]
data
Out[50]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
...,
[-0.0632175 , 0.01613551, 0.99786933],
[-0.06449399, 0.00552943, 0.99790278],
[-0.06081954, 0.01802078, 0.99798609]])
Wich给出与for循环相同的结果,而不是永远(虽然仍需要一段时间):
data2 = np.empty((2999986, 3))
for i in range(len(k_neighboors)):
if i > 10:
break #: I break the loop in order to don't have to wait for ever.
w, v = PCA(k_neighboors[i])
data2[i] = v[:,2]
data2
Out[52]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]])
我不知道是否有更好的方法可以做到这一点,所以我要打开这个问题。