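I am trying to run PCA on a dataset representing the 3 bands of an image. The dataset has shape (300000, 3): pixels by 3 bands. I find the eigenvalues and eigenvectors and put them into a list of tuples called eig_pairs. Then I compute the explained variance to decide how many bands to use for the PCA.
I determined that I want to use 2 bands.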
My eig_pairs is a list of 3 tuples.
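Following this tutorial, I understand that I need to reshape everything by reducing from the original dimensional space (3) to the number of dimensions I want to use (2). Their example goes from 7 to 4, as follows: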
matrix_w = np.hstack((eig_pairs[0][1].reshape(7,1), eig_pairs[1][1].reshape(7,1), eig_pairs[2][1].reshape(7,1), eig_pairs[3][1].reshape(7,1)))
Following that logic, I changed mine to:
matrix_w = np.hstack((eig_pairs[0][1].reshape(3,1), eig_pairs[1][1].reshape(3,1)))
But I get the error ValueError: shapes (3131892,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)
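The mismatch reported in that message can be reproduced on a small array. The sketch below is only an illustration under assumptions: the post does not show the line that raises the error, so it assumes the projection is computed as `X_std.dot(...)`, and the random arrays `w_3x2` and `w_2x3` are hypothetical stand-ins for the eigenvector matrix in its two possible orientations.

```python
import numpy as np

rng = np.random.default_rng(0)
X_std = rng.standard_normal((10, 3))   # stand-in for the (n_pixels, 3) data
w_3x2 = rng.standard_normal((3, 2))    # hypothetical W with eigenvectors as columns
w_2x3 = w_3x2.T                        # the same values with eigenvectors as rows

# (10, 3) dot (3, 2): inner dimensions agree, result has 2 columns
projected = X_std.dot(w_3x2)
print(projected.shape)

# (10, 3) dot (2, 3): inner dimensions are 3 and 2, so NumPy raises ValueError
try:
    X_std.dot(w_2x3)
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)
print(mismatch_message)
```

If this assumption holds, the (2,3) operand in the traceback suggests that the matrix reaching the dot product has the eigenvectors as rows rather than as columns.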
import cv2
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

#read in image
img = cv2.imread('/Volumes/EXTERNAL/Stitched-Photos-for-Chris/p7_0015_20161005-949am-75m-pass-1.jpg.png',1)
row,col = img.shape[:2]
b,g,r = cv2.split(img)
# Pandas dataset
# samples = 3000000, features = 3
dataSet = pd.DataFrame({'bBnad':b.flat[:],'gBnad':g.flat[:],'rBnad':r.flat[:]})
print(dataSet.head())
# Standardize the data
X = dataSet.values
X_std = StandardScaler().fit_transform(X) #converts data from uint8 to float64
#Calculating Eigenvectors and eigenvalues of Covariance matrix
meanVec = np.mean(X_std, axis=0)
covarianceMatx = np.cov(X_std.T)
eigVals, eigVecs = np.linalg.eig(covarianceMatx)
# Create a list of (eigenvalue, eigenvector) tuples
eig_pairs = [ (np.abs(eigVals[i]),eigVecs[:,i]) for i in range(len(eigVals))]
# Sort from high to low
eig_pairs.sort(key = lambda x: x[0], reverse= True)
# Determine how many PC going to choose for new feature subspace via
# the explained variance measure which is calculated from eigen vals
# The explained variance tells us how much information (variance) can
# be attributed to each of the principal components
tot = sum(eigVals)
var_exp = [(i / tot)*100 for i in sorted(eigVals, reverse=True)]
cum_var_exp = np.cumsum(var_exp)
#convert 3 dimensional space to 2 dimensional space; hstack of two (3,1) columns gives a 3x2 matrix W
matrix_w = np.hstack((eig_pairs[0][1].reshape(3,1),
eig_pairs[1][1].reshape(3,1)))
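For comparison, the whole reduction from 3 bands to 2 components can be sketched on synthetic data. This is a minimal sketch and not the code from the post: the random matrix `X` is a made-up stand-in for the image bands, and `np.linalg.eigh` is swapped in for `np.linalg.eig` because a covariance matrix is symmetric. The point it illustrates is that a W built from two length-3 eigenvectors stacked as columns has shape (3, 2), which lines up with data of shape (n, 3).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the pixel-by-band matrix (assumption, not real image data)
mixing = np.array([[2.0, 0.5, 0.1],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.standard_normal((1000, 3)) @ mixing

# Standardize each band to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the 3x3 covariance matrix; eigh returns ascending order
cov = np.cov(X_std.T)
eig_vals, eig_vecs = np.linalg.eigh(cov)

# Reorder so the largest eigenvalue comes first
order = np.argsort(eig_vals)[::-1]
eig_vals = eig_vals[order]
eig_vecs = eig_vecs[:, order]

# Percentage of variance explained by each component
var_exp = eig_vals / eig_vals.sum() * 100

# Keep the top 2 eigenvectors as columns: W has shape (3, 2)
matrix_w = eig_vecs[:, :2]

# Project: (1000, 3) dot (3, 2) -> (1000, 2)
Y = X_std.dot(matrix_w)
print(matrix_w.shape, Y.shape)
```

Slicing the eigenvector columns directly avoids the per-vector `reshape` calls, though the `np.hstack` form produces a matrix of the same shape.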
Any help is appreciated.