I have implemented the PCA logic, but now I am stuck on a boolean index dimension error. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1) # random seed for consistency
mu_vec1 = np.array([0,0,0])
cov_mat1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class1_sample = np.random.multivariate_normal(mu_vec1, cov_mat1, 20)
assert class1_sample.shape == (20,3), "The matrix does not have the dimensions 20x3"
mu_vec2 = np.array([1,1,1])
cov_mat2 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class2_sample = np.random.multivariate_normal(mu_vec2, cov_mat2, 20)
assert class2_sample.shape == (20, 3), "The matrix does not have the dimensions 20x3"
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d import proj3d
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, projection='3d')
plt.rcParams['legend.fontsize'] = 10
ax.plot(class1_sample[:,0], class1_sample[:,1], class1_sample[:,2],
'o', markersize=8, color='blue', alpha=0.5, label='class1')
ax.plot(class2_sample[:,0], class2_sample[:,1], class2_sample[:,2],
'^', markersize=8, alpha=0.5, color='red', label='class2')
plt.title('Samples for class 1 and class 2')
ax.legend(loc='upper right')
all_samples = np.concatenate((class1_sample, class2_sample), axis=0)
assert all_samples.shape == (40,3), "The matrix does not have the dimensions 40x3"
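At this point `all_samples` has 40 rows: the 20 `class1_sample` rows followed by the 20 `class2_sample` rows. Any boolean mask such as `y==0` used later must therefore also have 40 entries in that same order. The question never defines `y` for this data, so here is a minimal sketch, assuming 0 means class 1 and 1 means class 2, of building such a label vector right after the concatenation:

```python
import numpy as np

# Rebuild the two classes as in the question (20 samples each, 3 features).
np.random.seed(1)
class1_sample = np.random.multivariate_normal(np.array([0, 0, 0]), np.eye(3), 20)
class2_sample = np.random.multivariate_normal(np.array([1, 1, 1]), np.eye(3), 20)
all_samples = np.concatenate((class1_sample, class2_sample), axis=0)

# One label per row of all_samples, in the same order as the concatenation:
# 0 for the class1 rows, 1 for the class2 rows (assumed encoding).
y = np.concatenate((np.zeros(len(class1_sample), dtype=int),
                    np.ones(len(class2_sample), dtype=int)))

# 40 labels for 40 rows, so y == 0 is a valid boolean mask along axis 0.
assert y.shape[0] == all_samples.shape[0]
print(all_samples[y == 0].shape)  # (20, 3)
```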
# step 1, normalize the features...
def feature_normalize(all_samples):
    m, n = all_samples.shape
    mu = np.mean(all_samples, axis=0)
    X_norm = all_samples - mu
    sigma = np.std(X_norm, axis=0, ddof=1)
    X_norm = X_norm / sigma
    return (X_norm, mu, sigma)
X_norm, mu, sigma = feature_normalize(all_samples)
print("mu: ",mu)
print("sigma: ",sigma)
print("X_norm: ",X_norm[:5, :])
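A quick sanity check of step 1: after `feature_normalize`, every column should have mean 0 and sample standard deviation (`ddof=1`) 1. The sketch below runs the same function on random data with an arbitrary mean and scale (my choice, not the question's data):

```python
import numpy as np

def feature_normalize(all_samples):
    # Centre each feature, then divide by its sample standard deviation.
    mu = np.mean(all_samples, axis=0)
    X_norm = all_samples - mu
    sigma = np.std(X_norm, axis=0, ddof=1)
    return X_norm / sigma, mu, sigma

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(40, 3))  # arbitrary test data
X_norm, mu, sigma = feature_normalize(X)

mean_ok = np.allclose(X_norm.mean(axis=0), 0)
std_ok = np.allclose(X_norm.std(axis=0, ddof=1), 1)
print(mean_ok, std_ok)  # True True
```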
# step 2, do the PCA
import scipy.linalg as linalg
def pca(X):
    # X: already normalized data, m samples x n features
    m, n = X.shape
    # covariance matrix (1/m) * X^T X; valid because X is mean-centred
    Sigma = (1. / m) * np.dot(X.T, X)
    # columns of U are the principal directions, S holds the variances along them
    U, S, V = linalg.svd(Sigma)
    S = linalg.diagsvd(S, len(S), len(S))
    return U, S
U, S = pca(X_norm)
print(U)
print(S)
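Step 2 only returns the principal directions `U`; the actual reduction to 2 dimensions is the projection `X_norm @ U[:, :2]`. The sketch below uses `np.linalg.svd` in place of `scipy.linalg.svd` (same decomposition) on random centred data. It cross-checks the result against an SVD of the data matrix itself, whose right singular vectors are the same directions up to sign:

```python
import numpy as np

def pca(X):
    """Eigen-decomposition of the covariance (1/m) X^T X; X must be centred."""
    m = X.shape[0]
    cov = (X.T @ X) / m
    U, s, _ = np.linalg.svd(cov)  # s: variances along each direction, descending
    return U, s

def project(X, U, k):
    """Project the rows of X onto the first k principal directions."""
    return X @ U[:, :k]

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
X = X - X.mean(axis=0)           # centre the data, as feature_normalize does

U, s = pca(X)
Z = project(X, U, 2)
print(Z.shape)  # (40, 2)

# Cross-check: SVD of X directly gives the same directions (up to sign)
# and singular values whose squares divided by m are the variances s.
_, sv, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(np.abs(U[:, :2]), np.abs(Vt[:2].T)))  # True
```

The variance of each projected coordinate, `Z.var(axis=0)`, equals the corresponding entry of `s`, which is why keeping the first columns of `U` keeps the most variance.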
# step 3, reduce to 2 dimensions with scikit-learn's PCA and project the samples onto them
from sklearn import decomposition
sk_pca = decomposition.PCA(n_components=2, svd_solver='full')  # renamed so it does not overwrite the pca() function above
sk_pca.fit(all_samples)
all_samples = sk_pca.transform(all_samples)
plt.figure()
plt.scatter(np.diff(all_samples[y==0, 0], all_samples[y==0, 1]), c='b', label='class 1')
plt.scatter(all_samples[y==1, 0], all_samples[y==1, 1], c='g', label='class 2')
plt.xlabel('z1')
plt.ylabel('z2')
plt.legend()
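The plotting step has two problems. `y` has to be a 40-entry label vector for the rows of `all_samples`. Also, `np.diff(a, b)` passes the second coordinate array as `np.diff`'s `n` argument (the number of times to difference), whereas `plt.scatter` just needs x and y as two separate arrays. The following is a minimal corrected sketch, assuming labels 0/1 for the two classes. To keep it dependency-light it projects with NumPy's SVD on the centred data instead of `decomposition.PCA`; scikit-learn's `fit_transform` should give the same coordinates up to the sign of each axis. `Z` is my name for the projected data:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)
class1_sample = np.random.multivariate_normal(np.zeros(3), np.eye(3), 20)
class2_sample = np.random.multivariate_normal(np.ones(3), np.eye(3), 20)
all_samples = np.concatenate((class1_sample, class2_sample), axis=0)

# One label per row of all_samples: 20 zeros, then 20 ones (assumed encoding).
y = np.repeat([0, 1], [len(class1_sample), len(class2_sample)])

# Centre and project onto the top-2 principal directions.
X_centred = all_samples - all_samples.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centred, full_matrices=False)
Z = X_centred @ Vt[:2].T  # shape (40, 2), row order unchanged, so y still matches

fig, ax = plt.subplots()
# x and y coordinates are passed as two separate arguments; no np.diff.
ax.scatter(Z[y == 0, 0], Z[y == 0, 1], c='b', label='class 1')
ax.scatter(Z[y == 1, 0], Z[y == 1, 1], c='g', label='class 2')
ax.set_xlabel('z1')
ax.set_ylabel('z2')
ax.legend()
```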
For some reason my boolean index does not match. I am trying to reduce the size of the array, and as a result I get the following error:
Error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-73-365d8b4d1c3f> in <module>()
7
8 plt.figure()
----> 9 plt.scatter(np.diff(all_samples[y==0, 0], all_samples[y==0, 1]), c='b', label='class 1')
10 plt.scatter(all_samples[y==1, 0], all_samples[y==1, 1], c='g', label='class 2')
11
IndexError: boolean index did not match indexed array along dimension 0; dimension is 40 but the corresponding boolean dimension is 100.
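The message says the mask `y==0` has 100 entries while `all_samples` has 40 rows. The question does not show where `y` is defined; presumably it comes from an earlier notebook cell for a different 100-sample dataset. A minimal reproduction with hypothetical arrays (`data`, `y_wrong`, `y_right` are illustrative names, not from the question) shows that the mask length must equal the length of the indexed axis:

```python
import numpy as np

data = np.zeros((40, 2))            # stands in for all_samples after transform: 40 rows
y_wrong = np.zeros(100, dtype=int)  # hypothetical 100-entry label vector from another dataset
y_right = np.repeat([0, 1], 20)     # 40 labels, one per row of data

raised = False
try:
    data[y_wrong == 0, 0]           # mask has 100 entries, axis 0 has 40
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

print(data[y_right == 0, 0].shape)  # (20,)
```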
Can anyone tell me what is missing from my code?