PCA analysis: IndexError: boolean index did not match indexed array along dimension 0; dimension is 40 but corresponding boolean dimension is 100

Asked: 2019-12-14 01:58:16

Tags: python python-3.x scikit-learn pca numpy-ndarray

I have implemented the logic for PCA, but I am now stuck on an index error involving a boolean dimension. Here is my code:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1) # random seed for consistency

mu_vec1 = np.array([0,0,0])
cov_mat1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class1_sample = np.random.multivariate_normal(mu_vec1, cov_mat1, 20)
assert class1_sample.shape == (20,3), "The matrix has not the dimensions 20x3"

mu_vec2 = np.array([1,1,1])
cov_mat2 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class2_sample = np.random.multivariate_normal(mu_vec2, cov_mat2, 20)
assert class2_sample.shape == (20, 3), "The matrix has not the dimensions 20x3"


from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d import proj3d

fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, projection='3d')
plt.rcParams['legend.fontsize'] = 10
ax.plot(class1_sample[:,0], class1_sample[:,1], class1_sample[:,2],
    'o', markersize=8, color='blue', alpha=0.5, label='class1')
ax.plot(class2_sample[:,0], class2_sample[:,1], class2_sample[:,2],
    '^', markersize=8, alpha=0.5, color='red', label='class2')

plt.title('Samples for class 1 and class 2')
ax.legend(loc='upper right')


all_samples = np.concatenate((class1_sample, class2_sample), axis=0)
assert all_samples.shape == (40,3), "The matrix has not the dimensions 40x3"


# step 1, normalize the features...
def feature_normalize(all_samples):

    m, n = all_samples.shape
    mu = np.mean(all_samples, axis=0)
    X_norm = all_samples - mu
    sigma = np.std(X_norm, axis=0, ddof=1)
    X_norm = X_norm / sigma

    return (X_norm, mu, sigma)


X_norm, mu, sigma = feature_normalize(all_samples)
print("mu: ",mu)
print("sigma: ",sigma)
print("X_norm: ",X_norm[:5, :])


# step 2, do the PCA
import scipy.linalg as linalg
def pca(all_samples):

    m, n = all_samples.shape

    Sigma = np.empty( (n, n) )
    U = S = np.zeros( (n, n) )

    sigma = (1. / m) * np.dot(all_samples.T, all_samples)
    U, S, V = linalg.svd(sigma)
    S = linalg.diagsvd(S, len(S), len(S))
    return U, S


U, S = pca(X_norm)
print(U)
print(S)



# step 3, reduce to 2 dimensions and project back onto the 2 dimensions
from sklearn import decomposition

pca = decomposition.PCA(n_components=2, svd_solver='full')
pca.fit(all_samples)
all_samples = pca.transform(all_samples)

plt.figure()
plt.scatter(np.diff(all_samples[y==0, 0], all_samples[y==0, 1]), c='b', label='class 1')
plt.scatter(all_samples[y==1, 0], all_samples[y==1, 1], c='g', label='class 2')

plt.xlabel('z1')
plt.ylabel('z2')
plt.legend()

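For some reason my boolean index does not match the array. I am trying to reduce the dimensionality of the data, and when plotting the result I get the following error:

Error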

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-73-365d8b4d1c3f> in <module>()
      7 
      8 plt.figure()
----> 9 plt.scatter(np.diff(all_samples[y==0, 0], all_samples[y==0, 1]), c='b', label='class 1')
     10 plt.scatter(all_samples[y==1, 0], all_samples[y==1, 1], c='g', label='class 2')
     11 

IndexError: boolean index did not match indexed array along dimension 0; dimension is 40 but the corresponding boolean dimension is 100.
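
The message says that a boolean mask with 100 entries was applied to an array that has only 40 rows. The code above never defines `y`, so one possible explanation is that `y` is a label array left over from an earlier notebook cell. The following is a minimal sketch that reproduces the same kind of error under that assumption; `data`, the stale `y`, and the helper `mask_error` are hypothetical names introduced only for illustration.

```python
import numpy as np

# Stand-in for the 40 projected samples (40 rows, 2 components).
data = np.zeros((40, 2))

# Hypothetical stale label vector with 100 entries, as suggested by the traceback.
y = np.zeros(100, dtype=int)


def mask_error(arr, labels):
    """Return the IndexError message raised by a mismatched boolean mask, or None."""
    try:
        arr[labels == 0, 0]  # mask length 100 vs. 40 rows
    except IndexError as exc:
        return str(exc)
    return None


message = mask_error(data, y)
print(message)
```

Comparing `y.shape[0]` with `all_samples.shape[0]` before indexing is a quick way to confirm whether this is what is happening.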

Can anyone tell me what is missing from my code?
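
One thing that appears to be missing is a label vector whose length matches the 40 samples. The sketch below assumes the first 20 rows belong to class 1 and the last 20 to class 2, which follows from the order of the `np.concatenate` call. To stay dependency-light it projects with a plain `numpy.linalg.svd` rather than `sklearn.decomposition.PCA`, and it uses `np.random.default_rng` instead of `np.random.seed`, so the numbers will differ from the original run. The names `projected`, `class1_2d`, and `class2_2d` are illustrative, not from the original code.

```python
import numpy as np

rng = np.random.default_rng(1)
class1_sample = rng.multivariate_normal(np.zeros(3), np.eye(3), 20)
class2_sample = rng.multivariate_normal(np.ones(3), np.eye(3), 20)
all_samples = np.concatenate((class1_sample, class2_sample), axis=0)

# Assumed labels: 0 for the first 20 rows (class 1), 1 for the last 20 (class 2).
y = np.concatenate((np.zeros(20, dtype=int), np.ones(20, dtype=int)))

# Project onto the first two principal components via SVD of the centered data.
centered = all_samples - all_samples.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ Vt[:2].T  # shape (40, 2)

# The mask now has 40 entries, one per row, so boolean indexing works.
class1_2d = projected[y == 0]
class2_2d = projected[y == 1]
print(class1_2d.shape, class2_2d.shape)
```

The same masks should work on the output of `pca.transform(all_samples)`. For plotting, `plt.scatter(class1_2d[:, 0], class1_2d[:, 1], c='b', label='class 1')` takes the x and y columns directly; `np.diff` is not needed there, and its second positional argument is the number of differences `n`, not a second array.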

0 answers:

No answers yet