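I'm trying to understand why my algorithm's accuracy suddenly changed dramatically. I made one small change: when I noticed that I was using only 3 indices while standardizing my 4-dimensional train/test sets, I added an extra ':' to each index expression. Now I'm curious: don't the old and new versions below do the same thing? If not, how does indexing into a 4-dimensional array with only 3 indices work?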
Old
# standardize all non-binary variables
channels = 14  # int(X.shape[1])
mu_f = np.zeros(shape=channels)
sigma_f = np.zeros(shape=channels)
for i in range(channels):
    mu_f[i] = np.mean(X_train[:, i, :])
    sigma_f[i] = np.std(X_train[:, i, :])
for i in range(channels):
    X_train[:, i, :] -= mu_f[i]
    X_test[:, i, :] -= mu_f[i]
    if sigma_f[i] != 0:
        X_train[:, i, :] /= sigma_f[i]
        X_test[:, i, :] /= sigma_f[i]
New
# standardize all non-binary variables
channels = 14
mu_f = np.zeros(shape=channels)
sigma_f = np.zeros(shape=channels)
for i in range(channels):
    mu_f[i] = np.mean(X_train[:, i, :, :])
    sigma_f[i] = np.std(X_train[:, i, :, :])
for i in range(channels):
    X_train[:, i, :, :] -= mu_f[i]
    X_test[:, i, :, :] -= mu_f[i]
    if sigma_f[i] != 0:
        X_train[:, i, :, :] /= sigma_f[i]
        X_test[:, i, :, :] /= sigma_f[i]
Answer 0 (score: 2)
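I don't see why the extra ':' would make a difference. NumPy treats any trailing axes you leave out as full slices, so on a 4-dimensional array X[:, i, :] and X[:, i, :, :] select the same data. When I run timing tests on simple cases such as np.mean(X[:,1]) versus np.mean(X[:,1,:,:]), I see no difference either.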
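Here is a quick check of that claim, as a minimal sketch. The array shape (samples, channels, height, width) and the random data are my own assumptions, not the asker's data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 4-D data: (samples, channels, height, width)
X = rng.normal(size=(5, 14, 3, 4))

i = 1
a = X[:, i, :]     # three indices: the omitted trailing axis is taken in full
b = X[:, i, :, :]  # four indices, all spelled out
c = X[:, i]        # two indices: both trailing axes are taken in full

same_shape = a.shape == b.shape == c.shape
same_values = np.array_equal(a, b) and np.array_equal(a, c)
# Basic slicing returns a view, so in-place -= and /= on either form
# modify the same elements of X.
shares_memory = np.shares_memory(a, X) and np.shares_memory(b, X)

print(a.shape, same_shape, same_values, shares_memory)
```

If all three flags come out true on your own arrays as well, the change in accuracy most likely comes from something other than the extra index.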
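As for plonser's suggestion that you can vectorize the whole thing: the key is to realize that mean and std accept a few extra parameters. Passing a tuple as axis reduces over several axes at once, and keepdims=True keeps the reduced axes with length 1 so the result broadcasts against the original array. Check their documentation and experiment with sample arrays.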
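Xmean = np.mean(X, axis=(0, 2, 3), keepdims=True)
Xstd = np.std(X, axis=(0, 2, 3), keepdims=True)
X -= Xmean
X /= Xstd  # divide by the standard deviation, not the mean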
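A slightly fuller sketch of the same idea, closer to what the question's loops do: the statistics come from the training set only and are applied to both sets, and channels with zero standard deviation are left unscaled. The function name standardize_channels, the array shapes, and the random data are assumptions made for illustration.

```python
import numpy as np

def standardize_channels(X_train, X_test, axes=(0, 2, 3)):
    """Standardize each channel using training-set statistics only."""
    mu = X_train.mean(axis=axes, keepdims=True)    # shape (1, C, 1, 1)
    sigma = X_train.std(axis=axes, keepdims=True)  # shape (1, C, 1, 1)
    # Mirror the `if sigma_f[i] != 0` guard: constant channels are only centered.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (X_train - mu) / sigma, (X_test - mu) / sigma

rng = np.random.default_rng(0)
# Hypothetical data: (samples, channels, height, width)
X_train = rng.normal(loc=3.0, scale=2.0, size=(20, 14, 3, 4))
X_test = rng.normal(loc=3.0, scale=2.0, size=(8, 14, 3, 4))
X_train[:, 0] = 7.0  # a constant channel, to exercise the zero-std guard

Z_train, Z_test = standardize_channels(X_train, X_test)
print(Z_train.mean(axis=(0, 2, 3)))
print(Z_train.std(axis=(0, 2, 3)))
```

Unlike the in-place version, this returns new arrays and leaves the inputs untouched, which makes it easier to compare results before and after.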