Edit

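I wanted to know whether normalizing all of the data gives the same result as normalizing a subset of it. While simplifying my example as @BartoszKP suggested, I realized I had misunderstood how normalization works. Normalization works the same way in both cases, so this is a valid approach, right? (See the code.)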

from sklearn.preprocessing import normalize
# RandomizedPCA was removed in scikit-learn 0.20; PCA with svd_solver='randomized' replaces it
from sklearn.decomposition import PCA
import numpy as np

data_1 = np.array(([52, 254], [4, 128]), dtype='f')
data_2 = np.array(([39, 213], [123, 7]), dtype='f')
data_combined = np.vstack((data_1, data_2))
#print(data_combined)
"""
Output
[[  52.  254.]
 [   4.  128.]
 [  39.  213.]
 [ 123.    7.]]
"""
#Normalize all data
data_norm = normalize(data_combined)
print(data_norm)
"""
[[ 0.20056452  0.97968054]
 [ 0.03123475  0.99951208]
 [ 0.18010448  0.98364753]
 [ 0.99838448  0.05681863]]
"""

# n_components must not exceed min(n_samples, n_features), which is 2 here, so 20 is too large
pca = PCA(n_components=2, svd_solver='randomized', whiten=True)
pca.fit(data_norm)

#Normalize subset of data
data_1_norm = normalize(data_1)
print(data_1_norm)
"""
[[ 0.20056452  0.97968054]
 [ 0.03123475  0.99951208]]
"""
# Project the normalized subset with the PCA fitted on all normalized data
print(pca.transform(data_1_norm))
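A quick way to confirm the claim in the question is to compare the results directly. This is a minimal sketch, assuming a current scikit-learn in which PCA(svd_solver='randomized') stands in for the removed RandomizedPCA. It checks that normalizing the subset reproduces the corresponding rows of the fully normalized data, and that the fitted PCA therefore projects the subset to the same coordinates.

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.decomposition import PCA

data_1 = np.array([[52, 254], [4, 128]], dtype=np.float64)
data_2 = np.array([[39, 213], [123, 7]], dtype=np.float64)
data_combined = np.vstack((data_1, data_2))

# Row-wise L2 normalization of all data vs. of the first subset only
data_norm = normalize(data_combined)
data_1_norm = normalize(data_1)

# The first two rows of the full result should equal the normalized subset
rows_match = np.allclose(data_norm[:2], data_1_norm)
print(rows_match)

# Fit PCA once on all normalized data; transform then works row by row
pca = PCA(n_components=2, svd_solver='randomized', whiten=True, random_state=0)
pca.fit(data_norm)
proj_full = pca.transform(data_norm)
proj_subset = pca.transform(data_1_norm)
proj_match = np.allclose(proj_full[:2], proj_subset)
print(proj_match)
```

Because both steps act on each row separately once the PCA parameters are fixed, splitting the data before normalizing does not change the projected rows.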

Answer 1

Yes. As explained in the documentation, normalize scales each sample individually, independently of the other samples:

Normalization is the process of scaling individual samples to have unit norm.
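In concrete terms, the definition quoted above means dividing each row by its own Euclidean (L2) length. A minimal sketch comparing a hand-written NumPy version with sklearn.preprocessing.normalize (whose default norm is 'l2') on data_1 from the question:

```python
import numpy as np
from sklearn.preprocessing import normalize

data_1 = np.array([[52, 254], [4, 128]], dtype=np.float64)

# Each row's own L2 norm, kept as a column so division broadcasts row by row
row_norms = np.linalg.norm(data_1, axis=1, keepdims=True)
manual = data_1 / row_norms

# Should agree with scikit-learn's row-wise normalization
print(np.allclose(manual, normalize(data_1)))
# Every row now has (numerically) unit length
print(np.linalg.norm(manual, axis=1))
```

Only the values in a row enter that row's divisor, which is why the other samples have no influence.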

This is also explained in the documentation of the Normalizer class:

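Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled *independently of other samples* so that its norm (l1 or l2) equals one.

(emphasis mine)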
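To see the "independently of other samples" part for both norm options, here is a minimal sketch with the Normalizer class. The extra rows are hypothetical, made up for illustration: one ordinary sample and one all-zero sample, which has no non-zero component and so is not rescaled.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

data_1 = np.array([[52, 254], [4, 128]], dtype=np.float64)
# Hypothetical extra rows: an ordinary sample and an all-zero sample
extra = np.array([[39, 213], [0, 0]], dtype=np.float64)

results = {}
for norm in ("l1", "l2"):
    scaler = Normalizer(norm=norm)
    # Normalize data_1 on its own, then together with the extra rows
    alone = scaler.fit_transform(data_1)
    together = scaler.fit_transform(np.vstack((data_1, extra)))
    results[norm] = (alone, together)

    # Rows of data_1 are scaled the same whether or not other rows are present
    print(norm, np.allclose(alone, together[:2]))
    # The all-zero row is left as zeros
    print(norm, together[3])
```

With norm='l1' the absolute values in each non-zero row sum to one; with norm='l2' each non-zero row has unit Euclidean length.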

Normalizing PCA with scikit-learn when data is split
