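Why is the normalized range not between 0 and 1?

Time: 2019-08-15 05:50:14

Tags: python machine-learning normalization

Here is my code. I am trying to apply normalization to a dataset, but I can see that the output is not scaled between 0 and 1, even though the same code works for the iris dataset. Doesn't normalization always return values scaled between 0 and 1?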

# Normalize the data attributes for the Boston dataset.
from sklearn.datasets import load_boston
from sklearn import preprocessing

# load the Boston dataset
dataset = load_boston()
print(dataset.data.shape)
# separate the data from the target attributes
X = dataset.data
y = dataset.target
# normalize the data attributes
normalized_X = preprocessing.normalize(X)

normalized_X[:5]

Output:

array([[1.26388341e-05, 3.59966795e-02, 4.61957387e-03, 0.00000000e+00,
        1.07590075e-03, 1.31487871e-02, 1.30387972e-01, 8.17924550e-03,
        1.99981553e-03, 5.91945396e-01, 3.05971776e-02, 7.93726783e-01,
        9.95908132e-03],
       [5.78529889e-05, 0.00000000e+00, 1.49769546e-02, 0.00000000e+00,
        9.93520754e-04, 1.36021253e-02, 1.67140272e-01, 1.05222110e-02,
        4.23676228e-03, 5.12648235e-01, 3.77071843e-02, 8.40785474e-01,
        1.93620036e-02],
       [5.85729947e-05, 0.00000000e+00, 1.51744622e-02, 0.00000000e+00,
        1.00662274e-03, 1.54212886e-02, 1.31139977e-01, 1.06609718e-02,
        4.29263427e-03, 5.19408747e-01, 3.82044450e-02, 8.43137761e-01,
        8.64965806e-03],
       [7.10489715e-05, 0.00000000e+00, 4.78488594e-03, 0.00000000e+00,
        1.00526503e-03, 1.53599229e-02, 1.00526503e-01, 1.33059337e-02,
        6.58470542e-03, 4.87268201e-01, 4.10446638e-02, 8.66174100e-01,
        6.45301131e-03],
       [1.50596596e-04, 0.00000000e+00, 4.75453408e-03, 0.00000000e+00,
        9.98888353e-04, 1.55874565e-02, 1.18209058e-01, 1.32215305e-02,
        6.54293681e-03, 4.84177324e-01, 4.07843061e-02, 8.65630540e-01,
        1.16246177e-02]])

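2 answers:

Answer 0 (score: 3)

Why do you say the values are not between 0 and 1?

Normalization does not mean min=0 and max=1. With preprocessing.normalize, it means that each nonzero vector (each row, i.e. each sample) is scaled so that its norm (the L2 norm by default) is 1.

In other words, for each vector, the sum of the squares of its coordinates is 1.

For example, taking your last vector, we can see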

In [1]: x = [1.50596596e-04, 0.00000000e+00, 4.75453408e-03, 0.00000000e+00, 
   ...:         9.98888353e-04, 1.55874565e-02, 1.18209058e-01, 1.32215305e-02, 
   ...:         6.54293681e-03, 4.84177324e-01, 4.07843061e-02, 8.65630540e-01, 
   ...:         1.16246177e-02]                                                                   

In [2]: sum(c**2 for c in x)                                                                      
Out[2]: 0.9999999993530653
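
To make the row-wise behavior concrete, here is a minimal sketch on a small made-up array (`X_toy` is not from the question; it is only for illustration). It compares `normalize` with its defaults against dividing each row by its own L2 norm by hand:

```python
import numpy as np
from sklearn.preprocessing import normalize

# Toy data, made up for illustration: 3 samples (rows), 2 features (columns).
X_toy = np.array([[3.0, 4.0],
                  [1.0, 0.0],
                  [0.0, 0.0]])

# Defaults are norm="l2", axis=1: each row is divided by its own L2 norm.
X_unit = normalize(X_toy)

# The same thing by hand; all-zero rows are left unchanged.
row_norms = np.linalg.norm(X_toy, axis=1, keepdims=True)
X_manual = X_toy / np.where(row_norms == 0, 1.0, row_norms)

print(X_unit)
# Row norms after scaling: 1 for nonzero rows, 0 for the all-zero row.
print(np.linalg.norm(X_unit, axis=1))
```

Note that nothing here looks at the minimum or maximum of a column, which is why the result is not a 0-to-1 rescaling of each feature.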


Answer 1 (score: -1)

Normalization does not always end up with values between 0 and 1.

Use MinMaxScaler to get values between 0 and 1:
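
As a rough illustration, assuming "normalization" here means the row-wise L2 scaling that `preprocessing.normalize` applies by default, the plain-Python sketch below scales a made-up vector that contains a negative entry. The helper name `l2_normalize` is hypothetical, not a scikit-learn function:

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(c ** 2 for c in vector))
    if norm == 0:
        return list(vector)
    return [c / norm for c in vector]

# Made-up vector with a negative coordinate.
scaled = l2_normalize([3.0, -4.0])
print(scaled)

# The sum of squares is 1, but one coordinate is negative,
# so the values lie in [-1, 1] rather than [0, 1].
print(sum(c ** 2 for c in scaled))
```

The Boston features happen to be non-negative, so the values in the question all fall between 0 and 1, but that is a property of the data, not of the scaling.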

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X)
normalized_X = scaler.transform(X)
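
To show the difference between the two approaches, here is a small sketch on made-up data (`X_toy` is an assumption for illustration, not the Boston data). MinMaxScaler rescales each column with (x - min) / (max - min), while `normalize` rescales each row to unit length:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, normalize

# Toy data, made up for illustration: 3 samples (rows), 2 features (columns).
X_toy = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 800.0]])

# MinMaxScaler works per column: (x - column_min) / (column_max - column_min).
X_minmax = MinMaxScaler().fit_transform(X_toy)

# The same formula by hand, for comparison.
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
X_manual = (X_toy - col_min) / (col_max - col_min)

# Each column now spans exactly 0 to 1.
print(X_minmax.min(axis=0), X_minmax.max(axis=0))

# normalize works per row instead, so columns do not generally span 0 to 1.
X_unit = normalize(X_toy)
print(X_unit.min(axis=0), X_unit.max(axis=0))
```

Which one to use depends on whether you want every feature on a common 0-to-1 scale (MinMaxScaler) or every sample to have unit length (normalize).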