Question

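I don't come from a statistics background, but through my work with machine learning and neural networks I have seen that scaling data can do a lot of harm. In my experience, scaling the data before the train/test split is not a good choice, but scaling after the split has its own problem. Take a look at this example.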

import numpy as np
from sklearn.preprocessing import StandardScaler


train_matrix = np.array([[1,2,3,4,5]]).T

test_matrix = np.array([[1]]).T


e = StandardScaler()
train_matrix = e.fit_transform(train_matrix)
test_matrix = e.fit_transform(test_matrix)

print(train_matrix)

print(test_matrix)

[out]:

[[-1.41421356]   #train data
 [-0.70710678]
 [ 0.        ]
 [ 0.70710678]
 [ 1.41421356]]


[[ 0.]]   #test data

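The StandardScaler class performs two different scaling procedures, one for each dataset, and the error that can hurt your NN results is this: the value 1 in the training matrix becomes -1.41421356, while the value 1 in the test matrix becomes 0. Now imagine you build a prediction model with weights learned on the training data and apply it to the test data. For the value 1 you will get a completely different result. How can I overcome this?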

Answer 1

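You should not transform the training and test data separately. Instead, fit the scaler on the training data (and transform the training data with it), then use that same fitted scaler to transform the test data. In your code, that means calling transform rather than fit_transform on the test matrix.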


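The corrected code fits the scaler once and reuses it, so when you print the transformed training and test data you get the expected result: the 1 in the test matrix becomes -1.41421356, the same as the 1 in the training matrix.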

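e = StandardScaler()
# Fit on the training data only, and scale it with the learned mean and std
train_matrix = e.fit_transform(train_matrix)
# Reuse the already fitted scaler; do not fit again on the test data
test_matrix = e.transform(test_matrix)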
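
To make the difference concrete, here is a minimal self-contained sketch using the same numbers as the question. The variable names (scaler, train_scaled, test_scaled, manual_test, refit) are chosen here for illustration and are not from the original post. It shows that a fitted StandardScaler keeps the training mean and standard deviation in its mean_ and scale_ attributes, that transform applies exactly those values, and that fitting again on the single test sample is what produces the 0.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same data as in the question, as column vectors
train_matrix = np.array([[1, 2, 3, 4, 5]], dtype=float).T
test_matrix = np.array([[1]], dtype=float).T

scaler = StandardScaler()
# fit_transform learns the mean and std from the training data and scales it
train_scaled = scaler.fit_transform(train_matrix)
# transform only applies the statistics learned above
test_scaled = scaler.transform(test_matrix)

# The learned training statistics
print("mean:", scaler.mean_, "std:", scaler.scale_)

# transform is equivalent to (x - training mean) / training std
manual_test = (test_matrix - scaler.mean_) / scaler.scale_

# The value 1 now maps to the same number in both sets
print("train 1 ->", train_scaled[0, 0])
print("test  1 ->", test_scaled[0, 0])

# For comparison: fitting a fresh scaler on the test sample alone
# centers it on its own mean, which is what gave 0 in the question
refit = StandardScaler().fit_transform(test_matrix)
print("refit test 1 ->", refit[0, 0])
```

The same idea applies to any train/test split: the test data is treated as unseen, so every statistic used for preprocessing should come from the training data alone.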
