R: z-score normalization

Time: 2015-11-09 12:20:55

Tags: r matrix normalization

I want to z-score normalize each row of a matrix in R. I'm using the normalize function from the som package, which works as expected:

library(som)

training <- matrix(seq(1:20), ncol = 10)
training
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    3    5    7    9   11   13   15   17    19
[2,]    2    4    6    8   10   12   14   16   18    20
training_zscore <- normalize(training, byrow=TRUE)
training_zscore
          [,1]      [,2]       [,3]       [,4]       [,5]      [,6]      [,7]      [,8]     [,9]    [,10]
[1,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
[2,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
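For reference, the same row-wise z-score can be sketched in base R without som. This assumes normalize(byrow = TRUE) divides by the sample standard deviation (n - 1), which the output above is consistent with:

```r
# Row-wise z-score in base R: subtract each row's mean, divide by each row's sd.
# Assumption: som::normalize(byrow = TRUE) uses the sample sd, as sd() does.
training <- matrix(1:20, ncol = 10)

row_means <- rowMeans(training)
row_sds   <- apply(training, 1, sd)   # sample standard deviation per row

# A vector of length nrow(training) recycles down the columns,
# so element i is applied to row i.
training_zscore_base <- (training - row_means) / row_sds
training_zscore_base
```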

Now suppose I have another matrix, for example:

validation <- matrix(seq(1:20)*2, ncol = 10)
validation
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    6   10   14   18   22   26   30   34    38
[2,]    4    8   12   16   20   24   28   32   36    40

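I'd like to z-score transform this new matrix as well. However, the scaling should be the same as the one used for the training z-score matrix. How can I do this?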

If I simply run a separate z-score normalization, I get the following output:

> validation_zscore <- normalize(validation, byrow=TRUE)
> validation_zscore
          [,1]      [,2]       [,3]       [,4]       [,5]      [,6]      [,7]      [,8]     [,9]    [,10]
[1,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
[2,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301

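However, this is not what I want. For example, in the training matrix the value 10 is transformed to a z-score of -0.1651446. The same should hold in the validation matrix, but here 10 is transformed to a z-score of -0.8257228.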
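The -0.1651446 figure is simply (value - row mean) / row sd for the row of training that contains 10, which is row 2 (mean 11, sd about 6.055301). A quick check:

```r
training <- matrix(1:20, ncol = 10)

# The value 10 sits in row 2 of training (2, 4, 6, 8, 10, ...)
mu    <- mean(training[2, ])   # row mean: 11
sigma <- sd(training[2, ])     # row sample sd: about 6.055301
z_10  <- (10 - mu) / sigma     # about -0.1651446
z_10
```

Getting the same number for a 10 in validation means reusing these two training parameters rather than the validation row's own mean and sd.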

Thanks for your help!

1 answer:

Answer 0 (score: 1)

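It's not entirely clear, but I think you want each row of validation to be normalized using the corresponding row of training as the "reference". If so, you can use base::scale and pass numeric values for the mean and standard deviation through its center and scale arguments. Because scale() works column-wise, transpose the matrix before and after the call. In any case, what's the point of using som::normalize?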

training <- matrix(seq(1:20), ncol = 10)
training_zscore <- t(scale(t(training)))
training_zscore
#           [,1]      [,2]       [,3]       [,4]       [,5]      [,6]      [,7]      [,8]     [,9]    [,10]
# [1,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
# [2,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
# attr(,"scaled:center")
# [1] 10 11
# attr(,"scaled:scale")
# [1] 6.055301 6.055301
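The attributes printed above are the per-row parameters scale() used, so instead of recomputing rowMeans() and sd() they can be read back from the result. A minimal sketch, calling scale() on the transposed matrix directly so the attributes come straight from scale():

```r
training <- matrix(1:20, ncol = 10)

# scale() works column-wise, so rows of training are columns here
scaled_t <- scale(t(training))

train_center <- attr(scaled_t, "scaled:center")  # per-row means of training
train_scale  <- attr(scaled_t, "scaled:scale")   # per-row sample sds of training

# Same numbers as rowMeans(training) and apply(training, 1, sd)
all.equal(train_center, rowMeans(training))
all.equal(train_scale, apply(training, 1, sd))
```

These two vectors could then be passed as center and scale when transforming validation.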

validation <- matrix(seq(1:20)*2, ncol = 10)
validation_zscore <- t(scale(t(validation), center = rowMeans(training),
                             scale = apply(training, 1, sd)))
validation_zscore
#           [,1]       [,2]      [,3]      [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
# [1,] -1.321157 -0.6605783 0.0000000 0.6605783 1.321157 1.981735 2.642313 3.302891 3.963470 4.624048
# [2,] -1.156012 -0.4954337 0.1651446 0.8257228 1.486301 2.146879 2.807458 3.468036 4.128614 4.789192
# attr(,"scaled:center")
# [1] 10 11
# attr(,"scaled:scale")
# [1] 6.055301 6.055301
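If the transpose round-trip feels awkward, base R's sweep() can apply per-row parameters directly with MARGIN = 1. The helper names zscore_fit() and zscore_apply() below are hypothetical; this is only a sketch of a "fit on training, apply to validation" pattern, not part of any package:

```r
# Hypothetical helper: learn per-row parameters from a reference matrix
zscore_fit <- function(x) {
  list(center = rowMeans(x), scale = apply(x, 1, sd))
}

# Hypothetical helper: apply stored per-row parameters to another matrix
zscore_apply <- function(x, params) {
  stopifnot(nrow(x) == length(params$center))
  centered <- sweep(x, 1, params$center, "-")  # subtract row i's training mean
  sweep(centered, 1, params$scale, "/")        # divide by row i's training sd
}

training   <- matrix(1:20, ncol = 10)
validation <- matrix(1:20 * 2, ncol = 10)

params            <- zscore_fit(training)
training_zscore   <- zscore_apply(training, params)
validation_zscore <- zscore_apply(validation, params)
validation_zscore
```

Unlike scale(), the result carries no "scaled:center"/"scaled:scale" attributes; the parameters live in params instead.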