I want to z-score normalize each row of a matrix in R. I am using the normalize function from the som package, which works fine:
library(som)
training <- matrix(seq(1:20), ncol = 10)
training
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 3 5 7 9 11 13 15 17 19
[2,] 2 4 6 8 10 12 14 16 18 20
training_zscore <- normalize(training, byrow=TRUE)
training_zscore
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
[2,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
Now suppose I have another matrix, for example the following:
validation <- matrix(seq(1:20)*2, ncol = 10)
validation
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 2 6 10 14 18 22 26 30 34 38
[2,] 4 8 12 16 20 24 28 32 36 40
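I also want to z-score transform this new matrix. However, the scaling should be the same as the one used for the training z-score matrix, i.e. based on the training data's mean and standard deviation. How can I achieve this?
If I simply perform a separate z-score normalization, I get the following output: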
validation_zscore <- normalize(validation, byrow=TRUE)
validation_zscore
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
[2,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
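However, this is not what I want. For example, in the training matrix the value "10" is transformed to a z-score of "-0.1651446". The same should hold in the validation matrix (but here 10 is transformed to a z-score of "-0.8257228").
Thanks for your help!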
Answer 0 (score: 1)
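It is not entirely clear, but I think you want each row of validation to be normalized using training as the "reference". If so, you can use base::scale and supply the training row means and standard deviations as numeric vectors. In any case, what is the point of using som::normalize?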
training <- matrix(seq(1:20), ncol = 10)
training_zscore <- t(scale(t(training)))
training_zscore
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
# [2,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
# attr(,"scaled:center")
# [1] 10 11
# attr(,"scaled:scale")
# [1] 6.055301 6.055301
validation <- matrix(seq(1:20)*2, ncol = 10)
validation_zscore <- t(scale(t(validation), center = rowMeans(training),
                             scale = apply(training, 1, sd)))
validation_zscore
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] -1.321157 -0.6605783 0.0000000 0.6605783 1.321157 1.981735 2.642313 3.302891 3.963470 4.624048
# [2,] -1.156012 -0.4954337 0.1651446 0.8257228 1.486301 2.146879 2.807458 3.468036 4.128614 4.789192
# attr(,"scaled:center")
# [1] 10 11
# attr(,"scaled:scale")
# [1] 6.055301 6.055301
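If you need to reuse the training parameters for several new matrices, it may be cleaner to compute them once and apply them with sweep(), which avoids the double transpose. The following is only a minimal sketch under that assumption: zscore_rows, train_mean and train_sd are hypothetical names introduced here for illustration, and the helper assumes the new matrix has the same number of rows as training.

```r
# Same data as in the question.
training   <- matrix(1:20, ncol = 10)
validation <- matrix((1:20) * 2, ncol = 10)

# Estimate the per-row parameters once, on the training data only.
train_mean <- rowMeans(training)
train_sd   <- apply(training, 1, sd)

# Hypothetical helper (not part of base R or som): z-score each row of x
# using externally supplied row means and row standard deviations.
zscore_rows <- function(x, center, scale) {
  stopifnot(nrow(x) == length(center), nrow(x) == length(scale))
  centered <- sweep(x, 1, center, "-")  # subtract each row's training mean
  sweep(centered, 1, scale, "/")        # divide by each row's training sd
}

training_zscore   <- zscore_rows(training, train_mean, train_sd)
validation_zscore <- zscore_rows(validation, train_mean, train_sd)
```

Unlike the scale() version, the result is a plain matrix without the "scaled:center" and "scaled:scale" attributes. Note that the scaling is still row-wise: the value 10 in row 1 of validation maps to 0, because the training mean of row 1 is 10.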