Question

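I have a large matrix (thousands of rows and hundreds of columns) that I want to normalize column by column so that every column lies between -1 and 1. This is the code I wrote: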

normalize <- function(x) { 
    for(j in 1:length(x[1,])){
        print(j)
        min <- min(x[,j])
        max <- max(x[,j])
        for(i in 1:length(x[,j])){
            x[i,j] <- 2 * (x[i,j] - min)/( max - min) - 1
        }
    }
    return(x)
}
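The inner loop applies a single formula, 2 * (x - min) / (max - min) - 1, one element at a time. Arithmetic in R is vectorized, so the same formula can rescale a whole column in one expression and avoid the per-element assignments. A minimal sketch; rescale_col is a hypothetical helper name, not something from the thread:

```r
# Hypothetical helper: map a numeric vector linearly onto [-1, 1],
# using the same formula as the loop, applied to the whole vector at once.
rescale_col <- function(v) {
  lo <- min(v)
  hi <- max(v)
  2 * (v - lo) / (hi - lo) - 1
}

rescale_col(c(2, 4, 6, 10))
# [1] -1.0 -0.5  0.0  1.0
```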

Unfortunately, it is slow. I have seen this:

normalize <- function(x) { 
    x <- sweep(x, 2, apply(x, 2, min)) 
    sweep(x, 2, apply(x, 2, max), "/") 
}
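For readers new to sweep(): sweep(x, 2, STATS, FUN) combines column j of x with STATS[j] using FUN (subtraction by default). Because the second call in the function above runs on the already-shifted matrix, apply(x, 2, max) there returns each original column's max minus its min, which is why the result lands in [0, 1]. A small check on an arbitrary toy matrix:

```r
m <- matrix(c(1, 3, 5,
              10, 20, 30), ncol = 2)          # columns: (1, 3, 5) and (10, 20, 30)

shifted <- sweep(m, 2, apply(m, 2, min))     # subtract each column's min
shifted                                      # columns: (0, 2, 4) and (0, 10, 20)

apply(shifted, 2, max)                       # 4 and 20: max - min of each original column

scaled01 <- sweep(shifted, 2, apply(shifted, 2, max), "/")
scaled01                                     # both columns: (0, 0.5, 1)
```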

It is fast, but it normalizes to between 0 and 1. Can you help me modify it? Sorry, I am still learning R.

Answer 1

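How about rescaling x at the end of the function? The two sweep calls map each column onto [0, 1], and 2*x - 1 then stretches that onto [-1, 1]: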

normalize <- function(x) { 
    x <- sweep(x, 2, apply(x, 2, min)) 
    x <- sweep(x, 2, apply(x, 2, max), "/") 
    2*x - 1
}
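A quick usage check of Answer 1's function on a small, arbitrary matrix: each column's range should come out as exactly -1 to 1. One caveat worth knowing: a constant column has max equal to min, so the division is 0/0 and that column becomes NaN.

```r
normalize <- function(x) {
  x <- sweep(x, 2, apply(x, 2, min))          # shift each column so its min is 0
  x <- sweep(x, 2, apply(x, 2, max), "/")     # divide by (max - min): now in [0, 1]
  2 * x - 1                                   # stretch [0, 1] onto [-1, 1]
}

m <- matrix(c(1, 5, 3,
              10, 20, 40), ncol = 2)
r <- normalize(m)
apply(r, 2, range)                            # every column: -1 (row 1) and 1 (row 2)

normalize(matrix(c(7, 7, 7), ncol = 1))       # constant column: all NaN
```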

Answer 2

Benchmark:

normalize2 <- function(A) { 
  scale(A,center=TRUE,scale=apply(A,2,function(x) 0.5*(max(x)-min(x))))
}

normalize3 <- function(mat) { 
  apply(mat,2,function(x) {xmin <- min(x); 2*(x-xmin)/(max(x)-xmin)-1})
}

normalize4 <- function(x) { 
  aa <- colMeans(x)
  x <- sweep(x, 2, aa)           # subtract each column's mean

  2* sweep(x, 2, apply(x, 2, function(y) max(y)-min(y)), "/") 
}
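normalize4 computes each column's width with an anonymous function. As an alternative, apply(x, 2, range) returns a 2 x ncol matrix holding every column's minimum (row 1) and maximum (row 2) from one apply() call, which is all the question's min/max formula needs. A sketch that has not been benchmarked here; normalize_rng is a hypothetical name:

```r
# Hypothetical variant: rescale each column to [-1, 1] with the minima and
# maxima taken from a single apply(x, 2, range) call.
normalize_rng <- function(x) {
  rng <- apply(x, 2, range)                   # row 1 = column minima, row 2 = maxima
  x <- sweep(x, 2, rng[1, ])                  # subtract each column's min
  x <- sweep(x, 2, rng[2, ] - rng[1, ], "/")  # divide by each column's width
  2 * x - 1                                   # map [0, 1] onto [-1, 1]
}

set.seed(1)
m <- matrix(rnorm(20), ncol = 4)
apply(normalize_rng(m), 2, range)             # each column spans -1 to 1
```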


set.seed(42)
mat <- matrix(sample(1:10,1e5,TRUE),1e3)
erg2 <- normalize2(mat)
attributes(erg2) <- attributes(normalize3(mat))
all.equal(  
  erg2,  
  normalize3(mat),   
  normalize4(mat)
  )

[1] TRUE
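all.equal() compares one target with one current object, so one way to check several implementations against each other is to compare them two at a time and wrap each result in isTRUE(). A self-contained sketch comparing two of the min/max-based versions from this thread (copied here under hypothetical names so the snippet runs on its own):

```r
# Answer 1's sweep-based version
norm_sweep <- function(x) {
  x <- sweep(x, 2, apply(x, 2, min))
  x <- sweep(x, 2, apply(x, 2, max), "/")
  2 * x - 1
}

# normalize3 from the benchmark above
norm_apply <- function(mat) {
  apply(mat, 2, function(x) { xmin <- min(x); 2 * (x - xmin) / (max(x) - xmin) - 1 })
}

set.seed(42)
mat <- matrix(sample(1:10, 1e4, TRUE), 100)

# Compare exactly two results per all.equal() call
isTRUE(all.equal(norm_sweep(mat), norm_apply(mat)))
# [1] TRUE
```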

library(microbenchmark)
microbenchmark(normalize4(mat),normalize3(mat),normalize2(mat))

Unit: milliseconds
             expr      min       lq   median       uq      max
1 normalize2(mat) 4.846551 5.486845 5.597799 5.861976 30.46634
2 normalize3(mat) 4.191677 4.862655 4.980571 5.153438 28.94257
3 normalize4(mat) 4.960790 5.648666 5.766207 5.972404 30.08334

set.seed(42)
mat <- matrix(sample(1:10,1e4,TRUE),10)
microbenchmark(normalize4(mat),normalize3(mat),normalize2(mat))

Unit: milliseconds
             expr      min       lq   median       uq       max
1 normalize2(mat) 4.319131 4.445384 4.556756 4.821512  9.116263
2 normalize3(mat) 5.743305 5.927829 6.098392 6.454875 13.439526
3 normalize4(mat) 3.955712 4.102306 4.175394 4.402710  5.773221

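The apply solution is slightly faster when there are few columns (first benchmark, 1000 x 100) and slightly slower when there are many columns (second benchmark, 10 x 1000). Overall, the performance is of the same order of magnitude.

Two caveats about this comparison: all.equal() compares only two objects (a third positional argument is taken as tolerance), so normalize4(mat) is not actually checked by the call above; and normalize2 and normalize4 center each column on its mean rather than on the midpoint of its range, so their output spans exactly [-1, 1] only when a column's mean equals (min + max) / 2.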
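normalize2 passes center = TRUE, which subtracts each column's mean. Passing each column's midpoint, (min + max) / 2, as center and its half-width, (max - min) / 2, as scale makes scale() map every column exactly onto [-1, 1] regardless of how the values are distributed. A sketch under that assumption; normalize_scale is a hypothetical name, and the attributes that scale() attaches are removed so the result is a plain matrix:

```r
# Hypothetical variant of normalize2: center on each column's midrange
# instead of its mean, so the extremes land exactly on -1 and 1.
normalize_scale <- function(A) {
  lo <- apply(A, 2, min)
  hi <- apply(A, 2, max)
  out <- scale(A, center = (lo + hi) / 2, scale = (hi - lo) / 2)
  attr(out, "scaled:center") <- NULL          # drop the attributes scale() adds
  attr(out, "scaled:scale") <- NULL
  out
}

x <- matrix(c(1, 2, 3, 10,                    # skewed column: mean 4, midrange 5.5
              0, 5, 5, 10), ncol = 2)
normalize_scale(x)                            # column 1: -1, -7/9, -5/9, 1
# With center = TRUE (the column mean), column 1 would instead run from about -0.67 to 1.33.
```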

Answer 3

This rescales the matrix using the same approach:

normalize <- function(x) { 
  x <- sweep(x, 2, apply(x, 2, mean))           # subtract each column's mean
  2* sweep(x, 2, apply(x, 2, function(y) max(y)-min(y)), "/") 
}

Edit

As suggested in the comments, using colMeans is of course faster:

normalize <- function(x) { 
  aa <- colMeans(x)
  x <- sweep(x, 2, aa)           # subtract each column's mean
  2* sweep(x, 2, apply(x, 2, function(y) max(y)-min(y)), "/") 
}

> A <- matrix(1:24, ncol=3)
> normalize(A)
           [,1]       [,2]       [,3]
[1,] -1.0000000 -1.0000000 -1.0000000
[2,] -0.7142857 -0.7142857 -0.7142857
[3,] -0.4285714 -0.4285714 -0.4285714
[4,] -0.1428571 -0.1428571 -0.1428571
[5,]  0.1428571  0.1428571  0.1428571
[6,]  0.4285714  0.4285714  0.4285714
[7,]  0.7142857  0.7142857  0.7142857
[8,]  1.0000000  1.0000000  1.0000000
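colMeans(x) returns the same numbers as apply(x, 2, mean), but it is a single call into compiled code instead of one R function call per column, which is generally why it is faster on wide matrices. A quick equivalence check on arbitrary random data (no timings are claimed here):

```r
set.seed(7)
x <- matrix(runif(5000), ncol = 50)

# The same column means, computed two ways
m1 <- colMeans(x)
m2 <- apply(x, 2, mean)
all.equal(m1, m2)   # TRUE, up to floating-point tolerance
```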
Edit

Using the scale function from the base package:

> scale(A,center=TRUE,scale=apply(A,2,function(x) 0.5*(max(x)-min(x))))
           [,1]       [,2]       [,3]
[1,] -1.0000000 -1.0000000 -1.0000000
[2,] -0.7142857 -0.7142857 -0.7142857
[3,] -0.4285714 -0.4285714 -0.4285714
[4,] -0.1428571 -0.1428571 -0.1428571
[5,]  0.1428571  0.1428571  0.1428571
[6,]  0.4285714  0.4285714  0.4285714
[7,]  0.7142857  0.7142857  0.7142857
[8,]  1.0000000  1.0000000  1.0000000

Answer 4

How about:

x[,1] <- (x[,1]-mean(x[,1]))/(max(x[,1])-min(x[,1]))

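Most basic functions in R are vectorized, so there is no need for for loops in your code. This snippet scales all of column 1. (You could also use the function scale(), although it has no min/max option.) Note that centering on the mean and dividing by the full range produces values spanning an interval of width 1, not [-1, 1].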
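If the goal is the question's exact [-1, 1] range for a single column, the same vectorized style works with the min/max formula instead of the mean. A sketch on a hypothetical matrix x filled with random example data:

```r
set.seed(3)
x <- matrix(runif(12, 0, 100), ncol = 3)   # hypothetical example data

# Rescale column 1 in place to [-1, 1]; the right-hand side is evaluated
# with the old values before the assignment, so no loop or temporary is needed.
x[, 1] <- 2 * (x[, 1] - min(x[, 1])) / (max(x[, 1]) - min(x[, 1])) - 1
range(x[, 1])                               # -1 1
```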

To do this for the whole data set, you can do the following:

Scale <- function(y) (y-mean(y))/(max(y)-min(y))
DataFrame.Scaled <- apply(DataFrame, 2, Scale)
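apply() first converts a data frame to a matrix, so a result produced this way is a matrix rather than a data frame. Assigning through df[] <- lapply(df, f) rescales each column and keeps the data-frame structure and column names. A sketch with a hypothetical all-numeric data frame and the question's [-1, 1] formula (to_unit_range is a hypothetical name):

```r
# Hypothetical data frame with numeric columns only
df <- data.frame(a = c(1, 4, 7), b = c(10, 30, 20))

to_unit_range <- function(y) 2 * (y - min(y)) / (max(y) - min(y)) - 1

df[] <- lapply(df, to_unit_range)   # df[] keeps the data.frame class and names
df                                  # a: -1 0 1, b: -1 1 0
```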

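Edit: it is also worth pointing out that you should not give a variable the same name as a function. After min <- min(x), the name min refers both to your number and to the base function, which invites confusion the next time you use min.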
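A small demonstration of why this is confusing rather than fatal: when min appears in call position, R skips non-function objects while looking up the name and still finds the base function, but any other use of min gets the local number. f is a hypothetical function:

```r
f <- function(x) {
  min <- min(x)          # 'min' now also names a local number
  c(min, min(x + 1))     # the call min(...) still finds base::min
}

f(c(3, 5))               # 3 4
```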

Normalize a matrix between -1 and 1
