Question

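My question is simple. I have a data frame with more than 100 columns, and each row holds different numbers. The first column is always nonzero. For each row, I want to replace every nonzero number outside the first column with that row's first number, i.e. the value in the first column.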

I could think of using ifelse with a for loop over the rows, but there must be a simpler, vectorized way to do this...

Answer 1

Since your data is not that large, I suggest a simple loop:

for (i in 1:nrow(mydata))
{
  for (j in 2:ncol(mydata))
  {
    # keep zeros; replace anything else with the row's first value
    mydata[i, j] <- ifelse(mydata[i, j] == 0, 0, mydata[i, 1])
  }
}
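As a quick check, the loop from this answer can be run on a small data frame. This is only a sketch: the toy values and the column names `val`, `a`, `b` are made up for illustration and are not from the question.

```r
# Hypothetical toy data; the first column is always nonzero
mydata <- data.frame(val = c(7, 8, 9),
                     a   = c(1, 0, 5),
                     b   = c(0, 2, 0))

# Double loop: rows on the outside, columns 2..ncol on the inside
for (i in seq_len(nrow(mydata))) {
  for (j in 2:ncol(mydata)) {
    # keep zeros; replace anything else with the row's first value
    mydata[i, j] <- ifelse(mydata[i, j] == 0, 0, mydata[i, 1])
  }
}

print(mydata)
#   val a b
# 1   7 7 0
# 2   8 0 8
# 3   9 9 0
```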

Answer 2

Another approach is to use sapply, which is more efficient than the explicit loop. Assuming your data is in a data frame df:

df[,-1] <- sapply(df[,-1], function(x) {ind <- which(x!=0); x[ind] = df[ind,1]; return(x)})

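Here we apply the function to every column of df except the first. Inside the function, x is each of those columns in turn:

First, use which to find the row indices at which the column is nonzero.
Set those rows of x to the corresponding values from the first column of df.
Return the column.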

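Note that the operations inside the function are all "vectorized" over the column. That is, there is no loop over the rows of a column. The result of sapply is a matrix of the processed columns, which replaces all columns of df other than the first.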
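The three steps can be traced on a single column before handing the function to sapply. The sketch below assumes a hypothetical toy data frame (the values and the names `val`, `a`, `b` are illustrative only).

```r
# Hypothetical toy data; column names are made up for illustration
df <- data.frame(val = c(7, 8, 9),
                 a   = c(1, 0, 5),
                 b   = c(0, 2, 0))

# Steps 1-3 on one column, exactly as sapply would see it
x <- df$a
ind <- which(x != 0)      # rows where the column is nonzero: 1 and 3
x[ind] <- df[ind, 1]      # copy the first-column values into those rows
x                         # 7 0 9

# The same function applied to every column except the first
res <- sapply(df[, -1], function(x) {
  ind <- which(x != 0)
  x[ind] <- df[ind, 1]
  x
})
is.matrix(res)            # TRUE: the columns were simplified to a matrix
df[, -1] <- res           # write the processed columns back
```

Because sapply returns a matrix here, all processed columns end up with a common type; for purely numeric data, as in the question, that is harmless.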

See this for an excellent review of the *apply family of functions.

Hope this helps.

Answer 3

Assuming your data frame is dat, I have a fully vectorized solution:

mat <- as.matrix(dat[, -1])
pos <- which(mat != 0)
mat[pos] <- rep(dat[[1]], times = ncol(mat))[pos]
new_dat <- "colnames<-"(cbind.data.frame(dat[1], mat), colnames(dat))
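Why the `rep(...)` line works may not be obvious: R stores a matrix column by column, so `which(mat != 0)` returns linear indices in column-major order, and repeating the first column once per column of `mat` produces a vector laid out the same way. The sketch below walks through this on a hypothetical toy data frame; the intermediate name `filler` is introduced here only for illustration.

```r
# Hypothetical toy data; column names are made up for illustration
dat <- data.frame(val = c(7, 8, 9),
                  a   = c(1, 0, 5),
                  b   = c(0, 2, 0))

mat <- as.matrix(dat[, -1])
pos <- which(mat != 0)    # linear, column-major indices: 1 3 5

# One copy of the first column per column of `mat`, so filler[k]
# is the first-column value for the row that mat[k] belongs to
filler <- rep(dat[[1]], times = ncol(mat))   # 7 8 9 7 8 9

mat[pos] <- filler[pos]   # values 7, 9, 8 go into positions 1, 3, 5
mat
#      a b
# [1,] 7 0
# [2,] 0 8
# [3,] 9 0
```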

Example

set.seed(0)
dat <- "colnames<-"(cbind.data.frame(1:5, matrix(sample(0:1, 25, TRUE), 5)),
                    c("val", letters[1:5]))
#  val a b c d e
#1   1 1 0 0 1 1
#2   2 0 1 0 0 1
#3   3 0 1 0 1 0
#4   4 1 1 1 1 1
#5   5 1 1 0 0 0

The code above gives:

#  val a b c d e
#1   1 1 0 0 1 1
#2   2 0 2 0 0 2
#3   3 0 3 0 3 0
#4   4 4 4 4 4 4
#5   5 5 5 0 0 0

Want a benchmark?

set.seed(0)
n <- 2000  ## use a 2000 * 2000 matrix
dat <- "colnames<-"(cbind.data.frame(1:n, matrix(sample(0:1, n * n, TRUE), n)),
                    c("val", paste0("x", 1:n)))

## have to test my solution first, as aichao's solution overwrites `dat`

## my solution
system.time({
  mat <- as.matrix(dat[, -1])
  pos <- which(mat != 0)
  mat[pos] <- rep(dat[[1]], times = ncol(mat))[pos]
  "colnames<-"(cbind.data.frame(dat[1], mat), colnames(dat))
})
#   user  system elapsed
#  0.352   0.056   0.410

## solution by aichao
system.time(dat[,-1] <- sapply(dat[,-1], function(x) {ind <- which(x!=0); x[ind] = dat[ind,1]; x}))
#   user  system elapsed
#  7.804   0.108   7.919

My solution is about 20 times faster!
