Question

Suppose I have a data frame in R that contains binary entries for three variables (a, b and c):

library(dplyr)
df <- data.frame(a = rbinom(10, 1, 0.5), b = rbinom(10, 2, 0.3), c = rbinom(10, 4, 0.8))

df
   a b c
1  1 0 1
2  0 1 1
3  0 0 1
4  1 0 0
5  1 1 1
6  0 1 1
7  0 1 0
8  0 0 1
9  1 0 1
10 0 0 1

Then I want to create an index that captures the relative "presence" of each variable across all observations (rows), like this:

df2 <- 1/(colSums(df))

df2

  a     b     c 
0.250 0.250 0.125

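Now I want to go back to df. For each column and each observation, if the value is 1, replace it with the corresponding value from df2. If the original value is 0, keep it. I tried a loop, but it does not work: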

for(i in 1:ncol(df)){

  df[,i][df==1] <- df2[i]

}

Error in `[<-.data.frame`(`*tmp*`, , i, value = c(0.25, 0, 0, 0.25, 0.25, :
  replacement has 30 rows, data has 10

Is there another way to do this?

Answer 1

You can do this with mapply:

i1 <- 1/colSums(df)
mapply(function(x, y) replace(x, x == 1, y), df, i1)

which gives:

             a    b c
 [1,] 0.0000000 0.00 4
 [2,] 0.3333333 0.25 4
 [3,] 0.0000000 0.00 4
 [4,] 0.3333333 0.00 3
 [5,] 0.0000000 0.00 3
 [6,] 0.0000000 0.00 3
 [7,] 0.0000000 0.25 4
 [8,] 0.3333333 0.25 3
 [9,] 0.0000000 0.25 4
[10,] 0.0000000 0.00 2

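Note that your df2 (my i1) values differ from mine because you did not call set.seed to make the rbinom draws reproducible. Column c is unchanged here because rbinom(10, 4, 0.8) draws values between 0 and 4, and none of them equals 1 in this sample.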
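
For a reproducible run, the seed can be fixed before drawing the data. The following is a minimal sketch, not the exact data from the question: the seed value 42 is arbitrary, and size = 1 in every rbinom call is an assumption made here so that all three columns are strictly 0/1. It also shows one way to repair the loop from the question. There, df == 1 is a logical matrix over the whole frame (30 elements), which is why the replacement ends up with 30 rows for a 10-row column.

```r
set.seed(42)  # arbitrary seed, only so that reruns give the same draws

# Assumption: size = 1 everywhere, so every column is strictly 0/1
df <- data.frame(a = rbinom(10, 1, 0.5),
                 b = rbinom(10, 1, 0.3),
                 c = rbinom(10, 1, 0.8))
i1 <- 1 / colSums(df)

# mapply() returns a matrix, so convert back if a data frame is needed
res <- as.data.frame(mapply(function(x, y) replace(x, x == 1, y), df, i1))

# The loop from the question, repaired: test only column i, not the whole frame
res_loop <- df
for (i in seq_along(res_loop)) {
  res_loop[[i]][df[[i]] == 1] <- i1[[i]]
}

all.equal(res, res_loop)  # compares the two results
```

Both versions leave the zeros untouched and write the column's own weight into the cells that held a 1.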

Answer 2

Another option:

df2 <- data.frame(matrix(rep(1/(colSums(df)), nrow(df)),
                         byrow = TRUE, nrow = nrow(df)))
names(df2) <- names(df)  # matrix() drops the names, so restore a, b, c

df2[df == 0] <- 0

which gives:

> df2
      a    b     c
1  0.25 0.00 0.125
2  0.00 0.25 0.125
3  0.00 0.00 0.125
4  0.25 0.00 0.000
5  0.25 0.25 0.125
6  0.00 0.25 0.125
7  0.00 0.25 0.000
8  0.00 0.00 0.125
9  0.25 0.00 0.125
10 0.00 0.00 0.125

Data used:

df <- structure(list(a = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L), 
                     b = c(0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
                     c = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L)),
                .Names = c("a", "b", "c"), class = "data.frame", row.names = c(NA, -10L))
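
Because every entry in this data is 0 or 1, dividing each column by its column sum produces the same table: a 1 becomes 1/colSums and a 0 stays 0. The sketch below illustrates that idea with sweep(). It is offered as an alternative under that strictly-binary assumption, not as part of the answer above, and it rebuilds the same data with plain data.frame() so that it runs on its own.

```r
df <- data.frame(a = c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0),
                 b = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 0),
                 c = c(1, 1, 1, 0, 1, 1, 0, 1, 1, 1))

# Divide column j by colSums(df)[j]; valid only when entries are 0 or 1
res <- as.data.frame(sweep(as.matrix(df), 2, colSums(df), "/"))
res
```

A column that contains no 1s at all would yield NaN (0/0) here, so that case would need separate handling.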

Answer 3

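You can locate the ones first and then overwrite them via multiplication. However, this only works when the value you want to replace is 1, whereas @Sotos's approach works for any value.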

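# df2 is the vector 1/colSums(df) from the question
df_is_1 <- df == 1
# sweep() applies df2[j] to column j; a plain df_is_1 * df2 recycles df2 down the rows
df[df_is_1] <- sweep(df_is_1, 2, df2, "*")[df_is_1]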
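
When a matrix is multiplied by a weight vector, it is worth checking how R recycles that vector: it runs down the rows in column-major order, not across the columns. The sketch below is a small check of this, using the data from Answer 2 and assuming the weights are the vector 1/colSums(df) from the question. The names w, is_1, naive and by_col are illustrative only.

```r
df <- data.frame(a = c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0),
                 b = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 0),
                 c = c(1, 1, 1, 0, 1, 1, 0, 1, 1, 1))
w    <- 1 / colSums(df)   # a = 0.25, b = 0.25, c = 0.125
is_1 <- df == 1           # 10 x 3 logical matrix

naive  <- is_1 * w                  # w is recycled down the rows
by_col <- sweep(is_1, 2, w, "*")    # w[j] is applied to column j

naive[9, "a"]    # 0.125: row 9 of column a picked up the weight for c
by_col[9, "a"]   # 0.25:  the weight for a, as intended

# Overwrite only the cells that held a 1
res <- df
res[is_1] <- by_col[is_1]
```

With weights that happen to be equal across columns the two products agree, which can hide the difference in a quick test.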
