I have a data frame df that looks like
   A1 A2 A3
1:  1  0  1
2:  1  1  0
3:  1  0  1
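and I would like to build a matrix containing all conditional means, where the condition is that another column equals 1.
For example:
I solved the problem with the following code: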
col_names <- names(df)
mat = matrix(nrow=3, ncol=3)
for (i in 1:3) {
  for (j in 1:3) {
    mat[j, i] = mean(df[get(col_names[j]) == 1, ][[col_names[i]]], na.rm = TRUE)
  }
}
colnames(mat) <- col_names
rownames(mat) <- col_names
mat should look like this (shown here as fractions):
    A1  A2  A3
A1 3/3 1/1 2/2
A2 1/3 1/1 0/2
A3 2/3 0/1 2/2
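Unfortunately, this code is very slow. Is there a way (maybe with lapply ...) to speed it up? I have not managed to get rid of the condition (== 1)...
Answer 0 (score: 1)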
#DATA
df1 = structure(list(A1 = c(1L, 1L, 1L),
                     A2 = c(0L, 1L, 0L),
                     A3 = c(1L, 0L, 1L)),
                class = "data.frame", row.names = c(NA, -3L))
df1
# A1 A2 A3
#1 1 0 1
#2 1 1 0
#3 1 0 1
sapply(1:NCOL(df1), function(i) sapply(1:NCOL(df1), function(j) {
    sum((df1[,j])[df1[,i] == 1])/sum(df1[,i])
    #paste0(sum((df1[,j])[df1[,i] == 1]),"/",sum(df1[,i]))
}))
# [,1] [,2] [,3]
#[1,] 1.0000000 1 1
#[2,] 0.3333333 1 0
#[3,] 0.6666667 0 1
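To get the fraction display from the question, with row and column labels, the commented-out paste0 line can be used instead of the division. The following is only a sketch under the assumption that every column holds just 0s and 1s with no NAs (so that sum(df1[[i]]) counts the rows where the condition holds); iterating over names(df1) rather than over indices is my own choice here, made so that sapply labels the result.

```r
# Same toy data as above
df1 <- data.frame(A1 = c(1L, 1L, 1L),
                  A2 = c(0L, 1L, 0L),
                  A3 = c(1L, 0L, 1L))

# Column i is the condition (df1[[i]] == 1), row j is the averaged column.
# Assumes 0/1 data without NAs, so sum(df1[[i]]) is the number of matching rows.
frac <- sapply(names(df1), function(i) sapply(names(df1), function(j) {
  paste0(sum(df1[[j]][df1[[i]] == 1]), "/", sum(df1[[i]]))
}))
frac
```

Because sapply is given a character vector, it reuses the column names as dimnames, so no separate colnames/rownames assignment is needed.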
Answer 1 (score: 0)
Another option:
# convert to matrix
mat <- as.matrix(df)
p <- ncol(mat)
# calculate numerator: numer[k, j] = sum(mat[, k] * mat[, j])
numer <- matrix(NA_real_, p, p)
for (j in seq_len(p)) { numer[, j] <- colSums(mat * mat[, j]) }
# calculate denominator: every row of denom holds the column sums of mat
denom <- matrix(colSums(mat, na.rm = TRUE), p, p, byrow = TRUE)
# divide numer/denom and fix NAs and Infs
result <- numer / denom
result[is.na(result) | is.infinite(result)] <- 0
result
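The numerator loop above fills numer with the sums of products of every pair of columns, which is what crossprod(mat) returns in one call, so the loop can probably be dropped altogether. A minimal sketch, assuming the columns contain only 0s and 1s and no NAs; the data frame is rebuilt here only so the snippet runs on its own:

```r
# Toy data from the question (assumed 0/1 only, no NAs)
df <- data.frame(A1 = c(1, 1, 1), A2 = c(0, 1, 0), A3 = c(1, 0, 1))
mat <- as.matrix(df)

# numer[k, j] = sum(mat[, k] * mat[, j]): rows where both columns are 1
numer <- crossprod(mat)

# divide column j by the number of rows where column j is 1
result <- sweep(numer, 2, colSums(mat), "/")

# a column without any 1s gives 0/0 = NaN; report 0, as the loop version does
result[!is.finite(result)] <- 0
result
```

crossprod keeps the column names of mat as both row and column names, so result comes out labelled. Whether this is faster on the real data is something to measure rather than assume.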