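I have the following data frame, and I want to find the binary columns (columns with only two distinct values) and change their non-empty values to 1.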
a <- c("","a","a","","a")
b <- c("","b","b","b","b")
c <- c("c","","","","c")
d <- c("b","a","","c","d")
dt <- data.frame(a,b,c,d)
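I was able to get a solution by looping over every column. However, I would like something more efficient, because my real data frame is very large and the solution below is far too slow.
My solution: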
for (i in seq_along(dt)) {
  # A column counts as binary if it has exactly two distinct values
  if (length(table(dt[, i])) == 2) {
    dt[which(dt[, i] != ""), i] <- 1
  }
}
Expected Output:
a b c d
    1 b
1 1   a
1 1
  1   c
1 1 1 d
Is there a way to make this more efficient?
Answer 0 (score: 2)
inds = lengths(lapply(dt, unique)) == 2
dt[inds] = lapply(dt[inds], function(x) as.numeric(as.character(x) != ""))
dt
#  a b c d
#1 0 0 1 b
#2 1 1 0 a
#3 1 1 0
#4 0 1 0 c
#5 1 1 1 d
If you want "" instead of 0:
dt[inds] = lapply(dt[inds], function(x) c("", 1)[(as.character(x) != "") + 1])
dt
#  a b c d
#1     1 b
#2 1 1   a
#3 1 1
#4   1   c
#5 1 1 1 d
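The two steps above (find the columns with exactly two distinct values, then recode only those) can be wrapped in a small function. This is only a sketch: the name flag_binary_columns is made up for illustration, and it assumes that "" is the only "empty" marker in the data. Note that, like the code above, it treats any column with two distinct values as binary, even if neither value is "".

```r
# Hypothetical helper (not part of the original answer): recode columns that
# have exactly two distinct values so that non-empty entries become "1".
flag_binary_columns <- function(df) {
  # TRUE for each column with exactly two distinct values
  is_binary <- lengths(lapply(df, unique)) == 2
  df[is_binary] <- lapply(df[is_binary], function(x) {
    x <- as.character(x)      # also covers factor columns
    ifelse(x != "", "1", "")  # non-empty -> "1", empty stays ""
  })
  df
}

# Example data from the question
dt <- data.frame(
  a = c("", "a", "a", "", "a"),
  b = c("", "b", "b", "b", "b"),
  c = c("c", "", "", "", "c"),
  d = c("b", "a", "", "c", "d"),
  stringsAsFactors = FALSE
)

lengths(lapply(dt, unique))  # distinct values per column: a, b, c have 2; d has 5
res <- flag_binary_columns(dt)
res
```

Column d is left untouched because it has five distinct values; a, b and c are recoded.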
Answer 1 (score: 2)
Since your question is about efficiency, you may want to look at dplyr or data.table:
library(dplyr)
# across() replaces the superseded mutate_all(); assumes character columns
mutate(dt, across(everything(), ~ if_else(n_distinct(.x) <= 2L & .x != "", "1", .x)))
library(data.table)
setDT(dt)
# Returns a new data.table; dt itself is not modified
dt[, lapply(.SD, function(x) ifelse(uniqueN(x) <= 2L & x != "", "1", x))]
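The data.table call above builds a new table. If the data is very large, it may be worth updating the binary columns by reference instead, so that no copy of the whole table is made. A minimal sketch, assuming all columns are character and using data.table's set(); the variable name bin_cols is just for illustration:

```r
library(data.table)

# Example data from the question, built directly as a data.table
dt <- data.table(
  a = c("", "a", "a", "", "a"),
  b = c("", "b", "b", "b", "b"),
  c = c("c", "", "", "", "c"),
  d = c("b", "a", "", "c", "d")
)

# Names of the columns with exactly two distinct values
bin_cols <- names(dt)[sapply(dt, uniqueN) == 2L]

# Overwrite the non-empty entries of those columns in place
for (col in bin_cols) {
  set(dt, i = which(dt[[col]] != ""), j = col, value = "1")
}

dt
```

Only the rows that actually change are written, and column d is never touched. Whether this is faster than the lapply version on your data is something to measure rather than assume.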