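R: generating a new variable based on a conditional statement applied to many columns

Time: 2014-09-29 15:38:37

Tags: r conditional-statements lapply

There is probably an obvious and elegant way to do this, possibly using lapply, but I am still getting to grips with the apply family of functions and am struggling to find it.

I have a data frame similar to the following, except that instead of 5 factor variables there are dozens, and instead of 10 rows there are hundreds.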

    a<- data.frame("id" = c(1:10),
                   "a1" = factor(c(0,0,1,1,0,1,0,1,0,1)),
                   "a2" = factor(c(0,0,0,0,0,0,0,0,1,0)), 
                   "a3" = factor(c(0,0,0,0,0,1,0,0,0,0)),
                   "a4" = factor(c(0,0,0,0,0,0,0,0,1,1)), 
                   "a5" = factor(c(0,0,0,1,0,0,0,0,0,0)))

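I want to create a new variable that is 1 if any of 13 columns contains a particular factor level. The equivalent in the example data frame would be a new variable called "b" that is 1 wherever any of the columns a1:a4 contains a "1", as shown below.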

    a<- data.frame("id" = c(1:10),
                   "a1" = factor(c(0,0,1,1,0,1,0,1,0,1)),
                   "a2" = factor(c(0,0,0,0,0,0,0,0,1,0)), 
                   "a3" = factor(c(0,0,0,0,0,1,0,0,0,0)),
                   "a4" = factor(c(0,0,0,0,0,0,0,0,1,1)), 
                   "a5" = factor(c(0,0,0,1,0,0,0,0,0,0)), 
                   "b"  = c(0,0,1,1,0,1,0,1,1,1))

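There has got to be a way to do this using the 13 column positions, rather than writing a conditional if-then statement for each of the 13 variables.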

3 answers:

Answer 0 (score: 4)

Just use rowSums, like this:

as.numeric(rowSums(a[paste0("a", 1:5)] == 1) >= 1)
# [1] 0 0 1 1 0 1 0 1 1 1
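
Since the question asks for column positions, the same idea can be sketched with an integer index, storing the result as a new column. This is a minimal sketch rather than part of the original answer: the positions `2:5` (columns a1 to a4 in the example data) and the name `cols` are assumptions for illustration, and `na.rm = TRUE` is an optional choice that treats missing values as non-matches.

```r
# Example data frame from the question
a <- data.frame(id = 1:10,
                a1 = factor(c(0,0,1,1,0,1,0,1,0,1)),
                a2 = factor(c(0,0,0,0,0,0,0,0,1,0)),
                a3 = factor(c(0,0,0,0,0,1,0,0,0,0)),
                a4 = factor(c(0,0,0,0,0,0,0,0,1,1)),
                a5 = factor(c(0,0,0,1,0,0,0,0,0,0)))

cols <- 2:5  # hypothetical: positions of a1..a4 in this example

# Compare each selected column with the level "1", count the matches per row,
# and flag rows that have at least one match
a$b <- as.numeric(rowSums(a[cols] == "1", na.rm = TRUE) > 0)
a$b
# [1] 0 0 1 1 0 1 0 1 1 1
```

With the real data, `cols` would hold the 13 positions of interest, for example something like `c(3, 7:18)`.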

Answer 1 (score: 0)

If you want to try lapply:

  Reduce(`|`,lapply(a[,-1], function(x) as.numeric(as.character(x))))+0
  #[1] 0 0 1 1 0 1 0 1 1 1

Or just:

  Reduce(`|`, lapply(a[,-1], `==`, 1)) +0
  #[1] 0 0 1 1 0 1 0 1 1 1
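
The first variant goes through `as.character` because calling `as.numeric` directly on a factor returns its internal level codes, not the labels. A small sketch of the difference, using a made-up factor `f` (the names `f`, `codes` and `values` are illustrative only):

```r
f <- factor(c(0, 1, 0))   # levels are "0" and "1"

codes  <- as.numeric(f)                 # internal level codes
values <- as.numeric(as.character(f))   # labels converted back to numbers

codes
# [1] 1 2 1
values
# [1] 0 1 0
```

Applying `|` to the level codes would therefore give 1 for every row, since the codes are never 0. Note also that `a[,-1]` drops only the `id` column, so both variants above test a1 to a5; an index such as `a[, 2:5]` would restrict the test to a1 to a4.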

Benchmarks

set.seed(155)
df <- as.data.frame(matrix(sample(0:1, 5000*1e4, replace=TRUE), ncol=5000))

library(microbenchmark)
f1 <- function() {as.numeric(rowSums(df == 1) >= 1) }
f2 <- function() {Reduce(`|`, lapply(df, `==`, 1)) +0}
f3 <- function() {apply(df == 1, 1, function(x) any(x %in% TRUE))+0}

microbenchmark(f1(), f2(), f3(), unit="relative")
# Unit: relative
# expr       min       lq   median       uq      max neval
# f1() 1.000000 1.000000 1.000000 1.000000 1.000000   100
# f2() 1.040561 1.043713 1.053773 1.032932 1.045067   100
# f3() 2.538287 2.517184 2.825253 2.477225 2.454511   100
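
The benchmark compares speed only. As a quick sanity check, the three approaches can be run on a small data frame to confirm that they agree; this is a sketch that is not part of the original answer, and `small`, `r1`, `r2` and `r3` are hypothetical names.

```r
set.seed(155)
# A small 20 x 5 data frame of random 0/1 values
small <- as.data.frame(matrix(sample(0:1, 20 * 5, replace = TRUE), ncol = 5))

r1 <- as.numeric(rowSums(small == 1) >= 1)
r2 <- Reduce(`|`, lapply(small, `==`, 1)) + 0
r3 <- unname(apply(small == 1, 1, function(x) any(x %in% TRUE)) + 0)

# All three should flag exactly the same rows
stopifnot(all(r1 == r2), all(r1 == r3))
```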

Answer 2 (score: 0)

You can also use any, once the selected columns have been compared with 1 to give a logical matrix:

apply(a[grep("a[1-4]", names(a))] == 1, 1, any)+0
# [1] 0 0 1 1 0 1 0 1 1 1

apply(a[grepl("a[1-4]", names(a))] == 1, 1, any)+0
# [1] 0 0 1 1 0 1 0 1 1 1
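
One caveat when there are dozens of columns, as in the real data: the unanchored pattern `"a[1-4]"` also matches names such as a10 or a14. A minimal sketch of the difference, using a hypothetical vector of column names `nm`:

```r
nm <- c("id", "a1", "a2", "a10", "a14")  # hypothetical column names

loose  <- grep("a[1-4]", nm)     # unanchored: also matches a10 and a14
strict <- grep("^a[1-4]$", nm)   # anchored: matches only a1 to a4

loose
# [1] 2 3 4 5
strict
# [1] 2 3
```

Anchoring with `^` and `$` keeps the selection to the intended columns; `grepl` with the same pattern gives the equivalent logical index.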