Question

I have the following data frame:

a    b     c     d     e
TRUE TRUE FALSE  TRUE  TRUE
FALSE TRUE  TRUE  TRUE FALSE
TRUE TRUE FALSE  TRUE  TRUE
TRUE TRUE  TRUE FALSE  TRUE
TRUE TRUE  TRUE  TRUE  TRUE
TRUE TRUE  TRUE  TRUE  TRUE

I want to create an additional column, say f, using the following logic:

TRUE = if all the columns in the corresponding row are TRUE, or all are FALSE.
FALSE = if one or more columns differ from the other columns in the corresponding row.

For this example, the output would be:

a    b     c     d     e     f
TRUE TRUE FALSE  TRUE  TRUE  FALSE
FALSE TRUE  TRUE  TRUE FALSE  FALSE
TRUE TRUE FALSE  TRUE  TRUE  FALSE
TRUE TRUE  TRUE FALSE  TRUE  FALSE
TRUE TRUE  TRUE  TRUE  TRUE  TRUE
TRUE TRUE  TRUE  TRUE  TRUE  TRUE

Answer 1

Use this:

DF$f <- apply(DF, 1, function(x)(all(x) || all(!x)))

where "DF" is your data frame.
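For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example data from the question and applies the one-liner. The column values are copied from the question's table; nothing else is assumed.

```r
# Rebuild the example data frame from the question.
DF <- data.frame(
  a = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  b = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  c = c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE),
  d = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE),
  e = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# apply() coerces the data frame to a logical matrix and hands each row
# to the function; a row passes if it is all TRUE or all FALSE.
DF$f <- apply(DF, 1, function(x) all(x) || all(!x))

# Only rows 5 and 6 are uniform, so only they should be TRUE.
print(DF)
```

Note that this relies on every column being logical; with mixed column types apply() would coerce the rows to character and all() would fail.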

Answer 2

Alternatively, take advantage of the fact that logical values are treated as just 0s and 1s in arithmetic:

rowMeans(dat) %in% 0:1
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE

Answer 3

The answer from @Ferdinand.kraft is correct and the most readable, but a hack is to use rowSums:

DF$f <- rowSums(DF) %in% c(0, 5)

On my system this is roughly a third faster:

> system.time(replicate(10000, apply(DF, 1, function(x) {all(x) || all(!x)})))
   user  system elapsed 
   3.11    0.00    3.12 

> system.time(replicate(10000, rowSums(DF) %in% c(0, 5)))
   user  system elapsed 
   1.95    0.00    1.95

But, as I said, this is a hack and should only be used when speed matters.
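One caveat with the rowSums version is that the 5 is hard-coded to the number of columns. A hedged sketch of a variant that computes it instead (the variable name n is just illustrative, and the data frame is the one from the question):

```r
# Example data from the question.
DF <- data.frame(
  a = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  b = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  c = c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE),
  d = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE),
  e = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Capture the column count BEFORE adding f, so f itself is not counted.
n <- ncol(DF)

# A row sum of 0 means all FALSE; a row sum of n means all TRUE.
DF$f <- rowSums(DF) %in% c(0, n)
print(DF$f)
```

If the line is re-run after f has been added, f becomes part of the sum, so it is safer to select the original columns explicitly in that case.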

Answer 4

In terms of speed, this is very fast:

do.call(pmin.int,dat)==do.call(pmax.int,dat)

Speed test:

microbenchmark(
    allorall=apply(dat, 1, function(x) {all(x) || all(!x)}),
    rmeans=rowMeans(dat) %in% 0:1,
    rsum=rowSums(dat) %in% c(0, 5),
    minmax=do.call(pmin.int,dat)==do.call(pmax.int,dat)
)

Unit: microseconds
     expr     min       lq   median       uq      max neval
 allorall 278.598 287.7760 301.8145 340.9585  722.410   100
   rmeans 178.174 182.4925 191.6715 205.9790  471.888   100
     rsum 177.093 182.7625 188.4315 202.4695 1796.304   100
   minmax  17.278  19.9775  22.1375  26.1870   42.115   100
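The pmin/pmax trick works because a row is uniform exactly when its element-wise minimum equals its element-wise maximum. Before trusting the timings, it may be worth checking that all four expressions agree. The following is a small sketch using only base R and the question's data; the variable names simply mirror the benchmark labels.

```r
# Example data from the question.
dat <- data.frame(
  a = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  b = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  c = c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE),
  d = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE),
  e = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# unname() drops any row names apply() may attach, so the results compare cleanly.
allorall <- unname(apply(dat, 1, function(x) all(x) || all(!x)))
rmeans   <- rowMeans(dat) %in% 0:1
rsum     <- rowSums(dat) %in% c(0, ncol(dat))
# do.call() passes each column as a separate argument to pmin.int / pmax.int.
minmax   <- do.call(pmin.int, dat) == do.call(pmax.int, dat)

# All four approaches should flag the same rows.
stopifnot(
  identical(allorall, rmeans),
  identical(rmeans, rsum),
  identical(rsum, minmax)
)
print(minmax)
```

This only confirms agreement on data without missing values; rows containing NA may be handled differently by each approach.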

Why is there no prange?
