How to subset a data.frame with or/and in dplyr

Asked: 2014-06-20 04:03:59

Tags: r dplyr

I would like to subset a data.frame using a combination of or/and conditions. This is my code using base R functions.

df <- expand.grid(list(A = seq(1, 5), B = seq(1, 5), C = seq(1, 5)))
df$value <- seq(1, nrow(df))

df[(df$A == 1 & df$B == 3) |
    (df$A == 3 & df$B == 2),]

How can I convert this to use the filter function from the dplyr package? Thanks for any suggestions.

3 answers:

Answer 0 (score: 32)

dplyr solution:

Load the library:

library(dplyr)

Filter with the conditions above:

df %>% filter(A == 1 & B == 3 | A == 3 & B == 2)
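As a quick sanity check, the sketch below rebuilds df from the question and compares the rows returned by filter() with those from the base R subset. The parentheses are written out only for readability; in R, & binds more tightly than |, so the call above groups the same way without them. The names base_rows and dplyr_rows are illustrative and not part of the original answer.

```r
library(dplyr)

# Same data as in the question
df <- expand.grid(list(A = seq(1, 5), B = seq(1, 5), C = seq(1, 5)))
df$value <- seq(1, nrow(df))

# Base R subset, copied from the question
base_rows <- df[(df$A == 1 & df$B == 3) | (df$A == 3 & df$B == 2), ]

# dplyr version with explicit parentheses around each "and" group
dplyr_rows <- df %>% filter((A == 1 & B == 3) | (A == 3 & B == 2))

# Compare on the value column, which uniquely identifies each row
# (row names and the attributes added by expand.grid may differ)
identical(base_rows$value, dplyr_rows$value)
nrow(dplyr_rows)
```

Both approaches should select the same 10 rows: 5 values of C for each of the two (A, B) combinations.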

Answer 1 (score: 7)

You could also use subset() or [. Here are a few different approaches and their respective benchmarks on a larger dataset.

df <- expand.grid(A = 1:100, B = 1:100, C = 1:100)
df$value <- 1:nrow(df)

library(dplyr); library(microbenchmark)
f1 <- function() subset(df, A == 1 & B == 3 | A == 3 & B == 2)
f2 <- function() filter(df, A == 1 & B == 3 | A == 3 & B == 2)
f3 <- function() df[with(df, A == 1 & B == 3 | A == 3 & B == 2), ]
f4 <- function() df[(df$A == 1 & df$B == 3) | (df$A == 3 & df$B == 2),]

microbenchmark(subset = f1(), filter = f2(), with = f3(), "$" = f4())
# Unit: milliseconds
#    expr      min       lq     mean   median       uq      max neval
#  subset 47.42671 49.99802 75.95385 92.24430 96.05960 141.2964   100
#  filter 36.94019 38.77325 60.22831 42.64112 84.35896 155.0145   100
#    with 38.90918 44.36299 71.29214 86.39629 88.89008 134.7670   100
#       $ 40.22723 44.08606 71.32186 86.71372 89.59275 133.1132   100

Answer 2 (score: 0)

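Interesting. I tried to look at the difference in the resulting datasets, and I cannot explain why the good old "[" operator behaves differently.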
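One documented way in which "[" can differ from filter() and subset() is the handling of missing values in the condition. This may or may not be what was observed here, since the df from the question contains no NA values. The sketch below uses a hypothetical data frame, toy, invented purely for illustration.

```r
library(dplyr)

# Hypothetical toy data with a missing value in A (not from the question)
toy <- data.frame(A = c(1, NA, 3), B = c(3, 3, 2))

# With "[", a condition that evaluates to NA yields a row filled with NA
bracket_rows <- toy[toy$A == 1 & toy$B == 3 | toy$A == 3 & toy$B == 2, ]

# filter() and subset() keep only rows where the condition is TRUE
filter_rows <- filter(toy, A == 1 & B == 3 | A == 3 & B == 2)
subset_rows <- subset(toy, A == 1 & B == 3 | A == 3 & B == 2)

nrow(bracket_rows)
nrow(filter_rows)
nrow(subset_rows)
```

Here the "[" result has one more row than the other two, because the second row's condition is NA rather than FALSE. Wrapping the condition in which() is a common way to make "[" drop such rows.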
