
Question

I have a data frame in long format, and I want to filter pairs based on unique combinations of values. My dataset looks like this:

id <- rep(1:4, each=2)
type <- c("blue", "blue", "red", "yellow", "blue", "red", "red", "yellow")
df <- data.frame(id,type)
df
  id   type
1  1   blue
2  1   blue
3  2    red
4  2 yellow
5  3   blue
6  3    red
7  4    red
8  4 yellow

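Let's say each id is a respondent and type is a combination of treatments. Individual 1 saw two objects, both blue; individual 2 saw one red object and one yellow object; and so on.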

For example, how do I keep only those who saw the combination "red" and "yellow"? If I filter by the combination "red" and "yellow", the resulting dataset should look like this:

  id   type
3  2    red
4  2 yellow
7  4    red
8  4 yellow

It should keep respondents 2 and 4 (only those who saw the combination "red" and "yellow"). Note that it does not keep respondent 3, because she saw "blue" and "red" (rather than "red" and "yellow"). How do I do this?

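One solution is to reshape the dataset to wide format, filter by column, and then reshape it back to long format. But I believe there is another way to do this without reshaping the dataset. Any ideas?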

Answer 1

A dplyr solution would be:

library(dplyr)
df <- tibble(  # data_frame() is deprecated; tibble() is its replacement
  id = rep(1:4, each = 2),
  type = c("blue", "blue", "red", "yellow", "blue", "red", "red", "yellow")
)

types <- c("red", "yellow")

df %>% 
  group_by(id) %>% 
  filter(all(types %in% type))
#> # A tibble: 4 x 2
#> # Groups:   id [2]
#>      id   type
#>   <int>  <chr>
#> 1     2    red
#> 2     2 yellow
#> 3     4    red
#> 4     4 yellow

Update

To allow combinations with repeated values, such as blue, blue, we have to change the filter call to the following:

types2 <- c("blue", "blue")

df %>% 
  group_by(id) %>% 
  filter(sum(types2 == type) == length(types2))
#> # A tibble: 2 x 2
#> # Groups:   id [1]
#>      id  type
#>   <int> <chr>
#> 1     1  blue
#> 2     1  blue

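This solution also works with distinct types. Note that `types == type` compares position by position, so it assumes each group has the same number of rows as the vector, in the same order: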

df %>% 
  group_by(id) %>% 
  filter(sum(types == type) == length(types))
#> # A tibble: 4 x 2
#> # Groups:   id [2]
#>      id   type
#>   <int>  <chr>
#> 1     2    red
#> 2     2 yellow
#> 3     4    red
#> 4     4 yellow
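Because the element-wise comparison above depends on row order, an order-insensitive variant may be useful. The following is a minimal base-R sketch, not part of the original answer: it sorts both sides before comparing, so a group matches only when it holds exactly the requested values, repeats included. The helper name `matches_exactly` is hypothetical and introduced here for illustration.

```r
df <- data.frame(
  id = rep(1:4, each = 2),
  type = c("blue", "blue", "red", "yellow", "blue", "red", "red", "yellow"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when the group's values equal `target`
# as a multiset (same values, same counts, any order).
matches_exactly <- function(x, target) {
  identical(sort(as.character(x)), sort(as.character(target)))
}

# Hypothetical wrapper: keep every row of each id whose types match `target`.
filter_exact <- function(data, target) {
  ok <- vapply(split(data$type, data$id), matches_exactly,
               logical(1), target = target)
  data[data$id %in% names(ok)[ok], ]
}

red_yellow <- filter_exact(df, c("yellow", "red"))  # order of target is irrelevant
blue_blue  <- filter_exact(df, c("blue", "blue"))
```

Here `red_yellow` holds the rows for ids 2 and 4, and `blue_blue` holds the rows for id 1. Unlike `all(types %in% type)`, this variant rejects groups that contain additional values.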

Answer 2

Let's use all() to check whether every value in a given set appears within each group.

library(tidyverse)

test_filter <- c("red", "yellow")

df %>%
  group_by(id) %>% 
  filter(all(test_filter %in% type))

# A tibble: 4 x 2
# Groups: id [2]
id type  
<int> <fctr>
1     2 red   
2     2 yellow
3     4 red   
4     4 yellow

Answer 3

I modified your data and did the following.

df <- data.frame(id = rep(1:4, each = 3),
                 # use `=` here: `type <- c(...)` would not name the column "type"
                 type = c("blue", "blue", "green", "red", "yellow", "purple",
                          "blue", "orange", "yellow", "yellow", "pink", "red"),
                 stringsAsFactors = FALSE)

   id   type
1   1   blue
2   1   blue
3   1  green
4   2    red
5   2 yellow
6   2 purple
7   3   blue
8   3 orange
9   3 yellow
10  4 yellow
11  4   pink
12  4    red

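As you can see, each id has three observations. IDs 2 and 4 contain both red and yellow. They also have non-target colors (purple and pink), and I wanted to keep those observations too. To do that, I wrote the following code, which can be read like this: "For each id, use any() to check whether there is a red and whether there is a yellow. When both conditions are TRUE, keep all rows for that id."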

library(dplyr)

group_by(df, id) %>%
  filter(any(type == "yellow") & any(type == "red"))

   id   type
4   2    red
5   2 yellow
6   2 purple
10  4 yellow
11  4   pink
12  4    red
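The same "keep the whole group when every target color is present" logic can be written for an arbitrary vector of targets. Below is a minimal base-R sketch under that assumption; the variable names `required` and `has_all` are illustrative and not part of the answer above.

```r
df <- data.frame(
  id = rep(1:4, each = 3),
  type = c("blue", "blue", "green", "red", "yellow", "purple",
           "blue", "orange", "yellow", "yellow", "pink", "red"),
  stringsAsFactors = FALSE
)

required <- c("red", "yellow")  # illustrative name for the target colors

# One logical per id: does the group contain every required color?
has_all <- vapply(split(df$type, df$id),
                  function(x) all(required %in% x),
                  logical(1))

# Keep all rows (including non-target colors) for the matching ids.
result <- df[df$id %in% names(has_all)[has_all], ]
```

With this data, `result` contains the three rows of id 2 and the three rows of id 4. Adding a third color to `required` needs no further change to the code.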

Answer 4

Using data.table:

library(data.table)
setDT(df)
df[, type1 := shift(type, type = "lag"), by = id]
df1 <- df[type == "yellow" & type1 == "red", id]
df <- df[id %in% df1, ]
df[, type1 := NULL]

This gives:

   id   type
1:  2    red
2:  2 yellow
3:  4    red
4:  4 yellow
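The lag-based approach matches only when "red" sits in the row immediately before "yellow" within an id. If row order is not guaranteed, a grouped membership test may be more robust. The following is a sketch, not the answer author's method; the data is a modified copy in which id 2 is deliberately reversed (yellow, then red), and `wanted` is an illustrative name.

```r
library(data.table)

# Modified example data: id 2 lists yellow before red on purpose.
dt <- data.table(
  id = rep(1:4, each = 2),
  type = c("blue", "blue", "yellow", "red", "blue", "red", "red", "yellow")
)

wanted <- c("red", "yellow")  # illustrative name

# For each id, return the group's rows (.SD) only when every wanted
# value is present; groups that fail the test contribute nothing.
result <- dt[, if (all(wanted %in% type)) .SD, by = id]
```

Here `result` keeps ids 2 and 4 regardless of the order in which the two colors appear.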
