Select only rows with duplicate IDs and specific values from another column in R

Time: 2018-07-02 00:05:57

Tags: r filter dplyr

I have the following data with IDs and values:

id <- c("1103-5","1103-5","1104-2","1104-2","1104-4","1104-4","1106-2","1106-2","1106-3","1106-3","2294-1","2294-1","2294-2","2294-2","2294-2","2294-3","2294-3","2294-3","2294-4","2294-4","2294-5","2294-5","2294-5","2300-1","2300-1","2300-2","2300-2","2300-4","2300-4","2321-1","2321-1","2321-2","2321-2","2321-3","2321-3","2321-4","2321-4","2347-1","2347-1","2347-2","2347-2")

value <- c(6,3,6,3,6,3,6,3,6,3,3,6,9,3,6,9,3,6,3,6,9,3,6,9,6,9,6,9,6,9,3,9,3,9,3,9,3,9,6,9,6)

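As you can see, the same id has multiple values. What I want is to keep only the values 3 and 6, and only for IDs that have both of them. For example, ID "1103-5" has both 3 and 6, so it should be in the list, but "2347-2" should not.

I am using R.

Below is one approach I tried, but it gives me every row whose value is 3 or 6.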

d <- data.frame(id, value)
group36 <- d[d$value == 3 | d$value == 6,]

library(dplyr)
d %>% group_by(id) %>% filter(3 == value | 6 == value)

The output should look like this:

id  value
1103-5  6
1103-5  3
1104-2  6
1104-2  3
1104-4  6
1104-4  3
1106-2  6
1106-2  3
1106-3  6
1106-3  3
2294-1  3
2294-1  6
2294-2  3
2294-2  6
2294-3  3
2294-3  6
2294-4  3
2294-4  6
2294-5  3
2294-5  6

2 answers:

Answer 0: (score: 1)

library(dplyr)
d <- group_by(d, id)
# keep every row of any id that has at least one 3 and at least one 6
filter(d, any(value == 3), any(value == 6))

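This gives you all rows of every ID that has a 3 (somewhere) and a 6 (somewhere). Note that your data contains some IDs with three values. In those cases, if both 3 and 6 are present, the ID is included in the result, along with its third value.

If you want to exclude the remaining rows that are not equal to 3 or 6, apply the following filter as well: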
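To see which IDs carry a value other than 3 or 6, one possible check is to summarise each group into a few flags. This is only a sketch on a small sample of the question's data; the column names `has_3`, `has_6`, and `has_other` are made up for illustration.

```r
library(dplyr)

# Small sample taken from the question's data
d <- data.frame(
  id    = c("1103-5", "1103-5", "2294-2", "2294-2", "2294-2", "2347-2", "2347-2"),
  value = c(6, 3, 9, 3, 6, 9, 6)
)

# One row per id: does it contain a 3, a 6, and anything else?
flags <- d %>%
  group_by(id) %>%
  summarise(
    has_3     = any(value == 3),
    has_6     = any(value == 6),
    has_other = any(!value %in% c(3, 6))
  )

flags
```

In this sample, "2294-2" has a 3, a 6, and an extra value, while "2347-2" has no 3 at all.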

filter(d,value==3 | value==6)

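If you want only the IDs that have both 3 and 6, with any rows holding OTHER values dropped (this matches the output shown in the question), combine the conditions in a single call: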

filter(d,any(value==3),any(value==6),value==3 | value==6)
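Note that the combined filter keeps an ID such as "2294-2", which also has a 9, and only drops its 9 row. If the goal were instead to drop such IDs entirely, one possible sketch is to add an `all(value %in% c(3, 6))` condition. The names `lenient` and `strict` below are illustrative, and the data is a small sample of the question's.

```r
library(dplyr)

d <- data.frame(
  id    = c("1103-5", "1103-5", "2294-2", "2294-2", "2294-2", "2347-2", "2347-2"),
  value = c(6, 3, 9, 3, 6, 9, 6)
)

# Lenient: ids with both 3 and 6, keeping only their 3/6 rows
lenient <- d %>%
  group_by(id) %>%
  filter(any(value == 3), any(value == 6), value %in% c(3, 6)) %>%
  ungroup()

# Strict: ids whose values are 3 and 6 and nothing else
strict <- d %>%
  group_by(id) %>%
  filter(any(value == 3), any(value == 6), all(value %in% c(3, 6))) %>%
  ungroup()
```

On this sample, `lenient` holds the 3 and 6 rows of "1103-5" and "2294-2", while `strict` holds only "1103-5". The lenient form is the one that reproduces the output requested in the question.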

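Answer 1: (score: 1)

I'm not sure whether this is what you want. We can filter the rows whose value is 3 or 6, then convert from long format to wide format and keep only the columns (one per ID) that have both the 3 and the 6, that is, columns without any NA. After that, convert back to long format.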

library(dplyr)
library(tidyr)

id <- c("1103-5","1103-5","1104-2","1104-2","1104-4","1104-4","1106-2","1106-2",
        "1106-3","1106-3","2294-1","2294-1","2294-2","2294-2","2294-2",
        "2294-3","2294-3","2294-3","2294-4","2294-4","2294-5","2294-5","2294-5",
        "2300-1","2300-1","2300-2","2300-2","2300-4","2300-4","2321-1","2321-1",
        "2321-2","2321-2","2321-3","2321-3","2321-4","2321-4","2347-1","2347-1","2347-2","2347-2")

value <- c(6,3,6,3,6,3,6,3,6,3,3,6,9,3,6,9,3,6,3,6,9,3,6,9,6,9,6,9,6,9,3,9,3,9,3,9,3,9,6,9,6)

d <- data.frame(id, value)

d %>% 
  group_by(id) %>% 
  filter(value %in% c(3, 6)) %>% 
  mutate(rows = 1:n()) %>%
  spread(key = id, value) %>% 
  select_if(~ all(!is.na(.)))

#> # A tibble: 2 x 11
#>    rows `1103-5` `1104-2` `1104-4` `1106-2` `1106-3` `2294-1` `2294-2`
#>   <int>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1     1        6        6        6        6        6        3        3
#> 2     2        3        3        3        3        3        6        6
#> # ... with 3 more variables: `2294-3` <dbl>, `2294-4` <dbl>,
#> #   `2294-5` <dbl>

d %>% 
  group_by(id) %>% 
  filter(value %in% c(3, 6)) %>% 
  mutate(rows = 1:n()) %>%
  spread(key = id, value) %>% 
  select_if(~ all(!is.na(.))) %>% 
  select(-rows) %>% 
  gather(id, value)

#> # A tibble: 20 x 2
#>    id     value
#>    <chr>  <dbl>
#>  1 1103-5     6
#>  2 1103-5     3
#>  3 1104-2     6
#>  4 1104-2     3
#>  5 1104-4     6
#>  6 1104-4     3
#>  7 1106-2     6
#>  8 1106-2     3
#>  9 1106-3     6
#> 10 1106-3     3
#> 11 2294-1     3
#> 12 2294-1     6
#> 13 2294-2     3
#> 14 2294-2     6
#> 15 2294-3     3
#> 16 2294-3     6
#> 17 2294-4     3
#> 18 2294-4     6
#> 19 2294-5     3
#> 20 2294-5     6

Created on 2018-07-01 by the reprex package (v0.2.0.9000).
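For comparison, the same selection can be sketched without reshaping and without extra packages, using base R's `ave()` to compute a per-ID flag. This is an alternative illustration rather than part of the answer above; `has_both` and `result` are illustrative names, and the data is a small sample of the question's.

```r
d <- data.frame(
  id    = c("1103-5", "1103-5", "2294-2", "2294-2", "2294-2", "2347-2", "2347-2"),
  value = c(6, 3, 9, 3, 6, 9, 6)
)

# ave() applies FUN within each id and returns a vector as long as the input;
# the logical result is stored as 1/0 because d$value is numeric
has_both <- ave(d$value, d$id, FUN = function(v) all(c(3, 6) %in% v)) == 1

# Keep the 3/6 rows of ids that contain both values
result <- d[has_both & d$value %in% c(3, 6), ]
result
```

Unlike the wide-format approach, this does not rely on every remaining ID having the same number of rows.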