Question

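In my data frame I have two columns of interest: id and name. My goal is to keep only the records for an id where that id has more than one distinct value in name and the final value in name is "B".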

The sample data is listed at the end of this question. The desired output looks like this:

> output
   id name
1   1    A
9   1    B

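How can I filter to get these rows in R? I know you can use the %in% operator to filter for the ids that have multiple values, but I'm not sure how to add the condition that "B" must be the last record. I'm not opposed to using a package like dplyr, but a base R solution would be ideal. Any suggestions?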

Here is the sample data:

> dput(test)
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 2, 1, 2), name = c("A", 
"A", "A", "A", "A", "A", "A", "B", "B", "A")), .Names = c("id", 
"name"), row.names = c(NA, -10L), class = "data.frame")

Answer 1

Using dplyr:

library(dplyr)

# Keep groups with more than one distinct name whose last name is "B"
test %>% 
 group_by(id) %>% 
 filter(n_distinct(name) > 1 & last(name) == 'B')

#Source: local data frame [2 x 2]
#Groups: id [1]

# A tibble: 2 x 2
#     id  name
#  <dbl> <chr>
#1     1     A
#2     1     B
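
Since the question mentions that a base R solution would be ideal, the same two conditions can be sketched without any packages. This is only a minimal sketch, not part of the original answer: it assumes the existing row order defines which record is "last" within each id, and the names keep and result are hypothetical, chosen for illustration.

```r
# Sample data from the question
test <- data.frame(
  id   = c(1, 2, 3, 4, 5, 6, 7, 2, 1, 2),
  name = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# For each id, check both conditions on that group's names.
# ave() spreads the per-group result (stored as 1/0) back onto every row.
keep <- ave(seq_len(nrow(test)), test$id, FUN = function(i) {
  nm <- test$name[i]
  length(unique(nm)) > 1 && nm[length(nm)] == "B"
}) == 1

# Rows whose id group passed both checks
result <- test[keep, ]
```

Passing row indices rather than test$name to ave() keeps the return value numeric, so the logical result of each group check is not coerced to a character string.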

Answer 2

With data.table:

library(data.table)
setDT(test)[, .SD[length(unique(name)) >= 2 & name[.N] == "B"],by = .(id)]
#   id name
#1:  1    A
#2:  1    B

Subset if contains multiple variables in a specific order
