Question

I have two data frames:

subset <- data.frame(id=rep(1,7), country=c("CH", "CH", "CA", "DE", "FR", "AT", "DE"))

> subset
  id country
1  1      CH
2  1      CH
3  1      CA
4  1      DE
5  1      FR
6  1      AT
7  1      DE

whotoremove <- data.frame(id = c(1,1), country = c("DE", "FR"))

> whotoremove
  id country
1  1      DE
2  1      FR

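I want to remove the elements of whotoremove from subset, not just by matching values but also respecting how many times they occur. In other words, each row of whotoremove should remove exactly one matching row of subset, so I would like to end up with something like this: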

> subset
  id country
1  1      CH
2  1      CH
3  1      CA
6  1      AT
7  1      DE

Note the row names: I want to keep the row names from the original subset, because I need them later on.

Any help is much appreciated.

Answer 1

Hope this helps!

idx <- match(whotoremove$country, subset$country)
subset[-idx, ]

Output:

  id country
1  1      CH
2  1      CH
3  1      CA
6  1      AT
7  1      DE

Sample data:

subset <- data.frame(id=rep(1,7), country=c("CH", "CH", "CA", "DE", "FR", "AT", "DE"))
whotoremove <- data.frame(id = c(1,1), country = c("DE", "FR"))
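
One caveat may be worth checking before relying on this: match() returns only the position of the first match for each value, and this code compares country alone, ignoring id. The sketch below uses a hypothetical removal list, remove_twice (not part of the question), that names "DE" twice, to show that only one "DE" row would then be dropped.

```r
subset <- data.frame(id = rep(1, 7),
                     country = c("CH", "CH", "CA", "DE", "FR", "AT", "DE"))

# Hypothetical removal list (an assumption for illustration): "DE" listed twice
remove_twice <- data.frame(id = c(1, 1), country = c("DE", "DE"))

# match() gives the first matching position for each value,
# so both entries point at the same row of subset
idx <- match(remove_twice$country, subset$country)
idx

# Negative indexing with a repeated index drops that row only once
remaining <- subset[-idx, ]
remaining
```

So the match() approach fits the example data, where each country appears at most once in whotoremove, but it would need a per-occurrence counter to handle repeated entries.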

Answer 2

One solution with dplyr would be:

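library(dplyr)
library(tibble)  # provides rownames_to_column()

# Number each occurrence of an (id, country) pair in the removal list
whotoremove <- whotoremove %>% 
  group_by(id, country) %>% 
  mutate(count = 1:n())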

subset %>%
  rownames_to_column() %>% 
  group_by(id, country) %>% 
  mutate(count = 1:n()) %>% 
  anti_join(whotoremove, by = c("id", "country", "count")) 

# A tibble: 5 x 4
# Groups:   id, country [?]
#   rowname    id country count
#   <chr>   <dbl> <fct>   <int>
# 1 1          1. CH          1
# 2 2          1. CH          2
# 3 3          1. CA          1
# 4 6          1. AT          1
# 5 7          1. DE          2

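To keep the row names I use rownames_to_column(), and to drop the matching rows I use anti_join(). To remove a combination only as many times as it occurs in whotoremove, I first add a count variable that numbers the occurrences within each id/country group, and then include it among the columns that anti_join() joins on.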
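
The same counting idea can also be sketched in base R, which keeps the original row names without an extra column. This is only a minimal sketch under the assumptions of the example data; the helper names occurrence() and key() are made up for illustration, and the "|" separator assumes that character does not appear in the data.

```r
subset <- data.frame(id = rep(1, 7),
                     country = c("CH", "CH", "CA", "DE", "FR", "AT", "DE"))
whotoremove <- data.frame(id = c(1, 1), country = c("DE", "FR"))

# Hypothetical helper: number each row within its (id, country) group,
# so the first "DE" gets 1, the second "DE" gets 2, and so on
occurrence <- function(df) {
  ave(seq_len(nrow(df)), df$id, df$country, FUN = seq_along)
}

# Hypothetical helper: build a key such as "1|DE|1" that identifies
# the k-th occurrence of an (id, country) pair
key <- function(df) {
  paste(df$id, df$country, occurrence(df), sep = "|")
}

# Keep only the rows whose key does not appear in the removal list
result <- subset[!key(subset) %in% key(whotoremove), ]
result
```

Because the rows are selected by logical indexing on the original data frame, rownames(result) still refers to the positions in subset.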

How do I remove a specific number of matching entries from a data frame?
