R: remove rows per ID based on a list of values

Time: 2018-10-16 09:26:47

Tags: r dataframe filtering

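I have a data frame with multiple rows per ID, but I only want to keep one row for each. Unfortunately I can't share the real data frame, so I'll create a similar hypothetical one below.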

+----+--------+--------------------+
| ID | name   | org                |
+----+--------+--------------------+
| 1  | Apple  | Apple              |
| 1  | Apple  | Sour               |
| 1  | Apple  | Goldstar           |
| 2  | Banana | Banana             |
| 2  | Banana | banana             |
| 3  | Yogi   | yogi               |
| 3  | yogi   | strawberry yoghurt |
+----+--------+--------------------+

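I'm looking for a way to remove all rows for each ID except the first one that matches a list of possible values; if nothing matches, all rows should be kept.

In this hypothetical case, I'd like to give the function a list of values, for example: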

appleNamesTokeep <- c("Goldstar", "Apple", "Sour")
bananaNamesTokeep <- c("Banana", "banana") # case-sensitive
yoghurtNamesTokeep <- c("strawberry yoghurt", "yogi")

The result would be:

+----+--------+--------------------+
| ID | name   | org                |
+----+--------+--------------------+
| 1  | Apple  | Goldstar           |
| 2  | Banana | Banana             |
| 3  | yogi   | strawberry yoghurt |
+----+--------+--------------------+

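If a row whose org value is "Goldstar" is found, all other rows for that ID should be removed; if there is no "Goldstar" but there is an "Apple", that row should be kept and everything else removed, and so on. The lookup should be done per ID, each with its own list, because each group of rows may concern a completely different subject (in this case, different types of food).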

1 answer:

Answer 0: (score: 0)

A possible solution in base R:

i1 <- match(tolower(substr(mydf$name,1,3)), substr(NamesTokeep$ind,1,3))
i2 <- match(mydf$org, NamesTokeep$values)

Now using:

mydf[which(i1 & i2),]

gives you:

  ID   name                org
3  1  Apple           Goldstar
4  2 Banana             Banana
7  3   yogi strawberry yoghurt
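
A small sketch of how the indexing step behaves may help here. match() returns NA for values it cannot find, and which() drops both NA and FALSE, so only fully matched rows survive. The toy vectors below are made up for illustration and are not taken from the data. They also show one caveat: a condition like i1 & i2 only checks that both lookups matched something, whereas i1 == i2 additionally requires that they matched the same entry.

```r
# Toy stand-ins for i1 and i2 (illustrative values, not from mydf).
i1 <- c(1L, 1L, 2L)   # group index found via the name column
i2 <- c(NA, 2L, 2L)   # group index found via the org column (NA = no match)

# Any non-zero integer counts as TRUE, so row 2 passes even though
# its two indices point at different groups.
both_matched <- which(i1 & i2)

# Requiring equality keeps only rows where both lookups agree.
same_group <- which(i1 == i2)
```

On the example data both conditions select the same rows, so the difference only matters if an org value could belong to a different group than the name suggests.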

Data used:

mydf <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
                       name = c("Apple", "Apple", "Apple", "Banana", "Banana", "Yogi", "yogi"),
                       org = c("Apple", "Sour", "Goldstar", "Banana", "banana", "yogi", "strawberry yoghurt")),
                  .Names = c("ID", "name", "org"), class = "data.frame", row.names = c(NA, -7L))

NamesTokeep <- stack(list(apple = c("Goldstar", "Apple", "Sour"),
                          banana = c("Banana", "banana"),
                          yoghurt = c("strawberry yoghurt", "yogi")))[2:1]
NamesTokeep <- aggregate(values ~ ind, data = NamesTokeep, '[', 1)
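
The approach above keeps only the first entry of each list, so it does not cover the fallback described in the question (use "Apple" when "Goldstar" is absent, and keep every row when nothing matches). A minimal base R sketch of that rule follows. It assumes the priority lists can be keyed by ID; keep_by_id and keep_first_match() are hypothetical names introduced here, not part of the original answer or of any package.

```r
# Example data, same values as in the answer above.
mydf <- data.frame(
  ID   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  name = c("Apple", "Apple", "Apple", "Banana", "Banana", "Yogi", "yogi"),
  org  = c("Apple", "Sour", "Goldstar", "Banana", "banana", "yogi",
           "strawberry yoghurt"),
  stringsAsFactors = FALSE
)

# Assumption: one priority-ordered vector per ID, named by the ID.
keep_by_id <- list(
  "1" = c("Goldstar", "Apple", "Sour"),
  "2" = c("Banana", "banana"),
  "3" = c("strawberry yoghurt", "yogi")
)

# Hypothetical helper: for each ID keep the row whose org comes earliest
# in that ID's priority vector; keep all rows if none of them match.
keep_first_match <- function(df, keep) {
  pieces <- lapply(split(df, df$ID), function(g) {
    wanted <- keep[[as.character(g$ID[1])]]
    # Position of each org in the priority vector (NA if absent).
    priority <- match(g$org, wanted)
    if (all(is.na(priority))) {
      return(g)
    }
    # which.min() ignores NA and returns the first minimum.
    g[which.min(priority), , drop = FALSE]
  })
  do.call(rbind, unname(pieces))
}

result <- keep_first_match(mydf, keep_by_id)
```

Because match() is case-sensitive, "Banana" and "banana" are treated as different values, which is what the question asks for. An ID with no entry in keep_by_id falls into the "no match" branch and keeps all of its rows.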