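I have a data frame with multiple rows per ID, but I want to keep only one row for each ID. Unfortunately I can't share the real data frame, so I've created a similar hypothetical one below.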
+----+--------+--------------------+
| ID | name   | org                |
+----+--------+--------------------+
| 1  | Apple  | Apple              |
| 1  | Apple  | Sour               |
| 1  | Apple  | Goldstar           |
| 2  | Banana | Banana             |
| 2  | Banana | banana             |
| 3  | Yogi   | yogi               |
| 3  | yogi   | strawberry yoghurt |
+----+--------+--------------------+
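I'm looking for a way to drop every row except the first one whose value is found in a list of possible values; if nothing matches, all rows should be kept.
In this hypothetical case, I'd like to give the function a list of values, for example: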
appleNamesTokeep <- c("Goldstar", "Apple", "Sour")
bananaNamesTokeep <- c("Banana", "banana") # case-sensitive
yoghurtNamesTokeep <- c("strawberry yoghurt", "yogi")
The result would be:
+----+--------+--------------------+
| ID | name   | org                |
+----+--------+--------------------+
| 1  | Apple  | Goldstar           |
| 2  | Banana | Banana             |
| 3  | yogi   | strawberry yoghurt |
+----+--------+--------------------+
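If a row whose org value is "Goldstar" is found, all other rows for that ID should be dropped; if "Goldstar" is not found but "Apple" is, that row should be kept and everything else dropped, and so on. The lookup should be done per ID and per list, because each ID may concern a completely different subject (in this case, a different type of food).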
Answer 0 (score: 0)
A possible solution in base R:
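# i1: the keep-list each row belongs to, matched on the first three letters of name (case-insensitive)
i1 <- match(tolower(substr(mydf$name, 1, 3)), substr(NamesTokeep$ind, 1, 3))
# i2: the keep-list whose top-priority value equals org (NA if none)
i2 <- match(mydf$org, NamesTokeep$values)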
Now using:
# keep rows where name and org point to the same keep-list
mydf[which(i1 == i2), ]
gives you:
  ID   name                org
3  1  Apple           Goldstar
4  2 Banana             Banana
7  3   yogi strawberry yoghurt
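The filter depends on both matches pointing to the same keep-list. The sketch below uses two hypothetical test rows (not from the post) in which the second row pairs an apple name with a banana value. It shows that combining the index vectors with & only checks that both matches exist, whereas == also checks that they agree. The NamesTokeep table is typed out by hand here and is assumed to be equivalent to the one built in the data section.

```r
# Keep-list table, assumed equivalent to the answer's NamesTokeep:
# one top-priority value per group
NamesTokeep <- data.frame(
  ind    = c("apple", "banana", "yoghurt"),
  values = c("Goldstar", "Banana", "strawberry yoghurt"),
  stringsAsFactors = FALSE
)

# Hypothetical test rows: the second row has an apple name but a banana value
test_df <- data.frame(
  ID   = c(1L, 1L),
  name = c("Apple", "Apple"),
  org  = c("Goldstar", "Banana"),
  stringsAsFactors = FALSE
)

i1 <- match(tolower(substr(test_df$name, 1, 3)), substr(NamesTokeep$ind, 1, 3))
i2 <- match(test_df$org, NamesTokeep$values)

rows_and <- which(i1 & i2)   # rows 1 and 2: any two non-NA indices count as TRUE
rows_eq  <- which(i1 == i2)  # row 1 only: both indices must name the same group
```

On the data in the post both forms select the same three rows, so the difference only matters when an org value from one list can appear under a name from another.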
Data used:
mydf <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
name = c("Apple", "Apple", "Apple", "Banana", "Banana", "Yogi", "yogi"),
org = c("Apple", "Sour", "Goldstar", "Banana", "banana", "yogi", "strawberry yoghurt")),
.Names = c("ID", "name", "org"), class = "data.frame", row.names = c(NA, -7L))
NamesTokeep <- stack(list(apple = c("Goldstar", "Apple", "Sour"),
banana = c("Banana", "banana"),
yoghurt = c("strawberry yoghurt", "yogi")))[2:1]
# keep only the first (highest-priority) value of each list
NamesTokeep <- aggregate(values ~ ind, data = NamesTokeep, '[', 1)
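Because the lookup table holds only the first value of each list, the fallback described in the question (use "Apple" when "Goldstar" is absent, keep every row when nothing matches) is not covered. Below is a minimal sketch of one way to handle it in base R. The helper name keep_first_match and the keep_by_id list keyed by ID are assumptions made for illustration, not part of the original post.

```r
# Example data copied from the post
mydf <- data.frame(
  ID   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  name = c("Apple", "Apple", "Apple", "Banana", "Banana", "Yogi", "yogi"),
  org  = c("Apple", "Sour", "Goldstar", "Banana", "banana", "yogi",
           "strawberry yoghurt"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: one priority-ordered vector per ID, best value first
keep_by_id <- list(
  "1" = c("Goldstar", "Apple", "Sour"),
  "2" = c("Banana", "banana"),
  "3" = c("strawberry yoghurt", "yogi")
)

# Hypothetical helper: per ID, keep the row whose org ranks best in that
# ID's priority vector; if no org value matches, keep all rows for the ID
keep_first_match <- function(df, keep_by_id) {
  keep <- logical(nrow(df))
  for (id in unique(df$ID)) {
    rows <- which(df$ID == id)
    # position of each org value in the priority vector (NA if absent);
    # an ID with no entry in keep_by_id yields all NA
    rank <- match(df$org[rows], keep_by_id[[as.character(id)]])
    if (all(is.na(rank))) {
      keep[rows] <- TRUE
    } else {
      keep[rows[which.min(rank)]] <- TRUE
    }
  }
  df[keep, , drop = FALSE]
}

result <- keep_first_match(mydf, keep_by_id)
# result$org is "Goldstar", "Banana", "strawberry yoghurt"

# Fallback: without the Goldstar row, "Apple" is kept for ID 1
fallback <- keep_first_match(mydf[-3, ], keep_by_id)
```

match() is case-sensitive, so "Banana" and "banana" are treated as distinct values, as the question requires.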