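I have a dataset containing the following information:
I want to subset the data by scanning the health status within each group: the health status in the last row of each group decides whether that group's rows are kept. The desired output is: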
Answer 0 (score: 2)
With packages, you could use dplyr or data.table here:
library(dplyr)
# Keep all rows of each group whose last health value is "N"; n() is the group size
DF %>% group_by(group) %>% filter(health[n()] == "N")
group health
(fctr) (fctr)
1 a H
2 a H
3 a N
4 c H
5 c H
6 c N
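library(data.table)
setDT(DF)  # convert DF to a data.table by reference
# Return the group's rows (.SD) only if its last health value (row .N) is "N"
DF[, if (health[.N] == "N") .SD, by=group]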
group health
1: a H
2: a H
3: a N
4: c H
5: c H
6: c N
As @docendodiscimus pointed out, you can use last(health) instead of health[n()] or health[.N]. Both packages have a last function for this.
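As a minimal sketch of that suggestion, the dplyr version might look like the following. It assumes dplyr is installed, rebuilds the sample data inline with plain character columns rather than factors, and calls dplyr::last() with an explicit namespace so it is not confused with data.table::last() when both packages are loaded. The name res is only illustrative.

```r
library(dplyr)

# Sample data, rebuilt inline (character columns instead of factors)
DF <- data.frame(
  group  = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  health = c("H", "H", "N", "H", "H", "H", "H", "H", "N")
)

# Keep every row of a group whose final health value is "N"
res <- DF %>%
  group_by(group) %>%
  filter(dplyr::last(health) == "N") %>%
  ungroup()

print(res)
```

With this data, groups a and c are kept and group b is dropped, since b ends in "H".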
In base R, @docendo offered:
subset(DF, ave(health == "N", group, FUN = function(x) tail(x, 1)))
And from @akrun:
subset(DF, group %in% group[health == "N" & !duplicated(group, fromLast=TRUE)])
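If the one-liners above feel dense, the same idea can be spelled out in steps. This is only a sketch and not either commenter's method: it swaps in tapply() to look up each group's last health value, then filters on the group names. The names last_health, keep, and res are made up for illustration, and DF is rebuilt inline with character columns.

```r
# Sample data, rebuilt inline (character columns instead of factors)
DF <- data.frame(
  group  = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  health = c("H", "H", "N", "H", "H", "H", "H", "H", "N")
)

# Step 1: last health value per group, named by group
last_health <- tapply(DF$health, DF$group, function(x) tail(x, 1))

# Step 2: names of the groups whose last value is "N"
keep <- names(last_health)[last_health == "N"]

# Step 3: keep all rows belonging to those groups
res <- DF[DF$group %in% keep, ]

print(res)
```

This assumes the rows are already in the intended order within each group, as in the sample data.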
Data. I didn't use the OP's data exactly, since it was a pain to copy. Instead, it is:
group health
1 a H
2 a H
3 a N
4 b H
5 b H
6 b H
7 c H
8 c H
9 c N
DF = structure(list(group = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("a", "b", "c"), class = "factor"), health = structure(c(1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("H", "N"), class = "factor")), .Names = c("group",
"health"), row.names = c(NA, -9L), class = "data.frame")