For the sample data frame:
df <- structure(list(id = 1:19, region.1 = structure(c(1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 5L
), .Label = c("AT1", "AT2", "AT3", "AT4", "AT5"), class = "factor"),
PoorHealth = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L)), .Names = c("id", "region.1",
"PoorHealth"), class = "data.frame", row.names = c(NA, -19L))
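Both conditions discussed here depend on per-region counts, so it can help to look at those counts first. The sketch below rebuilds the same sample data more compactly and tabulates, for each `region.1`, the number of rows and the number of `PoorHealth == 1` cases using base R `table()` and `tapply()`. The name `region_summary` is just an illustrative choice.

```r
# Rebuild the sample data compactly (same values as the dput() output above)
df <- data.frame(
  id = 1:19,
  region.1 = factor(rep(c("AT1", "AT2", "AT3", "AT4", "AT5"),
                        times = c(4, 7, 3, 2, 3))),
  PoorHealth = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L,
                 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L)
)

# One row per region: total rows (n) and number of PoorHealth == 1 cases (n_poor)
region_summary <- data.frame(
  region.1 = levels(df$region.1),
  n        = as.vector(table(df$region.1)),
  n_poor   = as.vector(tapply(df$PoorHealth == 1L, df$region.1, sum))
)
print(region_summary)
```

For this data every region has at most 3 poor-health cases, and only AT2 has more than 6 rows.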
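I would like to subset using the BY command and was hoping someone could help me.
I want to include in df the regions (region.1) that meet this condition:
or this condition:
If anyone has any ideas, I would be very grateful.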
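A base R sketch without data.table, assuming the two conditions are the ones used in the answer below (at most 3 rows with `PoorHealth == 1` in the region, or at most 6 rows in the region). Instead of `by()`, it uses `ave()`, which returns a per-group statistic aligned with each original row; that makes row-level subsetting a single logical index. The helper names `n_rows`, `n_poor` and `keep` are hypothetical.

```r
df <- data.frame(
  id = 1:19,
  region.1 = factor(rep(c("AT1", "AT2", "AT3", "AT4", "AT5"),
                        times = c(4, 7, 3, 2, 3))),
  PoorHealth = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L,
                 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L)
)

# Each row receives the statistic of its own region
n_rows <- ave(df$PoorHealth, df$region.1, FUN = length)
n_poor <- ave(as.integer(df$PoorHealth == 1L), df$region.1, FUN = sum)

# Assumed conditions (taken from the answer): <= 3 poor-health cases OR <= 6 rows
keep <- n_poor <= 3L | n_rows <= 6L

included <- df[keep, ]   # rows from qualifying regions
excluded <- df[!keep, ]  # the complement, if the goal were exclusion instead
```

Because `ave()` keeps the original row order, `included` preserves the order of `df`.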
Answer (score: 1)
This should work. Dunno if there's a cleaner way:
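library(data.table)
setDT(df)  # convert df to a data.table by reference
# Per region.1 group, which(cond) yields 1 when cond is TRUE and nothing when FALSE,
# so only qualifying regions remain; [,region.1] then extracts their names
qualified_regions = df[,which((sum(PoorHealth==1) <=3 | .N <= 6)),region.1][,region.1]
# Keep the rows whose region is one of the qualifying regions
df[region.1 %in% qualified_regions,]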
E: I removed the `!` because the OP changed "EXCLUDE" to "INCLUDE" in the original question.
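One commonly used data.table idiom that avoids the `which()` trick is to collect the row numbers of qualifying groups with `.I` and subset by them directly. This is a sketch of that alternative, not the answerer's method; it keeps the original column order (unlike returning `.SD` per group, which moves `region.1` to the front). Exclusion is simply the complement of the collected row numbers.

```r
library(data.table)

dt <- data.table(
  id = 1:19,
  region.1 = factor(rep(c("AT1", "AT2", "AT3", "AT4", "AT5"),
                        times = c(4, 7, 3, 2, 3))),
  PoorHealth = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L,
                 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L)
)

# For each region, .I[TRUE] returns all of the group's row numbers and
# .I[FALSE] returns none, so idx holds the rows of qualifying regions only
idx <- dt[, .I[sum(PoorHealth == 1L) <= 3L || .N <= 6L], by = region.1]$V1

included <- dt[sort(idx)]                           # INCLUDE qualifying regions
excluded <- dt[setdiff(seq_len(nrow(dt)), idx)]     # EXCLUDE them instead
```

`sort(idx)` restores the original row order in case the data is not already sorted by region.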