I have a data frame like this:
df<-data.frame(year= as.numeric(c(rep(1997, 5), rep(1998, 5), rep(1999, 5))),
sp= c("A", "B", "C", "D", "E", "A", "B", "C", "F", "G", "H", "I", "J","A", "B"))
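I want to keep only the levels of sp that occur in at least a minimum number of unique year values. For this example, I want to keep the rows for every sp that appears in at least 2 years.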
I have tried:
library(dplyr)
df <- df %>%
  group_by(sp) %>%
  filter(length(year) >= 2)
The correct output is:
output<- data.frame( year= c("1997", "1998", "1999","1997", "1998", "1999", "1997", "1998"),
sp= c("A", "A", "A", "B", "B", "B", "C", "C"))
Answer 0 (score: 0)
You can use aggregate().
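# Count the rows per sp and merge the counts back onto the data (merged columns: sp, year, count)
df1 <- merge(df1, aggregate(list(count=df1$year), by=list(sp=df1$sp), length))
# Keep sp with at least 2 rows and return the columns in the order year, sp
df1 <- df1[df1$count >= 2, c(2, 1)]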
Result
> df1
year sp
1 1997 A
2 1998 A
3 1999 A
4 1998 B
5 1999 B
6 1997 B
7 1998 C
8 1997 C
Data
df1 <- structure(list(year = c(1997, 1998, 1999, 1998, 1999, 1997, 1998,
1997), sp = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("A",
"B", "C", "D", "E", "F", "G", "H", "I", "J"), class = "factor")), row.names = c(NA,
8L), class = "data.frame")
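Note that length() in the aggregate() call counts rows per sp, not distinct years. If the goal is strictly "at least 2 unique years", a base R variant could look like the sketch below. It is only an illustration, assuming the df from the question; the names n_years and res are made up here.

```r
# Data frame from the question
df <- data.frame(
  year = c(rep(1997, 5), rep(1998, 5), rep(1999, 5)),
  sp = c("A", "B", "C", "D", "E", "A", "B", "C", "F", "G", "H", "I", "J", "A", "B")
)

# For every row, the number of distinct years observed for that row's sp
n_years <- ave(df$year, df$sp, FUN = function(y) length(unique(y)))

# Keep rows whose sp was seen in at least 2 distinct years, then sort for readability
res <- df[n_years >= 2, ]
res <- res[order(res$sp, res$year), ]
```

On the example data this keeps the same eight rows (A, B and C) as the aggregate() approach, because no sp occurs twice within one year.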
Answer 1 (score: 0)
A dplyr approach:
df %>% group_by(sp) %>% filter(n() >= 2) %>% arrange(sp)
# year sp
# <dbl> <fct>
# 1 1997 A
# 2 1998 A
# 3 1999 A
# 4 1997 B
# 5 1998 B
# 6 1999 B
# 7 1997 C
# 8 1998 C
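One caveat: n() counts rows, so an sp recorded twice in the same year would also pass the filter. If the requirement is really about unique years, n_distinct(year) may be the safer test. The sketch below is an illustration only; the extra duplicated row for D and the names df_dup and res are hypothetical, added just to show the difference.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  year = c(rep(1997, 5), rep(1998, 5), rep(1999, 5)),
  sp = c("A", "B", "C", "D", "E", "A", "B", "C", "F", "G", "H", "I", "J", "A", "B")
)

# Hypothetical extra row: D is recorded a second time in the same year (1997)
df_dup <- rbind(df, data.frame(year = 1997, sp = "D"))

# Keep only sp observed in at least 2 distinct years
res <- df_dup %>%
  group_by(sp) %>%
  filter(n_distinct(year) >= 2) %>%
  ungroup() %>%
  arrange(sp, year)
```

With filter(n() >= 2), D would be kept here because it has two rows; with n_distinct(year) it is dropped because both rows fall in 1997.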