Keep the levels of one factor that contain a minimum number of levels of another factor

Time: 2019-01-27 05:42:43

Tags: r filter

I have a data frame like this:

 df<-data.frame(year= as.numeric(c(rep(1997, 5), rep(1998, 5), rep(1999, 5))), 
       sp= c("A", "B", "C", "D", "E", "A", "B", "C", "F", "G", "H", "I", "J","A", "B"))

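I want to keep only the levels of sp that occur in at least a minimum number of unique year levels. For this example, I want to keep the rows for every sp that appears in at least 2 years.

I have tried: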

df<-
 df %>% 
 group_by(sp) %>% 
 filter(length(year) >= 2)

The correct output is:

 output<- data.frame( year= c("1997", "1998", "1999","1997", "1998", "1999", "1997", "1998"), 
                 sp= c("A", "A", "A", "B", "B", "B", "C", "C"))

2 answers:

Answer 0: (score: 0)

You can use aggregate():

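# Count rows per sp, then merge the counts back onto df1 by the shared sp column
df1 <- merge(df1, aggregate(list(count=df1$year), by=list(sp=df1$sp), length))
# Keep sp with at least 2 rows; c(2, 1) selects the columns in the order year, sp
df1 <- df1[df1$count >= 2, c(2, 1)]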

Result

> df1
  year sp
1 1997  A
2 1998  A
3 1999  A
4 1998  B
5 1999  B
6 1997  B
7 1998  C
8 1997  C

Data

df1 <- structure(list(year = c(1997, 1998, 1999, 1998, 1999, 1997, 1998, 
1997), sp = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("A", 
"B", "C", "D", "E", "F", "G", "H", "I", "J"), class = "factor")), row.names = c(NA, 
8L), class = "data.frame")
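
As a hedged alternative in base R, the sketch below counts distinct years per sp with ave() instead of counting rows, which matches the "unique years" wording of the question. It rebuilds the question's df (using stringsAsFactors = FALSE, an assumption made here so sp stays a character vector); the names n_years and kept are illustrative and are not part of the original answer.

```r
# Rebuild the question's data frame; sp is kept as character (assumption)
df <- data.frame(
  year = c(rep(1997, 5), rep(1998, 5), rep(1999, 5)),
  sp   = c("A", "B", "C", "D", "E", "A", "B", "C", "F", "G",
           "H", "I", "J", "A", "B"),
  stringsAsFactors = FALSE
)

# For every row, count the distinct years observed for that row's sp
n_years <- ave(df$year, df$sp, FUN = function(y) length(unique(y)))

# Keep rows whose sp appears in at least 2 distinct years, sorted by sp and year
kept <- df[n_years >= 2, ]
kept <- kept[order(kept$sp, kept$year), ]
rownames(kept) <- NULL
kept
```

With the question's data this should leave the eight rows for A, B and C, since every other sp is recorded in a single year only.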

Answer 1: (score: 0)

A dplyr approach:

library(dplyr)
df %>% group_by(sp) %>% filter(n() >= 2) %>% arrange(sp)

#    year sp   
#   <dbl> <fct>
# 1  1997 A    
# 2  1998 A    
# 3  1999 A    
# 4  1997 B    
# 5  1998 B    
# 6  1999 B    
# 7  1997 C    
# 8  1998 C
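
Note that n() counts rows per sp, not distinct years. In the question's data each sp appears at most once per year, so the two are equivalent there. If an sp could be recorded more than once in the same year, a variant using n_distinct(year) may be closer to what the question asks for. The sketch below is only an illustration under that assumption: the extra row for D and the names df_dup, by_rows and by_years are hypothetical and not part of the original data.

```r
library(dplyr)

# Rebuild the question's data frame; sp is kept as character (assumption)
df <- data.frame(
  year = c(rep(1997, 5), rep(1998, 5), rep(1999, 5)),
  sp   = c("A", "B", "C", "D", "E", "A", "B", "C", "F", "G",
           "H", "I", "J", "A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical duplicate: D recorded a second time in 1997
df_dup <- rbind(df, data.frame(year = 1997, sp = "D", stringsAsFactors = FALSE))

# Row-count filter: D passes because it now has 2 rows
by_rows <- df_dup %>% group_by(sp) %>% filter(n() >= 2) %>% ungroup()

# Distinct-year filter: D is dropped because both rows fall in the same year
by_years <- df_dup %>% group_by(sp) %>% filter(n_distinct(year) >= 2) %>% ungroup()

sort(unique(by_rows$sp))
sort(unique(by_years$sp))
```

Here the row-count filter retains D alongside A, B and C, while the distinct-year filter retains only A, B and C.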