Using dplyr to summarise a split data frame based on duplicated factor levels

Time: 2017-06-28 22:24:32

Tags: r

I have a data frame like this:

df <- data.frame(
  region = c("1","1","1","1","1","1","1","1","2","2"),
  loc    = c("A","A","A","B","B","B","C","D","E","F"),
  sp1    = c("a","a","b","a","e","e","e","e","a","a"),
  sp2    = c("b","b","c","b","f","f","f","f","b","b"),
  inter  = c("a_b","a_b","b_c","a_b","e_f","e_f","e_f","e_f","a_b","a_b"))

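I want to group by region, find each level of inter that is duplicated within a region, and then count the number of plots (loc) in which it occurs. The output data frame should look like this: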

df <- data.frame(
  region = c("1","1","2"),
  sp1    = c("a","e","a"),
  sp2    = c("b","f","b"),
  inter  = c("a_b","e_f","a_b"),
  freq   = c("2","3","2"))

I tried the following:

df %>%
group_by(region,inter) %>%
filter(duplicated(inter))

1 Answer:

Answer 0: (score: 1)

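You can filter to the groups that contain more than one row for each region and inter combination, then use n_distinct to count the number of unique loc values. I included the species variables as grouping variables so that they are kept in the output.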

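library(dplyr)

df %>%
     group_by(region, sp1, sp2, inter) %>%  # species columns are grouped so they stay in the output
     filter(n() > 1) %>%                    # keep only groups with more than one row
     summarise( n = n_distinct(loc) )       # count the distinct loc values in each group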

# A tibble: 3 x 5
# Groups:   region, sp1, sp2 [?]
  region    sp1    sp2  inter     n
  <fctr> <fctr> <fctr> <fctr> <int>
1      1      a      b    a_b     2
2      1      e      f    e_f     3
3      2      a      b    a_b     2
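
One thing worth checking: filter(n() > 1) keeps any group with more than one row, so an inter that occurs several times at a single loc would still be reported, with a count of 1. If the goal is to keep only interactions seen at more than one loc, a possible variant is to count first and filter afterwards. This is only a sketch under that assumption; the column name freq is taken from the desired output in the question, and stringsAsFactors = FALSE is my own choice to keep the columns as plain character vectors.

```r
library(dplyr)

# Same data as in the question, with character columns instead of factors
df <- data.frame(
  region = c("1","1","1","1","1","1","1","1","2","2"),
  loc    = c("A","A","A","B","B","B","C","D","E","F"),
  sp1    = c("a","a","b","a","e","e","e","e","a","a"),
  sp2    = c("b","b","c","b","f","f","f","f","b","b"),
  inter  = c("a_b","a_b","b_c","a_b","e_f","e_f","e_f","e_f","a_b","a_b"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(region, sp1, sp2, inter) %>%
  summarise(freq = n_distinct(loc)) %>%  # number of distinct loc values per group
  ungroup() %>%                          # drop the remaining grouping before filtering
  filter(freq > 1)                       # keep interactions found at more than one loc

res
```

On the example data this gives the same three rows as the code above, because b_c is the only interaction that occurs once. The two approaches differ only when an interaction is repeated within a single loc.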