I have a data frame like this:
df <- data.frame(
  region = c("1","1","1","1","1","1","1","1","2","2"),
  loc    = c("A","A","A","B","B","B","C","D","E","F"),
  sp1    = c("a","a","b","a","e","e","e","e","a","a"),
  sp2    = c("b","b","c","b","f","f","f","f","b","b"),
  inter  = c("a_b","a_b","b_c","a_b","e_f","e_f","e_f","e_f","a_b","a_b"))
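I would like to group by region, find each level of inter that is repeated within a region, and then count the number of locations (loc) it occurs in. The output data frame should look like this: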
df <- data.frame(
  region = c("1","1","2"),
  sp1    = c("a","e","a"),
  sp2    = c("b","f","b"),
  inter  = c("a_b","e_f","a_b"),
  freq   = c("2","3","2"))
I tried the following:
df %>%
group_by(region,inter) %>%
filter(duplicated(inter))
Answer 0 (score: 1)
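You can filter down to the groups that have more than one row for each combination of region and inter, then use n_distinct to count the number of unique loc values in each group. I included the species variables (sp1 and sp2) as grouping variables so that they are kept in the output dataset.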
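library(dplyr)

df %>%
  group_by(region, sp1, sp2, inter) %>%
  filter(n() > 1) %>%               # keep combinations that occur in more than one row
  summarise(n = n_distinct(loc))    # count the distinct locations per combination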
# A tibble: 3 x 5
# Groups: region, sp1, sp2 [?]
region sp1 sp2 inter n
<fctr> <fctr> <fctr> <fctr> <int>
1 1 a b a_b 2
2 1 e f e_f 3
3 2 a b a_b 2
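If you want the result to match the layout asked for in the question, a small variation on the pipeline above may help. This is only a sketch: it assumes dplyr is installed, names the count column freq (the name used in the desired output), and calls ungroup() so the result is a plain, ungrouped tibble. The name res is just an illustrative variable name.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  region = c("1","1","1","1","1","1","1","1","2","2"),
  loc    = c("A","A","A","B","B","B","C","D","E","F"),
  sp1    = c("a","a","b","a","e","e","e","e","a","a"),
  sp2    = c("b","b","c","b","f","f","f","f","b","b"),
  inter  = c("a_b","a_b","b_c","a_b","e_f","e_f","e_f","e_f","a_b","a_b")
)

res <- df %>%
  group_by(region, sp1, sp2, inter) %>%
  filter(n() > 1) %>%                    # drop combinations that appear in a single row
  summarise(freq = n_distinct(loc)) %>%  # number of distinct locations per combination
  ungroup()                              # remove the grouping left over after summarise()

res
```

Note that freq comes out as an integer column here, whereas the desired output in the question stores it as character; wrap it in as.character() if that matters. Also, filter(n() > 1) keeps a combination that has several rows even if they all share one loc. With the example data that never happens, but if you only want combinations seen at more than one location, filtering on n_distinct(loc) > 1 instead would be one way to express that.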