Filter and summarize but keep the "zeros"

Time: 2018-11-23 05:01:01

Tags: r

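I collected data at multiple sites. At each site, species were identified (Species) and counted (Number). I also recorded how far each one was from me (Distance). A sample dataset: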

library(tidyverse)  # attaches dplyr, so a separate library(dplyr) call is not needed

Data <- data.frame(
  Site = c("1", "1", "1", "1", "2", "3", "3"),
  Species = c("abc", "bcd", "abc", "kjh", "jh", "abc", "gd"),
  Number = c(10,1,1,1,1,1,1),
  Distance = c("50m", "60m", "In", "In", "Out", "In", "In")
)

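I want to calculate (A) the number of unique species at each site and (B) the total number of individuals at each site. However, I want to exclude every row where Distance == "Out". I tried the following filter: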

Filtered <- Data %>%
  filter(Distance %in% c(
    "50m", 
    "60m",
    "In"))

Then I created my summary:

summary <- Filtered %>%
  group_by(Site) %>% 
  summarize(richness = n_distinct(Species), count = sum(Number))
summary
# A tibble: 2 x 3
  Site  richness count
  <fct>    <int> <dbl>
1 1            3    13
2 3            2     2

But what I actually need is:

# A tibble: 3 x 3
  Site  richness count
  <fct>    <int> <dbl>
1 1            3    13
2 2            0     0
3 3            2     2

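In other words, I don't want the "Out" observations included in the summary calculations, but I do want the result to show that a site such as site 2 has 0 species at the non-"Out" distances.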

Am I missing a better approach?

1 Answer:

Answer 0: (score: 3)

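Instead of filtering first, we can group by Site and drop the "Out" entries inside summarize by subsetting each column. Because no rows are removed before grouping, every site keeps its group, including site 2.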

library(dplyr)
Data %>%
  group_by(Site) %>%
  summarize(richness = n_distinct(Species[Distance != "Out"]), 
            count = sum(Number[Distance != "Out"]))


#  Site  richness count
#  <fct>    <int> <dbl>
#1 1            3    13
#2 2            0     0
#3 3            2     2
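
The subsetting approach works because both statistics are well defined on an empty vector: for site 2, Species[Distance != "Out"] has length zero, and n_distinct() and sum() each return 0 for empty input. As an alternative, here is a minimal sketch that keeps the original filter-then-summarize order and restores the dropped sites afterwards with tidyr::complete(). It is not part of the answer above; the names all_sites and result are introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
Data <- data.frame(
  Site = c("1", "1", "1", "1", "2", "3", "3"),
  Species = c("abc", "bcd", "abc", "kjh", "jh", "abc", "gd"),
  Number = c(10, 1, 1, 1, 1, 1, 1),
  Distance = c("50m", "60m", "In", "In", "Out", "In", "In")
)

# An empty subset yields zero for both statistics, which is why
# subsetting inside summarize() keeps site 2 in the answer above
n_distinct(character(0))  # 0
sum(numeric(0))           # 0

# Alternative (illustrative): remember every site before filtering,
# then add back the ones that filtering removed, filled with zeros
all_sites <- unique(Data$Site)

result <- Data %>%
  filter(Distance != "Out") %>%
  group_by(Site) %>%
  summarize(richness = n_distinct(Species), count = sum(Number)) %>%
  complete(Site = all_sites, fill = list(richness = 0L, count = 0))

result
```

With the sample data, result has one row per site, and site 2 appears with richness 0 and count 0. The complete() variant can be handy when the filter is already applied upstream, while subsetting inside summarize() is shorter when the raw data is still available.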