Count presence by group

Time: 2018-10-31 04:39:37

Tags: r

I have a data frame containing abundance data for many species at two sites.


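I want to calculate two things:

  1. How many species were found at each site. In this dummy example, SiteA has two species and SiteB has four.

  2. The average number of taxa per row at each site. In this case, 1 for SiteA and 2 for SiteB.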
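Stated in code, "present" means an abundance greater than zero. As a minimal base R sketch of both quantities computed directly on the wide table, assume a layout with a `site` column plus one abundance column per species (`sp1` to `sp4`, the names the answer below uses). The toy values are hypothetical, chosen only so the results match the expected SiteA 2 / SiteB 4 and 1 / 2:

```r
# Hypothetical toy data (values made up for illustration): one row per
# sample, a `site` column and one abundance column per species.
df <- data.frame(
  site = c("SiteA", "SiteA", "SiteB", "SiteB"),
  sp1  = c(1, 0, 1, 0),
  sp2  = c(0, 2, 1, 0),
  sp3  = c(0, 0, 0, 3),
  sp4  = c(0, 0, 0, 1)
)

# Presence/absence table: TRUE wherever a species has abundance > 0
present <- as.data.frame(df[, c("sp1", "sp2", "sp3", "sp4")] > 0)

# 1. Species found at each site: a species counts once if it is present
#    in at least one row belonging to that site
species_per_site <- sapply(split(present, df$site),
                           function(rows) sum(colSums(rows) > 0))

# 2. Average number of taxa per row at each site
taxa_per_row <- rowSums(present)
avg_taxa <- tapply(taxa_per_row, df$site, mean)

print(species_per_site)  # SiteA 2, SiteB 4
print(avg_taxa)          # SiteA 1, SiteB 2
```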

1 answer:

Answer 0 (score: 2)

I like to use the dplyr / tidyverse packages for this kind of summary problem. More here: https://dplyr.tidyverse.org/

library(tidyverse)
# First, reshape into long (aka "tidy") format.
# Assumes df has a `site` column plus species count columns sp1 to sp4
df_tidy <- df %>%
  mutate(obs_num = row_number()) %>%  # To keep track of orig row
  gather(sp, count, sp1:sp4)
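In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. A sketch of the same reshape with the newer function follows, run on hypothetical toy data that has the column layout the answer assumes:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the column layout the answer assumes
df <- data.frame(
  site = c("SiteA", "SiteA", "SiteB", "SiteB"),
  sp1  = c(1, 0, 1, 0),
  sp2  = c(0, 2, 1, 0),
  sp3  = c(0, 0, 0, 3),
  sp4  = c(0, 0, 0, 1)
)

# Same long format as gather(sp, count, sp1:sp4), via pivot_longer()
df_tidy <- df %>%
  mutate(obs_num = row_number()) %>%   # keep track of the original row
  pivot_longer(sp1:sp4, names_to = "sp", values_to = "count")

df_tidy
```

The rows come out ordered by original row rather than by species (the order `gather()` uses), but the grouped counts that follow do not depend on row order.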

# First question
df_tidy %>%
  # This gives total counts for all recorded combos of site and species
  count(site, sp, wt = count) %>%
  filter(n > 0) %>%
  count(site)        # Count how many rows (i.e. species) remain for each site
## A tibble: 2 x 2
#  site     nn
#  <chr> <int>
#1 SiteA     2
#2 SiteB     4
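Abundances are non-negative, so a species' total at a site is greater than zero exactly when it has at least one positive count. That means the first question can also be answered by dropping absences and counting distinct species. A sketch on hypothetical toy data:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (values made up for illustration)
df <- data.frame(
  site = c("SiteA", "SiteA", "SiteB", "SiteB"),
  sp1  = c(1, 0, 1, 0),
  sp2  = c(0, 2, 1, 0),
  sp3  = c(0, 0, 0, 3),
  sp4  = c(0, 0, 0, 1)
)

df_tidy <- df %>%
  mutate(obs_num = row_number()) %>%
  pivot_longer(sp1:sp4, names_to = "sp", values_to = "count")

# Keep only presences, then count distinct species per site
species_per_site <- df_tidy %>%
  filter(count > 0) %>%
  group_by(site) %>%
  summarise(n_species = n_distinct(sp))

species_per_site
```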


# Second question
df_tidy %>%
  # For each site and original row, count how many species had count > 0
  count(site, obs_num, wt = count > 0) %>%
  group_by(site) %>%
  summarize(avg_taxa = mean(n))

## A tibble: 2 x 2
#  site  avg_taxa
#  <chr>    <dbl>
#1 SiteA        1
#2 SiteB        2
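The per-row step in the second question can also be written as two `summarise()` calls, which may be easier to read than `count(..., wt = count > 0)`. The sketch below assumes dplyr 1.0.0 or later (for the `.groups` argument) and uses hypothetical toy data:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (values made up for illustration)
df <- data.frame(
  site = c("SiteA", "SiteA", "SiteB", "SiteB"),
  sp1  = c(1, 0, 1, 0),
  sp2  = c(0, 2, 1, 0),
  sp3  = c(0, 0, 0, 3),
  sp4  = c(0, 0, 0, 1)
)

df_tidy <- df %>%
  mutate(obs_num = row_number()) %>%
  pivot_longer(sp1:sp4, names_to = "sp", values_to = "count")

avg_taxa <- df_tidy %>%
  group_by(site, obs_num) %>%
  # number of taxa present in each original row; result stays grouped by site
  summarise(n_taxa = sum(count > 0), .groups = "drop_last") %>%
  # average those per-row counts within each site
  summarise(avg_taxa = mean(n_taxa))

avg_taxa
```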