I have a data frame containing abundance data for many species at two sites:
GET _search
{
"query": {
"bool": {
"filter": [
{
"range": {
"date": {
"gte": "1438367180542",
"lte": "1738367180542"
}
}
},
{
"term": {
"eventName.keyword": "XXXXXXX"
}
}
]
}
}
}
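I would like to calculate two things:
How many species were found at each site. In this dummy example, SiteA has two species and SiteB has four.
The average number of taxa per row for each site. In this case, SiteA has 1 and SiteB has 2.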
Answer 0 (score: 2)
I like to use the dplyr and tidyverse packages for this kind of summary problem. More here:
https://dplyr.tidyverse.org/
library(tidyverse)
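# First I'd like to reshape into long (aka "tidy") format
# (gather() is superseded in newer tidyr; the equivalent call is
# pivot_longer(sp1:sp4, names_to = "sp", values_to = "count"))
df_tidy <- df %>%
  mutate(obs_num = row_number()) %>% # To keep track of orig row
  gather(sp, count, sp1:sp4)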
# First question
df_tidy %>%
  # This gives total counts for all recorded combos of site and species
  count(site, sp, wt = count) %>%
  filter(n > 0) %>%
  count(site) # Count how many rows (i.e. species) for each site
## A tibble: 2 x 2
# site nn
# <chr> <int>
#1 SiteA 2
#2 SiteB 4
# Second question
df_tidy %>%
  # For each original row (obs_num), count how many species had counts > 0
  count(site, obs_num, wt = count > 0) %>%
  group_by(site) %>%
  summarize(avg_taxa = mean(n)) # Average that per-row count within each site
## A tibble: 2 x 2
# site avg_taxa
# <chr> <dbl>
#1 SiteA 1
#2 SiteB 2
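As a cross-check, here is a minimal base R sketch of the same two calculations. It is not the dplyr approach above, just an alternative that needs no packages. The data frame is hypothetical: the values are invented so that they reproduce the results stated in the question (two species and 1 taxon per row at SiteA, four species and 2 taxa per row at SiteB), and the column names site and sp1 to sp4 are assumed from the code above.

```r
# Hypothetical data: these values are invented to match the results
# described in the question, not taken from the original data frame.
df <- data.frame(
  site = c("SiteA", "SiteA", "SiteB", "SiteB"),
  sp1  = c(1, 0, 3, 0),
  sp2  = c(0, 2, 1, 0),
  sp3  = c(0, 0, 0, 4),
  sp4  = c(0, 0, 0, 2),
  stringsAsFactors = FALSE
)

sp_cols <- c("sp1", "sp2", "sp3", "sp4")

# Logical matrix: TRUE where a species was recorded in a row
present <- df[sp_cols] > 0

# Question 1: number of species with a non-zero total at each site
site_totals <- rowsum(df[sp_cols], group = df$site) # summed abundance per site
n_species <- rowSums(site_totals > 0)

# Question 2: mean number of taxa present per row, by site
avg_taxa <- tapply(rowSums(present), df$site, mean)

print(n_species)
print(avg_taxa)
```

Both results are named vectors keyed by site, so they can be compared directly against the nn and avg_taxa columns of the tibbles above.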