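I want to summarise my data and count the number of cases within each group, with groups that have no cases reported as zero. For example: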
library(dplyr)
df <- structure(list(Station = c("TR1", "TR1", "TR1", "TR1", "TR1",
"TR1", "TR1", "TR1", "TR2", "TR2", "TR2", "TR2", "TR2", "TR2",
"TR2"), Age = c(1, 1, 1, 2, 2, 3, 4, 4, 1, 1, 1, 1, 3, 4, 4),
WeightTurtles = c(21, 22, 20, 43, 32, 32, 27, 32, 21, 22,
20, 15, 32, 37, 34)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -15L), .Names = c("Station", "Age", "WeightTurtles"
))
df %>%
group_by(Station, Age) %>%
summarise(NumTurtles=length(WeightTurtles))
This gives:
  Station   Age NumTurtles
    (chr) (dbl)      (int)
1     TR1     1          3
2     TR1     2          2
3     TR1     3          1
4     TR1     4          2
5     TR2     1          4
6     TR2     3          1
7     TR2     4          2
What I would like is for the output above to also include a row like this:
5 TR2 2 0
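That is, how do I include the count (here, zero) for a combination of levels that never occurs in the data? More generally, how do I tell R to use all possible factor levels when computing the length?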
Answer 0 (score: 2)
You can do this with the complete function from tidyr. complete adds a row for each missing group and fills WeightTurtles with NA in that row (unless you choose a different fill value):
library(dplyr)
library(tidyr)
df %>%
complete(Age, nesting(Station)) %>%
group_by(Station, Age) %>%
summarise(NumTurtles=sum(!is.na(WeightTurtles)))
  Station Age NumTurtles
1     TR1   1          3
2     TR1   2          2
3     TR1   3          1
4     TR1   4          2
5     TR2   1          4
6     TR2   2          0
7     TR2   3          1
8     TR2   4          2
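Since complete accepts a fill value, one possible variant is to count first and then complete the counts, filling the missing combinations with 0 directly. This is only a sketch and not the answer's original code: it assumes a dplyr version whose count() supports the name argument, and it rebuilds the question's data so that it runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so the sketch is self-contained
df <- tibble(
  Station = rep(c("TR1", "TR2"), times = c(8, 7)),
  Age = c(1, 1, 1, 2, 2, 3, 4, 4, 1, 1, 1, 1, 3, 4, 4),
  WeightTurtles = c(21, 22, 20, 43, 32, 32, 27, 32, 21, 22, 20, 15, 32, 37, 34)
)

counts <- df %>%
  # Count the Station/Age combinations that actually occur
  count(Station, Age, name = "NumTurtles") %>%
  # Add the combinations that never occur, with a count of 0 instead of NA
  complete(Station, Age, fill = list(NumTurtles = 0L))

counts
```

Here the zero comes from the fill argument rather than from sum(!is.na(...)), so no NA rows are ever added to the raw data.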
Answer 1 (score: 0)
Here is one solution I could come up with using dplyr:
library(dplyr)
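# Build every Station/Age combination, then join the data onto it;
# combinations with no data get NA in WeightTurtles
df <- left_join(expand.grid(Station = unique(df$Station),
Age = unique(df$Age), stringsAsFactors = FALSE),
df, by = c("Station", "Age"))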
df %>%
group_by(Station, Age) %>%
summarise(NumTurtles = sum(!is.na(WeightTurtles)))
Source: local data frame [8 x 3]
Groups: Station [?]
Station Age NumTurtles
<chr> <dbl> <int>
1 TR1 1 3
2 TR1 2 2
3 TR1 3 1
4 TR1 4 2
5 TR2 1 4
6 TR2 2 0
7 TR2 3 1
8 TR2 4 2